Splitting a data frame to create new columns
I have a data frame with columns for "Count", "Transect Number", "Data" and "Year". My goal is to split the dataframe by Transect, then again by Year and create a new dataframe with a column for "Transect" and then the corresponding data for the year in the following columns.
To create a dummy data frame:
Count1<-1:27 Count2<-1:30 Count3<-1:25 T1<-c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3) T2<-c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3) T3<-c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3) Data1<-c(1,2,3,2,1,2,3,4,3,2,1,2,3,4,3,2,1,2,3,4,5,4,3,2,3,3,2) Data2<-c(1,2,3,2,1,4,3,2,1,2,4,3,2,3,4,3,2,3,4,5,6,4,3,2,1,4,5,4,3,2) Data3<-c(1,2,3,4,5,4,3,3,3,4,5,4,3,3,2,3,4,5,4,3,4,3,2,3,4) Year1<-c(2014,2014,2014,2014,2014,2014,2014,2014,2014,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016) Year2<-c(2014,2014,2014,2014,2014,2014,2014,2014,2014,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016) Year3<-c(2014,2014,2014,2014,2014,2014,2014,2014,2014,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016) DF1<-data.frame(Count1,T1,Data1,Year1) colnames(DF1)<-c("Count","Transect","Data","Year") DF2<-data.frame(Count2,T2,Data2,Year2) colnames(DF2)<-c("Count","Transect","Data","Year") DF3<-data.frame(Count3,T3,Data3,Year3) colnames(DF3)<-c("Count","Transect","Data","Year") All<-rbind(DF1,DF2,DF3)
Once I have a dataframe I thought about splitting the data into a transect as this would be a permanent aspect of my current dataset.
#Step 1-Break down by T Trans1<-All[All$Transect==1,] Trans2<-All[All$Transect==2,] Trans3<-All[All$Transect==3,] Trans4<-All[All$Transect==4,] Trans5<-All[All$Transect==5,]
But in the next step, I'm a little less clear. I need to pull data from a Data column, organized by year. Something similar to further data destruction:
Trans1_Year1<-Trans1[Trans1$Year==2014,] Trans2_Year1<-Trans2[Trans2$Year==2014,] Trans3_Year1<-Trans3[Trans3$Year==2014,] Trans4_Year1<-Trans4[Trans4$Year==2014,] Trans5_Year1<-Trans5[Trans5$Year==2014,]
or even using split
But I would rather not write the code like above as I hope to add new data every year as this dataset moves forward. And I would like the code to be able to accommodate new Year data as it is added, rather than writing new lines of code every year.
Once I have my data set like this, I would like to create a second data frame with columns for each year. One problem is that each year contains a different number of lines, which was a problem for me. But my end result would have columns:
"Transect", "Data 2014", "Data 2015", "Data 2016"
Since each year can have a different number of lines inside the transom, I would like to leave NA at the end of each Transect section when the number of lines in each cut differs between years.
source to share
It looks like you have been basically trying to convert your data to semi-wide format with columns for years, not "long" format.
If so, you'd be better off adding an extra index column that shows the repeated combination of "Transect" and "Year".
This can be easily done with the help
from my splitstackshape package. "splitstackshape" also loads "data.table" from which you can then use
to get a wide format.
library(splitstackshape) dcast.data.table(getanID(All, c("Transect", "Year")), Transect + .id ~ Year, value.var = "Data")
Then, if you really want to split the "Transect" column, you can go ahead and use
, but since you now have "data.table" it would be better to stick with that and take advantage of its many convenient features, including subset-related and aggregation.
source to share