Subset of data using conditions and saving each subset as a new dataframe
I have a dataset with a different dimension in each column, and the last column holds the values (0, 1, 2).
For example, let's say my dataframe looks like this (ignore the v1:v5 values):
1. v1 v2 v3 v4 v5 v6
2. 24 76 98 89 87 2
3. 24 76 98 89 87 2
4. 24 76 98 89 87 1
5. 24 76 98 89 87 2
6. 24 76 98 89 87 2
I am interested in the column v6 values, and I want to extract the rows where the value is 2. In the above example, I would like to extract the first 2 rows and store them as a new dataframe, and also extract the 5th and 6th rows as another block and save that too. To be clearer: whenever the values are 2 and consecutive, I need that run to be saved as a new dataframe. When the value is different, I need the loop to ignore it and look for the value of interest (2) again. If my dataframe has 70 blocks of consecutive 2s in the last column, I need 70 dataframes in total.
I've tried a for loop, but I'm pretty new to R and programming and I'm stuck.
This is what I have tried so far:
x=1
for (i in 1:nrow(dataframe)) {
  if (dataframe[i,lastcolumn] == 2 && x==1) {
    start.event <- dataframe[i,]
  }
  if (dataframe[i,lastcolumn] != 2) {
    end.event <- dataframe[i-1,]
  }
  else {
    df[1] <- dataframe( start.event:end.event , )
    x = 1
  }
}
I would really appreciate any help.
Thank you in advance
One way is to create groups (grp) based on where v6 changes. Then filter out all rows where v6 != 2 and split by grp:
new_d <- subset(transform(df, grp = cumsum(c(1, diff(v6) != 0))), v6 == 2)
split(new_d, new_d$grp)
#$`1`
# v1 v2 v3 v4 v5 v6 grp
#1 24 76 98 89 87 2 1
#2 24 76 98 89 87 2 1
#$`3`
# v1 v2 v3 v4 v5 v6 grp
#4 24 76 98 89 87 2 3
#5 24 76 98 89 87 2 3
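To see why the list elements are named 1 and 3 rather than 1 and 2, it can help to look at the intermediate grp vector on its own. A minimal sketch, using only the v6 column from the sample data (the standalone variable names are illustrative):

```r
# v6 column from the sample data used in this answer
v6 <- c(2L, 2L, 1L, 2L, 2L)

# diff(v6) != 0 is TRUE wherever the value differs from the previous row;
# prepending 1 starts the first run, and cumsum() then numbers the runs.
grp <- cumsum(c(1, diff(v6) != 0))
grp
# 1 1 2 3 3

# Run 2 is the single row with v6 == 1, so after filtering only runs 1 and 3 remain
kept <- unique(grp[v6 == 2])
kept
# 1 3
```

Every run gets an id, including runs of other values, which is why the ids of the surviving blocks are not necessarily consecutive.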
Or with dplyr:
library(dplyr)
new_d <- df %>%
  mutate(grp = cumsum(c(1, diff(v6) != 0))) %>%
  filter(v6 == 2)
split(new_d, new_d$grp)
DATA USED
df <- structure(list(v1 = c(24L, 24L, 24L, 24L, 24L), v2 = c(76L, 76L,
76L, 76L, 76L), v3 = c(98L, 98L, 98L, 98L, 98L), v4 = c(89L,
89L, 89L, 89L, 89L), v5 = c(87L, 87L, 87L, 87L, 87L), v6 = c(2L,
2L, 1L, 2L, 2L)), .Names = c("v1", "v2", "v3", "v4", "v5", "v6"
), class = "data.frame", row.names = c(NA, -5L))
Here is one way using base R:
#use rle to number the runs where v6 is 2
rl <- rle(df$v6)
rl$values <- cumsum(rl$values==2)
df$ind <- inverse.rle(rl)
#filter out other values from df
df <- df[df$v6==2,]
#split by indicator (and remove it)
dflist <- split(df[,-ncol(df)],df$ind)
dflist #elements of list are named after number of 2-group
$`1`
v1 v2 v3 v4 v5 v6
2. 24 76 98 89 87 2
3. 24 76 98 89 87 2
$`2`
v1 v2 v3 v4 v5 v6
5. 24 76 98 89 87 2
6. 24 76 98 89 87 2