How to subset everything after the first row containing a numeric value

Subsetting question. I would prefer to use built-in R functions, but that is not required. I believe the solution is simple, but I'm new to R.

Here's some sample data:

df <- data.frame(year = c("2001", "2002", "2003", "2004", "2005", "2006"),
              C1 = c("a", "b", "c", "d", "e", "f"), 
              C2 = c(NA, NA, 35, 20, NA, 50),
              C3=1:6)

      

The result looks like this:

  year C1 C2 C3
1 2001  a NA  1
2 2002  b NA  2
3 2003  c 35  3
4 2004  d 20  4
5 2005  e NA  5
6 2006  f 50  6

      

I want to select all rows starting from the first row with a numeric value (i.e. > 0) in column C2, so my output looks like this:

  year C1 C2 C3
1 2003  c 35  3
2 2004  d 20  4
3 2005  e NA  5
4 2006  f 50  6

      

Note that the NA in column C2 (third row of the output) is not excluded, which is what I want. I tried the following, but it drops every row where C2 is NA:

new_df=subset(df, C2>0)

      

I've also tried this, but it doesn't work either, because it keeps the leading NA rows:

new_df=subset(df, C2>0 | is.na(C2))

      



3 answers


df[which(!is.na(df$C2))[1]:nrow(df),]

      

Output:



  year C1 C2 C3
3 2003  c 35  3
4 2004  d 20  4
5 2005  e NA  5
6 2006  f 50  6
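
One caveat with this one-liner: if C2 contains no non-NA value at all, which(...)[1] is NA and the NA:nrow(df) range fails with an error. Below is a minimal sketch of a guarded variant. The helper name from_first_value is hypothetical (it is not part of base R), and returning zero rows in the all-NA case is an assumption about what you would want.

```r
# Hypothetical helper (not part of base R): keep all rows of `data`
# from the first non-NA entry of column `col` onward.
from_first_value <- function(data, col) {
  first <- which(!is.na(data[[col]]))[1]
  # Assumption: if the column is entirely NA, return zero rows
  if (is.na(first)) return(data[0, , drop = FALSE])
  data[first:nrow(data), , drop = FALSE]
}

# Sample data from the question
df <- data.frame(year = c("2001", "2002", "2003", "2004", "2005", "2006"),
                 C1 = c("a", "b", "c", "d", "e", "f"),
                 C2 = c(NA, NA, 35, 20, NA, 50),
                 C3 = 1:6)

res <- from_first_value(df, "C2")
print(res)
```

The NA in the 2005 row is kept because only the position of the first non-NA value is used; later NAs are never tested.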

      



Using base R, you can create a custom function that takes as input a data frame and the column you want to use for subsetting:

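f1 <- function(df, x){
  na <- is.na(x)
  # number of consecutive NAs at the start of x
  lead <- sum(cumprod(na))
  # nothing to drop if x does not start with NA
  if (lead == 0) return(df)
  # drop only the leading NA rows; later NAs are kept
  df[-seq_len(lead), , drop = FALSE]
}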

#apply the function
f1(df, df$C2)

      



which gives:

  year C1 C2 C3
3 2003  c 35  3
4 2004  d 20  4
5 2005  e NA  5
6 2006  f 50  6

      



Here is an option using the tidyverse:

library(dplyr)
df %>%
   slice(which(!is.na(C2))[1]:n())
# A tibble: 4 x 4
#    year     C1    C2    C3
#  <fctr> <fctr> <dbl> <int>
#1   2003      c    35     3
#2   2004      d    20     4
#3   2005      e    NA     5
#4   2006      f    50     6

      


Or using cumsum/filter

df %>%
     filter(cumsum(!is.na(C2))>0)
#  year C1 C2 C3
#1 2003  c 35  3
#2 2004  d 20  4
#3 2005  e NA  5
#4 2006  f 50  6

      

The same cumsum approach also works in base R:

df[cumsum(!is.na(df$C2)) > 0,]
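
To see why the cumsum approach works, it can help to look at the intermediate vectors. The sketch below rebuilds the sample data from the question; the variable names has_value, seen and keep are only for illustration.

```r
# Sample data from the question
df <- data.frame(year = c("2001", "2002", "2003", "2004", "2005", "2006"),
                 C1 = c("a", "b", "c", "d", "e", "f"),
                 C2 = c(NA, NA, 35, 20, NA, 50),
                 C3 = 1:6)

has_value <- !is.na(df$C2)   # TRUE where C2 holds a number
seen      <- cumsum(has_value)  # running count of non-NA values so far
keep      <- seen > 0        # FALSE only before the first non-NA value

print(has_value)
print(seen)
print(keep)

new_df <- df[keep, ]
print(new_df)
```

For this data, seen is 0 0 1 2 2 3, so keep is FALSE for the first two rows and TRUE afterwards, including the 2005 row where C2 is NA. If C2 were entirely NA, keep would be FALSE everywhere and the result would simply have zero rows, with no error.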

      
