How to subset everything after the first row containing a numeric value
A subsetting question. I would prefer to use base R functions, but that is not required. I believe the solution is simple, but I'm new to R.
Here's some sample data:
df <- data.frame(year = c("2001", "2002", "2003", "2004", "2005", "2006"),
                 C1 = c("a", "b", "c", "d", "e", "f"),
                 C2 = c(NA, NA, 35, 20, NA, 50),
                 C3 = 1:6)
The result looks like this:
year C1 C2 C3
1 2001 a NA 1
2 2002 b NA 2
3 2003 c 35 3
4 2004 d 20 4
5 2005 e NA 5
6 2006 f 50 6
I want to select all rows starting from the first row with a numeric value (i.e. > 0) in column C2, so my output looks like this:
year C1 C2 C3
1 2003 c 35 3
2 2004 d 20 4
3 2005 e NA 5
4 2006 f 50 6
Note that the NA in column C2 (row 3 of the output) is not excluded, which is what I want. I tried the following, but it drops every row where C2 is NA:
new_df=subset(df, C2>0)
I've also tried this, but it doesn't work either, since it returns every row, including the leading NA rows:
new_df=subset(df, C2>0 | is.na(C2))
Using base R, you can create a custom function that takes as input a data frame and the column you want to use for subsetting:
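f1 <- function(df, x) {
  # position of the first non-NA value in x (NA if x is entirely NA)
  first <- which(!is.na(x))[1]
  # an all-NA column leaves nothing to keep, so return zero rows
  if (is.na(first)) return(df[0, , drop = FALSE])
  # keep everything from that row to the end
  df[first:nrow(df), , drop = FALSE]
}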
#apply the function
f1(df, df$C2)
which gives:
year C1 C2 C3
3 2003 c 35 3
4 2004 d 20 4
5 2005 e NA 5
6 2006 f 50 6
Here is an option using the tidyverse (dplyr):
library(dplyr)
df %>%
  slice(which(!is.na(C2))[1]:n())  # [1] picks the first non-NA position; without it `:` warns and uses only the first element
# A tibble: 4 x 4
# year C1 C2 C3
# <fctr> <fctr> <dbl> <int>
#1 2003 c 35 3
#2 2004 d 20 4
#3 2005 e NA 5
#4 2006 f 50 6
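The index arithmetic behind the slice call can be checked without dplyr. Below is a minimal base R sketch, assuming the C2 column from the question's sample data; the names `pos`, `first` and `rows` are only illustrative.

```r
# C2 column from the sample data in the question
C2 <- c(NA, NA, 35, 20, NA, 50)

# positions of all non-NA values
pos <- which(!is.na(C2))

# the first of those positions is where the output should start
first <- pos[1]

# row indices from that position to the end of the column
rows <- first:length(C2)

print(pos)
print(first)
print(rows)
```

If C2 were entirely NA, `pos[1]` would be NA and the `:` range would fail, so this position-based form probably needs a guard for that case.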
Or using cumsum/filter
df %>%
  filter(cumsum(!is.na(C2)) > 0)
# year C1 C2 C3
#1 2003 c 35 3
#2 2004 d 20 4
#3 2005 e NA 5
#4 2006 f 50 6
The cumsum approach can also be written in base R:
df[cumsum(!is.na(df$C2)) > 0,]
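To see why the cumsum mask keeps the later NA row but drops the leading ones, it may help to look at the intermediate vectors. This is a small sketch using the sample data from the question; the names `not_na`, `running`, `keep` and `result` are illustrative and not part of the answer above.

```r
df <- data.frame(year = c("2001", "2002", "2003", "2004", "2005", "2006"),
                 C1 = c("a", "b", "c", "d", "e", "f"),
                 C2 = c(NA, NA, 35, 20, NA, 50),
                 C3 = 1:6)

# TRUE wherever C2 holds a value
not_na <- !is.na(df$C2)

# running count of values seen so far; stays 0 until the first non-NA row
running <- cumsum(not_na)

# once the count is above 0 it never drops back, so later NA rows are kept
keep <- running > 0

result <- df[keep, ]

print(not_na)
print(running)
print(keep)
print(result)
```

Because the mask is simply all FALSE when C2 has no values at all, this form returns zero rows in that case without needing an explicit check.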