R - create a column with records for only the first row of each subset

For example, if I have this data:

ID  Value   
1   2
1   2
1   3
1   4
1   10
2   9
2   9
2   12
2   13

      

And my goal is to find the smallest value for each subset of IDs, and I want the number to be on the first line of the id group, leaving the rest of the lines blank so that:

ID  Value   Start
1   2       2
1   2       
1   3       
1   4       
1   10      
2   9       9
2   9       
2   12      
2   13      

      

My first instinct is to create an index for ids using

A <- transform(A, INDEX=ave(ID, ID, FUN=seq_along)) ## A being the name of my data

      

Since I am a nob, I am stuck at this point. For each ID = n, I want to find the min (A $ value) for that subset of IDs and put it in the cell matching condition for ID = n and INDEX = 1.

Any help is greatly appreciated! I am sorry that I ask questions :(

+3


source to share


4 answers


Here's the solution:

within(A, INDEX <- "is.na<-"(ave(Value, ID, FUN = min), c(FALSE, !diff(ID))))

  ID Value INDEX
1  1     2     2
2  1     2    NA
3  1     3    NA
4  1     4    NA
5  1    10    NA
6  2     9     9
7  2     9    NA
8  2    12    NA
9  2    13    NA

      



Update:

How it works? The command ave(Value, ID, FUN = min)

applies the function min

to each subset Value

by value ID

. For example, it returns a vector five times 2

and four times 9

. Since all values ​​except the first in each subset must be NA

, the function "is.na<-"

replaces all values ​​in the logical index specified by c(FALSE, !diff(ID))

. This index TRUE

if the value is the same as the previous one.

+3


source


You're almost there. We just need to create a custom function instead seq_along

and split value

by ID

(not ID

by ID

).

first_min <- function(x){
  nas <- rep(NA, length(x))
  nas[which.min(x)] <- min(x, na.rm=TRUE)
  nas
}

      



This function creates a vector NA and replaces the first element with the minimum value value

.

transform(dat, INDEX=ave(Value, ID, FUN=first_min)) 

##   ID Value INDEX
## 1  1     2     2
## 2  1     2    NA
## 3  1     3    NA
## 4  1     4    NA
## 5  1    10    NA
## 6  2     9     9
## 7  2     9    NA
## 8  2    12    NA
## 9  2    13    NA

      

+2


source


You can achieve this with a one-line layout

df$Start<-as.vector(unlist(tapply(df$Value,df$ID,FUN = function(x){ return (c(min(x),rep("",length(x)-1)))})))

      

+1


source


I keep coming back to this question and the above answers helped me a lot. There is a basic solution for beginners:

A$Start<-NA A[!duplicated(A$ID),]$Start<-A[!duplicated(A$ID),]$Value

Thank.

0


source







All Articles