R - create a column with records for only the first row of each subset
For example, if I have this data:
ID Value
1 2
1 2
1 3
1 4
1 10
2 9
2 9
2 12
2 13
And my goal is to find the smallest value for each subset of IDs, and I want the number to be on the first line of the id group, leaving the rest of the lines blank so that:
ID Value Start
1 2 2
1 2
1 3
1 4
1 10
2 9 9
2 9
2 12
2 13
My first instinct is to create an index for ids using
A <- transform(A, INDEX=ave(ID, ID, FUN=seq_along)) ## A being the name of my data
Since I am a nob, I am stuck at this point. For each ID = n, I want to find the min (A $ value) for that subset of IDs and put it in the cell matching condition for ID = n and INDEX = 1.
Any help is greatly appreciated! I am sorry that I ask questions :(
source to share
Here's the solution:
within(A, INDEX <- "is.na<-"(ave(Value, ID, FUN = min), c(FALSE, !diff(ID))))
ID Value INDEX
1 1 2 2
2 1 2 NA
3 1 3 NA
4 1 4 NA
5 1 10 NA
6 2 9 9
7 2 9 NA
8 2 12 NA
9 2 13 NA
Update:
How it works? The command ave(Value, ID, FUN = min)
applies the function min
to each subset Value
by value ID
. For example, it returns a vector five times 2
and four times 9
. Since all values except the first in each subset must be NA
, the function "is.na<-"
replaces all values in the logical index specified by c(FALSE, !diff(ID))
. This index TRUE
if the value is the same as the previous one.
source to share
You're almost there. We just need to create a custom function instead seq_along
and split value
by ID
(not ID
by ID
).
first_min <- function(x){
nas <- rep(NA, length(x))
nas[which.min(x)] <- min(x, na.rm=TRUE)
nas
}
This function creates a vector NA and replaces the first element with the minimum value value
.
transform(dat, INDEX=ave(Value, ID, FUN=first_min))
## ID Value INDEX
## 1 1 2 2
## 2 1 2 NA
## 3 1 3 NA
## 4 1 4 NA
## 5 1 10 NA
## 6 2 9 9
## 7 2 9 NA
## 8 2 12 NA
## 9 2 13 NA
source to share