Categorize multiple lines into one variable
Simple question, but apparently not answered yet at StO.
I have a long dataframe where there are three columns:
person | trip  | driver
=======================
1      | car   |
1      | bike  |
1      | train |
1      | walk  |
2      | walk  |
2      | train |
2      | boat  |
I would like to fill in the "driver" column so that it reads 1 if at least one of the journeys is by car, 0 otherwise:
person | driver
================
1      | 1
1      | 1
1      | 1
1      | 1
2      | 0
2      | 0
2      | 0
I have a slight preference for doing this without resorting to fancy packages, but I'm happy with most of the popular ones (e.g. plyr, data.table, sqldf, ...) or even new ones that prove useful in the long run.
Thanks in advance, .p.
We could use data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), check whether any 'car' is in 'trip' grouped by 'person', convert the logical result to numeric (add +0L or wrap with as.numeric), and assign (:=) the result to the "driver" column. If necessary, we can remove the "trip" column by assigning it NULL, or by subsetting with df1[, c(1,3), with=FALSE].
library(data.table)
setDT(df1)[, driver := any(trip == 'car')+0L, by = person][, trip := NULL]
Or, instead of any, we can use max(trip == 'car'), as @Arun mentioned in the comments:
setDT(df1)[, driver := max(trip == 'car'), by = person]
Or, using similar logic with dplyr, we group_by 'person', create the new column with mutate, and remove the unneeded column with select:
library(dplyr)
df1 %>%
    group_by(person) %>%
    mutate(driver = any(trip == 'car') + 0L) %>%
    select(-trip)
Or, with base R, we can use ave to create the "driver" column and then use subset to remove the "trip" column:
df1$driver <- with(df1, ave(trip=='car', person, FUN=any)+0L)
subset(df1, select=-trip)
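For anyone who wants to try this end to end, here is a minimal base R sketch. The construction of df1 is an assumption reconstructed from the sample table in the question (the real data may use other column types), and result is just an illustrative name:

```r
# Sample data rebuilt from the question's table (assumed column types)
df1 <- data.frame(
  person = c(1, 1, 1, 1, 2, 2, 2),
  trip   = c("car", "bike", "train", "walk", "walk", "train", "boat"),
  stringsAsFactors = FALSE
)

# For each person, any() tells us whether at least one trip is by car;
# ave() spreads that group-level answer back over every row of the group,
# and + 0L turns TRUE/FALSE into 1/0
df1$driver <- with(df1, ave(trip == "car", person, FUN = any) + 0L)

# Drop the 'trip' column to match the desired output
result <- subset(df1, select = -trip)
print(result)
```

With this sample data, person 1 should get driver = 1 on all four rows and person 2 should get driver = 0 on all three.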