Categorize multiple lines into one variable

Simple question, but apparently not yet answered on Stack Overflow.

I have a long dataframe where there are three columns:

person | trip  | driver
=======================
1      | car   |
1      | bike  |
1      | train |
1      | walk  |
2      | walk  |
2      | train |
2      | boat  |

      

I would like to fill in the "driver" column so that it reads 1 if at least one of the journeys is by car, 0 otherwise:

person | driver
===============
1      | 1
1      | 1
1      | 1
1      | 1
2      | 0
2      | 0
2      | 0

      

I have a slight preference for doing this without resorting to fancy packages, but I'm happy with most of the popular ones (e.g. plyr, data.table, sqldf, ...) or even new ones that prove useful in the long run.

Thanks in advance, .p.

1 answer


We could use data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), check whether any 'car' is in 'trip' grouped by 'person', convert the logical result to a number (+0L, or wrap with as.numeric) and assign (:=) it to the "driver" column. If necessary, we can remove the "trip" column by assigning NULL to it, or by subsetting with [, c(1,3), with=FALSE].

library(data.table)
setDT(df1)[, driver := any(trip == 'car')+0L, by = person][, trip := NULL]

      

Or, instead of any, we can use max(trip == 'car'), as @Arun mentioned in the comments:

setDT(df1)[, driver := max(trip == 'car'), by = person]

      




Or, using similar logic with dplyr, we group by 'person' (group_by), create the new column with mutate, and remove the unneeded column with select:

library(dplyr)
df1 %>%
   group_by(person) %>% 
   mutate(driver= any(trip=='car')+0L) %>%
   select(-trip)

      




Or, with base R, we can use ave to create the "driver" column and then use subset to remove the "trip" column:

df1$driver <- with(df1, ave(trip=='car', person, FUN=any)+0L)
subset(df1, select=-trip)
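
For anyone who wants to try this end to end, here is a minimal sketch of the base R approach on sample data. The construction of df1 is an assumption reconstructed from the table in the question (the original post does not show how the data frame was built), and the name result is only illustrative.

```r
# Hypothetical sample data mirroring the table in the question
df1 <- data.frame(
  person = c(1, 1, 1, 1, 2, 2, 2),
  trip   = c("car", "bike", "train", "walk", "walk", "train", "boat"),
  stringsAsFactors = FALSE
)

# For each person, any() returns TRUE if at least one trip is by car;
# ave() repeats that value on every row of the group, and +0L turns
# the logical into an integer 1/0
df1$driver <- with(df1, ave(trip == "car", person, FUN = any) + 0L)

# Drop the 'trip' column to match the desired output
result <- subset(df1, select = -trip)
print(result)
```

Since this relies only on base R, it should run without installing any packages; the data.table and dplyr versions above apply the same any(trip == 'car') test per group.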

      
