Categorize multiple rows into one variable

Question


Simple question, but apparently not yet answered on SO.

I have a long data frame with three columns:

person | trip |  driver
=======================
1       car 
1       bike
1       train
1       walk
2       walk
2       train
2       boat

I would like to fill in the "driver" column so that, for each person, it is 1 on every row if at least one of that person's trips is by car, and 0 otherwise:

person | driver
================
1       1 
1       1
1       1
1       1
2       0
2       0
2       0
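For a reproducible setup, the sample input above can be built in R as below. This is a sketch: the name df1 is borrowed from the answer, and storing person as numeric and trip as character are assumptions about the real data.

```r
# Hypothetical reconstruction of the sample data (the name df1 follows the answer).
# person is assumed numeric and trip character; the "driver" column is still to be filled.
df1 <- data.frame(
  person = c(1, 1, 1, 1, 2, 2, 2),
  trip   = c("car", "bike", "train", "walk", "walk", "train", "boat"),
  stringsAsFactors = FALSE
)
str(df1)
```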

I have a slight preference for doing this without requiring fancy packages, but I'm happy with most of the popular ones (e.g. plyr, data.table, sqldf ...) or even new ones that prove useful in the long run.

Thanks in advance, p.


user3310782, 2 June 2015 at 12:46


1 answer

akrun · Accepted Answer · 2015-06-02T12:48:12+0000

We could use data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), check whether any 'trip' is 'car' within each 'person' group, convert the logical result to numeric (+0L, or wrap it with as.numeric) and assign (:=) it to the "driver" column. If necessary, we can remove the "trip" column by assigning it NULL, or keep only the wanted columns with a subset such as [, c(1,3), with=FALSE].

library(data.table)
setDT(df1)[, driver := any(trip == 'car')+0L, by = person][, trip := NULL]
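The +0L step in the code above turns the logical returned by any() into a number. A minimal base R sketch of that conversion for a single group (person 1's trips, typed in by hand here) shows that +0L yields an integer while as.numeric yields a double; either choice returns the same type for every group, so the grouped assignment stays consistent.

```r
# Result of the grouped test for person 1 (trips: car, bike, train, walk)
is_driver <- any(c("car", "bike", "train", "walk") == "car")

is_driver + 0L         # TRUE + 0L gives 1, stored as an integer
as.numeric(is_driver)  # the same value, stored as a double

class(is_driver + 0L)         # "integer"
class(as.numeric(is_driver))  # "numeric"
```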

Or, instead of any, we can use max(trip=='car'), as @Arun mentioned in the comments:

setDT(df1)[, driver := max(trip == 'car'), by = person]
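The max() variant works because R coerces logicals to integers inside max(), so the result is already 0 or 1 and needs no +0L. A small base R sketch with hand-typed trip vectors illustrates this. One caveat: max() of a zero-length vector returns -Inf with a warning, which cannot happen here because every person group has at least one row.

```r
trips_p2 <- c("walk", "train", "boat")  # person 2: no car trip

max(trips_p2 == "car")              # all FALSE, coerced to 0
max(c("car", "bike") == "car")      # contains TRUE, coerced to 1
is.integer(max(trips_p2 == "car"))  # TRUE: max() on logicals returns an integer
```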

Or, using similar logic with dplyr, we group_by 'person', create the new column with mutate, and remove the unneeded column with select:

library(dplyr)
df1 %>%
   group_by(person) %>% 
   mutate(driver= any(trip=='car')+0L) %>%
   select(-trip)
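For comparison, the same flag can be computed in base R without any grouping function: first collect the persons who made at least one car trip, then mark every row whose person belongs to that set. This is a sketch that is not part of the original answer; df1 is rebuilt from the question's sample data, and it assumes trip and person contain no missing values.

```r
df1 <- data.frame(
  person = c(1, 1, 1, 1, 2, 2, 2),
  trip   = c("car", "bike", "train", "walk", "walk", "train", "boat"),
  stringsAsFactors = FALSE
)

# Persons with at least one car trip
drivers <- unique(df1$person[df1$trip == "car"])

# 1 for every row of those persons, 0 otherwise (as integers)
df1$driver <- as.integer(df1$person %in% drivers)
df1[, c("person", "driver")]
```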

Or, with base R, we can use ave to create the "driver" column and then subset to remove the "trip" column:

df1$driver <- with(df1, ave(trip=='car', person, FUN=any)+0L)
subset(df1, select=-trip)
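Put together with the sample data, the base R route runs end to end as below. ave() splits trip == "car" by person, applies any() to each piece and writes the single result back onto every row of that group; +0L then turns the logical into 0/1 integers. The data frame is a reconstruction of the question's example (numeric person, character trip assumed).

```r
df1 <- data.frame(
  person = c(1, 1, 1, 1, 2, 2, 2),
  trip   = c("car", "bike", "train", "walk", "walk", "train", "boat"),
  stringsAsFactors = FALSE
)

# any() is evaluated once per person and repeated on each of that person's rows
df1$driver <- with(df1, ave(trip == "car", person, FUN = any) + 0L)

# Drop the trip column to get the person | driver layout from the question
result <- subset(df1, select = -trip)
print(result)
```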
