R - fill in missing values (spaces) based on values of the same row but different columns
I am using R and have the following sample dataframe in which all variables are factors:
first second third
social birth control high
birth control high
medical Anorexia Nervosa low
medical Anorexia Nervosa low
Alcoholism high
family Alcoholism high
Basically, I need a function to help me fill in the blanks in the first column based on the values in the second and third columns. For example, if I have “birth control” in the second column and “high” in the third column, I need to fill in the gap in the first column with “social”. If it is "alcoholism" and "high" in the second and third columns respectively, then I need to fill in the blanks in the first column with "family".
source to share
From the data shown, it is not entirely clear if you have other values in 'first' for each combination of 'second' and 'third'. If there is only one value and you need to replace ''
with this, you can try
library(data.table)
setDT(df1)[, replace(first, first=='', first[first!='']),
list(second, third)]
Or a more efficient method would be
setDT(df1)[, first:= first[first!=''] , list(second, third)]
# first second third
#1: social birth control high
#2: social birth control high
#3: medical Anorexia Nervosa low
#4: medical Anorexia Nervosa low
#5: family Alcoholism high
#6: family Alcoholism high
data
df1 <- structure(list(first = c("social", "", "medical", "medical",
"", "family"), second = c("birth control", "birth control",
"Anorexia Nervosa",
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high",
"high", "low", "low", "high", "high")), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -6L))
source to share
One way is to create a lookup list of some type (for example, either with a named vector, factor
or something similar) and then replace any values ""
with values from the lookup list.
Here's an example (although I think your problem is not fully defined and perhaps oversimplified).
library(dplyr)
library(tidyr)
mydf %>%
unite(condition, second, third, remove = FALSE) %>%
mutate(condition = factor(condition,
c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
c("social", "medical", "family"))) %>%
mutate(condition = as.character(condition)) %>%
mutate(first = replace(first, first == "", condition[first == ""])) %>%
select(-condition)
# first second third
# 1 social birth control high
# 2 social birth control high
# 3 medical Anorexia Nervosa low
# 4 medical Anorexia Nervosa low
# 5 family Alcoholism high
# 6 family Alcoholism high
The data.table approach will follow the same steps, but will take advantage of being modified by reference rather than copying.
library(data.table)
as.data.table(mydf)[
, condition := sprintf("%s_%s", second, third)][
, condition := as.character(
factor(condition,
c("birth control_high", "Anorexia Nervosa_low", "Alcoholism_high"),
c("social", "medical", "family")))][
first == "", first := condition][
, condition := NULL][]
source to share
Another approach dplyr
using @akrun very nice solution
library(dplyr)
df1 %>% group_by(second, third) %>%
mutate(first=replace(first, first=='', first[first!=''])) %>% ungroup
Data
df1 <- structure(list(first = c("social", "", "medical", "medical",
"", "family"), second = c("birth control", "birth control",
"Anorexia Nervosa",
"Anorexia Nervosa", "Alcoholism", "Alcoholism"), third = c("high",
"high", "low", "low", "high", "high")), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -6L))
source to share