Rename a column in each data frame of a list with the data frame's name

I have multiple data frames / tibbles with exactly the same structure but different content. Their name is the only way I can tell them apart. The goal is to join them all into one data frame with a factor column. The original data frames have one column for each hour / dimension, so I want to gather everything into long format first.

Imagine columns 5 through 11 from mtcars df are my hourly columns.

mt1 <- mtcars
mt2 <- mtcars
mt3 <- mtcars
mt4 <- mtcars

mtlist <- list(m1 = mt1,
               m2 = mt2,
               m3 = mt3,
               m4 = mt4)

library(tidyverse)

mtlist_tidy <- lapply(mtlist, function(x){
  df <- x %>%
    gather(exp, temp_name, 5:11)

  return(df)
})

      

Now I am stuck. I need to rename the "temp_name" column in each of the data frames inside mtlist_tidy to the name of that data frame, i.e. m1, m2, etc.:

> head(mtlist_tidy$m1)
   mpg cyl disp  hp  exp temp_name
1 21.0   6  160 110 drat      3.90
2 21.0   6  160 110 drat      3.90
3 22.8   4  108  93 drat      3.85
4 21.4   6  258 110 drat      3.08
5 18.7   8  360 175 drat      3.15
6 18.1   6  225 105 drat      2.76

      

should become

> head(mtlist_tidy$m1)
   mpg cyl disp  hp  exp      m1
1 21.0   6  160 110 drat      3.90
2 21.0   6  160 110 drat      3.90
3 22.8   4  108  93 drat      3.85
4 21.4   6  258 110 drat      3.08
5 18.7   8  360 175 drat      3.15
6 18.1   6  225 105 drat      2.76

      

Then purrr::reduce(mtlist_tidy, full_join) will do the job and complete my task.

I suppose there should be a solution that uses only purrr and skips lapply, but I am not familiar with this package yet.


3 answers


Some ideas:

First, to approach the problem the way you currently are, you can use map2 to loop through the list and the list names at the same time. You can then name the new columns after their list names with gather_ (for standard evaluation).

map2(mtlist, names(mtlist), ~gather_(.x, "exp", .y, names(.x)[5:11]) )

      

Note that the next version of purrr will have imap as a shortcut for looping through a list and its names. In addition, the next version of tidyr will use tidyeval, and gather_ will be deprecated.
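For readers on newer package versions, the same idea can be sketched with imap. This assumes purrr 0.2.3 or later (for imap) and tidyr 1.0.0 or later (for pivot_longer, swapped in here for gather_ because it takes the new column name as a plain string); it is an illustration, not the answer's original code.

```r
library(purrr)
library(tidyr)

# Same toy setup as in the question: four copies of mtcars in a named list
mtlist <- list(m1 = mtcars, m2 = mtcars, m3 = mtcars, m4 = mtcars)

# imap() passes each element as .x and its name as .y,
# so the value column can be named after the list element directly
mtlist_tidy <- imap(
  mtlist,
  ~ pivot_longer(.x, cols = 5:11, names_to = "exp", values_to = .y)
)

names(mtlist_tidy$m1)
```

Note that pivot_longer returns rows in a different order than gather, which should not matter if the data frames are joined afterwards.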



Second, you can keep things in long format by using map_df for the loop instead of lapply. map_df uses bind_rows under the hood at the end, and you can add a grouping variable identifying each list element with the .id argument.

mtlist %>%
    map_df(~.x %>% gather("exp", "temp_name", 5:11), .id = "name" )

      

To get your dataset into a wide format, you can use spread. This example requires a little more work because some identifying variables, such as hp and disp, have the same value across multiple rows, so a row index is added within each group to keep the rows distinct.

mtlist %>%
    map_df(~.x %>% gather("exp", "temp_name", 5:11), .id = "name" ) %>%
    group_by(name) %>%
    mutate( rows = 1:n() ) %>%
    spread(name, temp_name)
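A rough equivalent with the newer tidyr verbs might look like the sketch below. It assumes tidyr 1.0.0 or later for pivot_longer and pivot_wider, and it replaces map_df with map followed by bind_rows; the object name wide is only illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

mtlist <- list(m1 = mtcars, m2 = mtcars, m3 = mtcars, m4 = mtcars)

wide <- mtlist %>%
  map(~ pivot_longer(.x, cols = 5:11, names_to = "exp", values_to = "temp_name")) %>%
  bind_rows(.id = "name") %>%        # stack the pieces, recording the list name
  group_by(name) %>%
  mutate(rows = row_number()) %>%    # row index within each list element
  ungroup() %>%
  pivot_wider(names_from = name, values_from = temp_name)

names(wide)
```

The rows column does the same job as in the spread version: it makes every row uniquely identifiable, so the values from m1 to m4 line up side by side instead of colliding.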

      



You might want to add some NSE magic:



library(rlang)
mtlist_tidy %>% map2(., names(.), ~rename(.x, UQ(sym(.y)) := temp_name))
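In more recent versions of rlang and dplyr, the same rename is usually spelled with the !! operator rather than UQ(). A minimal sketch, assuming a dplyr version with tidy evaluation support and purrr 0.2.3 or later for imap; the name renamed is only illustrative:

```r
library(dplyr)
library(purrr)
library(tidyr)

mtlist <- list(m1 = mtcars, m2 = mtcars)
mtlist_tidy <- map(mtlist, ~ gather(.x, exp, temp_name, 5:11))

# !!.y unquotes the list name so that it becomes the new column name
renamed <- imap(mtlist_tidy, ~ rename(.x, !!.y := temp_name))

names(renamed$m2)
```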

      



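Does this do it?

# Map() walks the data frames and their names in parallel and keeps the list names
Map(function(df, nm) setNames(df, replace(names(df), names(df) == "temp_name", nm)),
    mtlist_tidy, names(mtlist_tidy))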

      
