How can I change a specific field in a list of data frames?

Question

Suppose I am writing the following R code:

first.value <- sample(100, 100, replace=TRUE)
second.value <- sample(10, 100, replace=TRUE)

X <- data.frame(first.value, second.value)
split.X <- split(X, second.value)

This code creates a data frame with two fields and splits it into bins according to the second. Now suppose I want to normalize every bin; that is, subtract the bin's mean and divide by its standard deviation. I could compute the normalized values like this:

normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})

But this creates a new list with normalized versions of each bin. What I really want is to replace the copy of the data in split.X with my normalized version.

To illustrate, here is an example:

> first.value <- sample(100, 100, replace=TRUE)
> second.value <- sample(10, 100, replace=TRUE)
> X <- data.frame(first.value, second.value)
> split.X <- split(X, second.value)
> normalized.first.value <- sapply(split.X, function(X) {(X$first.value - mean(X$first.value)) / sd(X$first.value)})
> split.X[[1]]
   first.value second.value
4           34            1
8           40            1
24          21            1
31          34            1
37          23            1
40          22            1
> normalized.first.value[[1]]
[1]  0.625  1.375 -1.000  0.625 -0.750 -0.875

What I really want to do is put the values of normalized.first.value[[1]] into split.X[[1]]$first.value, and the same for the other indices.

This can be achieved with a for loop like this:

for (i in 1:length(split.X)) {
  split.X[[i]]$first.value <- (split.X[[i]]$first.value - mean(split.X[[i]]$first.value)) / sd(split.X[[i]]$first.value)
}

But for loops are BAD in R, and I would like to use sapply, lapply, etc. if possible. Unfortunately, when working with a list of data frames, sapply and lapply don't seem to iterate the way I want.

+3

r

John gowers Jul 24 15 at 12:12

2 answers

Here's a more arcane way (although I still think the for loop is fine in this case):

new.split.X <- mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
                      SIMPLIFY=F)

How it works: mapply applies [<- to every split.X[[i]]. T is the row index i to replace (i.e. all rows), 'first.value' is the column index j to replace (this column), and normalized.first.value contains the replacements.
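To make the replacement-function call concrete, here is a minimal sketch on a single data frame. The names df, replacement, updated and expected are made up for illustration and are not part of the original answer; the sketch only shows that calling [<- as an ordinary function produces the same result as the usual bracket assignment.

```r
# Toy data frame; df and replacement are illustrative names only.
df <- data.frame(first.value = c(10, 20, 30), second.value = c(1, 1, 1))
replacement <- c(-1, 0, 1)

# Functional form of df[TRUE, 'first.value'] <- replacement.
# The modified data frame is returned as the value of the call.
updated <- `[<-`(df, TRUE, 'first.value', replacement)

# The same replacement written the usual way, on a copy.
expected <- df
expected[TRUE, 'first.value'] <- replacement

identical(updated, expected)
```

mapply simply repeats this call once per list element, pairing each data frame with its replacement vector.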

In the end the loop may be easier to read, and it may not even be slower than the more complex *apply solutions.

library(rbenchmark)
benchmark(loop={
    for (i in 1:length(split.X))
        split.X[[i]]$first.value <- normalized.first.value[[i]]
  },
  mapply={
    mapply(`[<-`, split.X, T, 'first.value', normalized.first.value,
                          SIMPLIFY=F)
  },
  Map={
    Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
  },
  lapply={
    lapply(seq_along(split.X), function(i) {
             x1 <- split.X[[i]]
             x1[,'first.value'] <- normalized.first.value[[i]]
             x1})
  })
    test replications elapsed relative user.self sys.self user.child sys.child
4 lapply          100   0.034    4.857     0.035        0          0         0
1   loop          100   0.007    1.000     0.007        0          0         0
3    Map          100   0.012    1.714     0.013        0          0         0
2 mapply          100   0.030    4.286     0.032        0          0         0

So the explicit loop is the fastest, and the easiest to read anyway.

+2

mathematical.coffee Jul 24 15 at 12:58

akrun · Accepted Answer · 2015-07-24T12:14:42+0000

You can use Map, as both lists are the same length. It works by replacing the "first.value" column of each data frame in "split.X" with the corresponding list element of "normalized.first.value":

  Map(function(x,y) {x[['first.value']] <- y;x} ,split.X, normalized.first.value)
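As a self-contained check of the Map call, the sketch below uses small deterministic data in place of the random sample from the question. One assumption differs from the question: the normalized values are built with lapply rather than sapply, because sapply simplifies its result to a matrix when every bin happens to have the same number of rows, and Map would then no longer pair one vector with one data frame.

```r
# Small deterministic stand-ins for the objects in the question.
X <- data.frame(first.value  = c(1, 2, 3, 10, 20, 30),
                second.value = c(1, 1, 1, 2, 2, 2))
split.X <- split(X, X$second.value)

# lapply always returns a list, one numeric vector per bin.
normalized.first.value <- lapply(split.X, function(d) {
  (d$first.value - mean(d$first.value)) / sd(d$first.value)
})

# Walk both lists in parallel and return each modified data frame.
new.split.X <- Map(function(x, y) { x[['first.value']] <- y; x },
                   split.X, normalized.first.value)

names(new.split.X)              # "1" "2": names of split.X are kept
new.split.X[["2"]]$first.value  # -1 0 1
```

Note that Map returns a new list; split.X itself is only updated if the result is assigned back to it.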

Or we can loop over the indices of "split.X", extract the list elements of "split.X" and "normalized.first.value" by index, and then replace.

  lapply(seq_along(split.X), function(i) {
             x1 <- split.X[[i]]
             x1[,'first.value'] <- normalized.first.value[[i]]
             x1})
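One detail worth knowing about the index-based version: lapply over seq_along(split.X) returns an unnamed list, so the bin labels are lost unless they are restored. The sketch below is a variant, not the answer's own code: it normalizes inside the function in a single pass and compares iterating over the list itself with iterating over indices. The names by.element and by.index are illustrative.

```r
X <- data.frame(first.value  = c(2, 4, 6, 5, 10, 15),
                second.value = c(1, 1, 1, 2, 2, 2))
split.X <- split(X, X$second.value)

# Iterating over the list itself keeps its names ("1", "2").
by.element <- lapply(split.X, function(d) {
  d$first.value <- (d$first.value - mean(d$first.value)) / sd(d$first.value)
  d
})

# Iterating over indices drops the names, so put them back with setNames.
by.index <- setNames(lapply(seq_along(split.X), function(i) {
  d <- split.X[[i]]
  d$first.value <- (d$first.value - mean(d$first.value)) / sd(d$first.value)
  d
}), names(split.X))

identical(by.element, by.index)
```

Assigning either result back to split.X gives the in-place update the question asks for.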
