R: get one data frame from the zoo object list

I have a zoo-bject list consisting of irregular time series, lodf, in the following format:

> head(lodf)

[[1]]
2014-08-08 2014-08-14 2014-09-12  
  1.15       1.32       2.39 

[[2]]  
2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12 
  0.50       0.75       1.29       1.36       1.28       1.28   

[[3]]
2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01 
  1.00       1.27       0.91       1.00       0.99       0.98 

...

      

I end up trying to sum all these time series into one combined time series i.e. sum each column. For this I am trying to transform the time series zoo / xts for further manipulation, i.e. apply na.locf and other features of the zoo library, before summing the individual data frames / dates using rowsum. those. I am trying to get my list of date frames above into a combined zoo object similar to this:

           Value
12/09/2014  1.07
14/08/2014  1.32
08/08/2014  1.15
12/09/2014  0.48
27/08/2014  0.53
20/08/2014  0.61
14/08/2014  0.54
24/07/2014  0.75
22/07/2014  0.5
01/07/2014  0.98
01/07/2014  0
...

      

There is often interleaving between individual data frames, i.e. multiple values โ€‹โ€‹corresponding to the same date index, and what I would like to do in these cases is to sum the values. For example. if I have

012-11-01
  0.7

012-11-01
 1.5

012-11-01
 0.7

      

I would like to have

012-11-01
2.9

      

as the value for this date index in the resulting larger data frame.

I tried to merge, like zoo object, do.call (rbind) etc. in the current format, but I'm stumped. For further context, this question is part of a larger project described here: R: Time series with duplicate time index entries . Any help would be greatly appreciated!

Update: please find below data object:

> dput(head(lodf))
list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290L, 
16296L, 16325L), class = "Date"), class = "zoo"), structure(c(0.5, 
0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273L, 16275L, 
16296L, 16302L, 16309L, 16325L), class = "Date"), class = "zoo"), 
structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645L, 
15646L, 15898L, 15930L, 15959L, 16252L), class = "Date"), class = "zoo"), 
structure(c(1.27, 1.29, 1.28, 1.17, 0.59, 0), index = structure(c(15645L, 
15651L, 15665L, 15679L, 15686L, 15747L), class = "Date"), class = "zoo"), 
structure(c(1.9, 1.35, 0.66, 1.16, 0.66, 1.16, 1.26, 1.23, 
1.28, 1.23, 1.17, 0.66, 1.18, 0.66, 1.29, 1.35, 1.45, 1.53, 
1.61, 1.82, 1.8, 1.89, 1.8, 1.81, 1.78, 1.68, 2.18, 1.68, 
1.56, 1.93, 1.84, 1.69, 1.18, 1.73, 1.18, 1.72, 1.83, 1.9, 
1.99, 1.93, 1.87, 1.96, 2.1, 2.22, 2.33, 2.38, 2.35, 2.23, 
2.16, 2.18, 2.17, 2.2, 2.29, 2.27, 2.28, 2.42, 2.48, 2.99, 
2.56, 2.65, 2.69, 3.21, 2.7, 2.8, 2.79, 2.8, 2.78, 2.26, 
2.78, 2.26, 2.12, 2.07, 1.97, 1.84, 1.77, 1.18, 1.7, 1.78, 
1.91, 1.98, 1.93, 1.83, 1.76, 1.18, 1.01, 0.97, 0.86, 0.69, 
0.56), index = structure(c(15645L, 15652L, 15660L, 15740L, 
15797L, 15841L, 15860L, 15867L, 15876L, 15887L, 15890L, 15897L, 
15901L, 15905L, 15908L, 15909L, 15910L, 15911L, 15915L, 15926L, 
15931L, 15932L, 15938L, 15953L, 15954L, 15975L, 15978L, 15979L, 
15981L, 15982L, 15985L, 15986L, 15987L, 16001L, 16003L, 16006L, 
16008L, 16010L, 16014L, 16016L, 16021L, 16022L, 16023L, 16027L, 
16029L, 16031L, 16045L, 16052L, 16059L, 16072L, 16077L, 16078L, 
16084L, 16091L, 16098L, 16100L, 16101L, 16106L, 16132L, 16133L, 
16134L, 16139L, 16146L, 16150L, 16153L, 16157L, 16160L, 16163L, 
16167L, 16169L, 16170L, 16171L, 16175L, 16177L, 16182L, 16184L, 
16212L, 16216L, 16220L, 16224L, 16248L, 16254L, 16258L, 16261L, 
16297L, 16301L, 16309L, 16310L, 16317L), class = "Date"), class = "zoo"), 
structure(c(3.35, 3.44, 3.41, 3.14, 3.11, 2.55, 2.65, 2.87, 
3.14, 3.24, 3.41, 4.04, 4.19, 4.34, 4.44, 1.2, 1.3, 1.29, 
1.3, 1.27, 0.77, 0.69, 0.55, 0), index = structure(c(15645L, 
15650L, 15694L, 15740L, 15741L, 15742L, 15743L, 15749L, 15750L, 
15751L, 15755L, 15756L, 15758L, 15762L, 15784L, 15800L, 15805L, 
15810L, 15824L, 15835L, 15838L, 15840L, 15847L, 15849L), class = "Date"), class = "zoo"))
> 

      

+3


source to share


3 answers


The input displayed at the top of the question is represented by the first three input components listed at the bottom of the question. The variable name used at the bottom of the question lodf

seems to suggest it contains a list of data frames, but it actually contains a list of zoo objects.

The question asks one result with a dataframe, but we assume the output should also be one zoo series for consistency. We will also use the name L

for input as it will lodf

incorrectly suggest a list of data frames. If z

is the result in the form of a zoo row, then

data.frame(index = index(z), data = coredata(z))

      

can be used if a data frame is really needed.

In the output section at the end of this answer, we will show the result of using as our input L <- lodf[1:3]

(i.e. only the first 3 components) and separately show the output using L <- lodf

(i.e. all components) as our input.

1) Reduce . We concatenate a series of zoos in a list, L

return the list and fill in the missing values โ€‹โ€‹with 0. Then use Reduce

to sum the components:

Reduce(`+`, do.call(merge, c(L, retclass = "list", fill = 0)))

      

1a) . A variation on this is to return the zoo object from merge

(which is the default if we don't specify retclass

), then fill it with NA with 0, turn it back into a list, and use Reduce

:

Reduce(`+`, as.list(na.fill(do.call(merge, L), 0)))

      

2) rowSums . In this solution, we concatenate lists to provide a zoo object z

, optionally add the column names, and then add rows that produce the zoo target.



z <- do.call(merge, L)
colnames(L) <- seq_along(L) # optionally add names
zoo(rowSums(z, na.rm = TRUE), time(z))

      

Please note that previously a rowSums

solution to the zoo objects was found here

3) + If we knew that there were exactly 3 components in the list, then an alternative way to write above would be this. We optionally add names 1, 2, 3, concatenate the zoo objects and fill NA with 0. Finally, we add the series together. Modify in an obvious way if the number of components is different.

z0 <- na.fill(do.call(merge, L), 0)
colnames(z0) <- 1:3 # optionally add names 1, 2, 3
z0[, 1] + z0[, 2] + z0[, 3]

      

Output Using L <- lodf[1:3]

as shown at the beginning of the question, where lodf

displayed at the bottom of the question, our output is:

2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01 2014-07-22 
      1.00       1.27       0.91       1.00       0.99       0.98       0.50 
2014-07-24 2014-08-08 2014-08-14 2014-08-20 2014-08-27 2014-09-12 
      0.75       1.15       2.61       1.36       1.28       3.67 

      

or using L <- locf

the above, we get the following (except for solution 3, which would have to be modified in an obvious way to use 6 rather than 3 components):

2012-11-01 2012-11-02 2012-11-06 2012-11-07 2012-11-08 2012-11-16 2012-11-21 
      7.52       1.27       3.44       1.29       1.35       0.66       1.28 
2012-12-05 2012-12-12 2012-12-20 2013-02-04 2013-02-05 2013-02-06 2013-02-07 
      1.17       0.59       3.41       4.30       3.11       2.55       2.65 
2013-02-11 2013-02-13 2013-02-14 2013-02-15 2013-02-19 2013-02-20 2013-02-22 
      0.00       2.87       3.14       3.24       3.41       4.04       4.19 
2013-02-26 2013-03-20 2013-04-02 2013-04-05 2013-04-10 2013-04-15 2013-04-29 
      4.34       4.44       0.66       1.20       1.30       1.29       1.30 
2013-05-10 2013-05-13 2013-05-15 2013-05-16 2013-05-22 2013-05-24 2013-06-04 
      1.27       0.77       0.69       1.16       0.55       0.00       1.26 
2013-06-11 2013-06-20 2013-07-01 2013-07-04 2013-07-11 2013-07-12 2013-07-15 
      1.23       1.28       1.23       1.17       0.66       0.91       1.18 
2013-07-19 2013-07-22 2013-07-23 2013-07-24 2013-07-25 2013-07-29 2013-08-09 
      0.66       1.29       1.35       1.45       1.53       1.61       1.82 
2013-08-13 2013-08-14 2013-08-15 2013-08-21 2013-09-05 2013-09-06 2013-09-11 
      1.00       1.80       1.89       1.80       1.81       1.78       0.99 
2013-09-27 2013-09-30 2013-10-01 2013-10-03 2013-10-04 2013-10-07 2013-10-08 
      1.68       2.18       1.68       1.56       1.93       1.84       1.69 
2013-10-09 2013-10-23 2013-10-25 2013-10-28 2013-10-30 2013-11-01 2013-11-05 
      1.18       1.73       1.18       1.72       1.83       1.90       1.99 
2013-11-07 2013-11-12 2013-11-13 2013-11-14 2013-11-18 2013-11-20 2013-11-22 
      1.93       1.87       1.96       2.10       2.22       2.33       2.38 
2013-12-06 2013-12-13 2013-12-20 2014-01-02 2014-01-07 2014-01-08 2014-01-14 
      2.35       2.23       2.16       2.18       2.17       2.20       2.29 
2014-01-21 2014-01-28 2014-01-30 2014-01-31 2014-02-05 2014-03-03 2014-03-04 
      2.27       2.28       2.42       2.48       2.99       2.56       2.65 
2014-03-05 2014-03-10 2014-03-17 2014-03-21 2014-03-24 2014-03-28 2014-03-31 
      2.69       3.21       2.70       2.80       2.79       2.80       2.78 
2014-04-03 2014-04-07 2014-04-09 2014-04-10 2014-04-11 2014-04-15 2014-04-17 
      2.26       2.78       2.26       2.12       2.07       1.97       1.84 
2014-04-22 2014-04-24 2014-05-22 2014-05-26 2014-05-30 2014-06-03 2014-06-27 
      1.77       1.18       1.70       1.78       1.91       1.98       1.93 
2014-07-01 2014-07-03 2014-07-07 2014-07-10 2014-07-22 2014-07-24 2014-08-08 
      0.98       1.83       1.76       1.18       0.50       0.75       1.15 
2014-08-14 2014-08-15 2014-08-19 2014-08-20 2014-08-27 2014-08-28 2014-09-04 
      2.61       1.01       0.97       1.36       2.14       0.69       0.56 
2014-09-12 

      

Updates . Additional solutions have been added and presentations reorganized and expanded.

+5


source


Try it (if the elements of the list are a list of objects zoo

and if you need to get the sum

corresponding index).

 library(xts)
 library(zoo)

  z1 <- setNames(do.call(`merge`, lodf), paste0("Value", seq_along(lodf)))
  xts(data.frame(value=rowSums(z1, na.rm=TRUE)), order.by=index(z1))
  #            value
  #2012-11-01  1.00
  #2012-11-02  1.27
  #2013-07-12  0.91
  #2013-08-13  1.00
  #2013-09-11  0.99
  #2014-07-01  0.98
  #2014-07-22  0.50
  #2014-07-24  0.75
  #2014-08-08  1.15
  #2014-08-14  2.61
  #2014-08-20  1.36
  #2014-08-27  1.28
  #2014-09-12  3.67

      

If you need to use na.locf

beforesumming



   z2 <- na.locf(z1)
   xts(data.frame(value=rowSums(z2, na.rm=TRUE)), order.by=index(z2))

      

data

  lodf <- list(structure(c(1.15, 1.32, 2.39), index = structure(c(16290, 
  16296, 16325), class = "Date"), class = "zoo"), structure(c(0.5, 
  0.75, 1.29, 1.36, 1.28, 1.28), index = structure(c(16273, 16275, 
  16296, 16302, 16309, 16325), class = "Date"), class = "zoo"), 
  structure(c(1, 1.27, 0.91, 1, 0.99, 0.98), index = structure(c(15645, 
  15646, 15898, 15930, 15959, 16252), class = "Date"), class = "zoo"))

      

+1


source


With base R:

lodf = list(structure(list(`014-08-08` = 1.15, `2014-08-14` = 1.32, 
    `2014-09-12` = 2.39), .Names = c("014-08-08", "2014-08-14", 
"2014-09-12"), class = "data.frame", row.names = c(NA, -1L)), 
    structure(list(`2014-07-22` = 0.5, `2014-07-24` = 0.75, `2014-08-14` = 1.29, 
        `2014-08-20` = 1.36, `2014-08-27` = 1.28, `2014-09-12` = 1.28), .Names = c("2014-07-22", 
    "2014-07-24", "2014-08-14", "2014-08-20", "2014-08-27", "2014-09-12"
    ), class = "data.frame", row.names = c(NA, -1L)), structure(list(
        `2012-11-01` = 1, `2012-11-02` = 1.27, `2013-07-12` = 0.91, 
        `2013-08-13` = 1, `2013-09-11` = 0.99, `2014-07-01` = 0.98), .Names = c("2012-11-01", 
    "2012-11-02", "2013-07-12", "2013-08-13", "2013-09-11", "2014-07-01"
    ), class = "data.frame", row.names = c(NA, -1L)))

lodf
[[1]]
  014-08-08 2014-08-14 2014-09-12
1      1.15       1.32       2.39

[[2]]
  2014-07-22 2014-07-24 2014-08-14 2014-08-20 2014-08-27 2014-09-12
1        0.5       0.75       1.29       1.36       1.28       1.28

[[3]]
  2012-11-01 2012-11-02 2013-07-12 2013-08-13 2013-09-11 2014-07-01
1          1       1.27       0.91          1       0.99       0.98


ddf = data.frame(full=character(), stringsAsFactors=F)
ll = unlist(lapply(lodf, function(x) paste(names(x), x, sep='_')))
ddf[1:length(ll),1]=ll
ddf
              full
1   014-08-08_1.15
2  2014-08-14_1.32
3  2014-09-12_2.39
4   2014-07-22_0.5
5  2014-07-24_0.75
6  2014-08-14_1.29
7  2014-08-20_1.36
8  2014-08-27_1.28
9  2014-09-12_1.28
10    2012-11-01_1
11 2012-11-02_1.27
12 2013-07-12_0.91
13    2013-08-13_1
14 2013-09-11_0.99
15 2014-07-01_0.98


ddf$date = unlist(lapply(strsplit(ddf$full, '_'),function(x)x[1]))
ddf$value = as.numeric(unlist(lapply(strsplit(ddf$full, '_'),function(x)x[2])))
ddf = ddf[,-1]
ddf
         date value
1   014-08-08  1.15
2  2014-08-14  1.32
3  2014-09-12  2.39
4  2014-07-22  0.50
5  2014-07-24  0.75
6  2014-08-14  1.29
7  2014-08-20  1.36
8  2014-08-27  1.28
9  2014-09-12  1.28
10 2012-11-01  1.00
11 2012-11-02  1.27
12 2013-07-12  0.91
13 2013-08-13  1.00
14 2013-09-11  0.99
15 2014-07-01  0.98

      

Finally:

aggregate(value~date, ddf, sum)
         date value
1  2012.11.01  1.00
2  2012.11.02  1.27
3  2013.07.12  0.91
4  2013.08.13  1.00
5  2013.09.11  0.99
6  2014.07.01  0.98
7  2014.07.22  0.50
8  2014.07.24  0.75
9  2014.08.08  1.15
10 2014.08.14  2.61
11 2014.08.20  1.36
12 2014.08.27  1.28
13 2014.09.12  3.67

      

+1


source







All Articles