How can I make my dataset continuous over time? [R]

Question

I have a dataset with columns x, y, date and time.

My initial dataset:

x    y   date    time
1    2    1-1-01  15:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
9    2    2-1-01  01:00
6    1    2-1-01  04:00
.....

I want it like:

x    y   date    time
1    2    1-1-01  15:00
n/a n/a   1-1-01  16:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
n/a n/a   1-1-01  19:00
n/a n/a   1-1-01  20:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
n/a n/a   2-1-01  00:00
9    2    2-1-01  01:00
n/a n/a   2-1-01  02:00
n/a n/a   2-1-01  03:00
6    1    2-1-01  04:00
.....

How can I fill in the n/a values?

I tried to use the xspline function to interpolate "x" and "y":

plot(df[,2:1])
xspline(df[,2:1], shape=-0.3, lwd=1)

Can I use this graph to find values for the n/a entries, or is there another way to find them?

+3

r

user4993868 Jul 25 15 at 7:28 am

2 answers

About getting the required table:

You can do it in base R:

Data

in.data <- read.table(text='x    y    date    time
1    2    1-1-01  15:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
9    2    2-1-01  1:00
6    1    2-1-01  4:00
', header=TRUE)

times <- paste0(0:23,':00')
dates <- paste0(1:2,'-1-01')

Create the desired table:

all.dt <- expand.grid(date=dates,time=times)

big.data <- merge(all.dt, in.data, all.x=TRUE)
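As an alternative sketch of the same idea, the grid can be built from parsed date-times instead of pasted strings, so that it runs only from the first to the last observation. This assumes the dates are day-month-year and the times are in UTC; the names stamp, hourly and full are made up for illustration.

```r
# Sketch only: assumes "1-1-01" means day-month-year and that UTC is acceptable.
in.data <- data.frame(
  x    = c(1, 2, 3, 5, 2, 6, 9, 6),
  y    = c(2, 5, 1, 7, 6, 3, 2, 1),
  date = c("1-1-01", "1-1-01", "1-1-01", "1-1-01",
           "1-1-01", "1-1-01", "2-1-01", "2-1-01"),
  time = c("15:00", "17:00", "18:00", "21:00",
           "22:00", "23:00", "1:00", "4:00"),
  stringsAsFactors = FALSE
)

# Combine date and time into a single POSIXct column (hypothetical name: stamp)
in.data$stamp <- as.POSIXct(paste(in.data$date, in.data$time),
                            format = "%d-%m-%y %H:%M", tz = "UTC")

# One row per hour between the first and last observation
hourly <- data.frame(stamp = seq(min(in.data$stamp), max(in.data$stamp), by = "hour"))

# Left join: hours without an observation get NA in x and y
full <- merge(hourly, in.data[, c("stamp", "x", "y")], by = "stamp", all.x = TRUE)
```

Because the sequence starts and ends at observed hours, this version has no leading or trailing NA rows to deal with.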

About filling NAs:

Use the tools provided by zoo. It has several functions for this problem: na.approx, na.spline and na.locf. For example:

library(zoo)
big.data <- within(big.data,{
         x <- na.approx(x,na.rm=FALSE)
         y <- na.approx(y,na.rm=FALSE)
})

big.data then contains:

     date  time        x        y
1  1-1-01  0:00       NA       NA
2  1-1-01  1:00       NA       NA
...
15 1-1-01 14:00       NA       NA
16 1-1-01 15:00 1.000000 2.000000
17 1-1-01 16:00 1.500000 3.500000
18 1-1-01 17:00 2.000000 5.000000
19 1-1-01 18:00 3.000000 1.000000
20 1-1-01 19:00 3.666667 3.000000
21 1-1-01 20:00 4.333333 5.000000
22 1-1-01 21:00 5.000000 7.000000
23 1-1-01 22:00 2.000000 6.000000
24 1-1-01 23:00 6.000000 3.000000
25 2-1-01  0:00 7.500000 2.500000
26 2-1-01  1:00 9.000000 2.000000
27 2-1-01  2:00 8.000000 1.666667
28 2-1-01  3:00 7.000000 1.333333
29 2-1-01  4:00 6.000000 1.000000
30 2-1-01  5:00       NA       NA
31 2-1-01  6:00       NA       NA
...
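To see how the three zoo functions differ, here is a small sketch on a made-up vector (v is hypothetical, not part of the question's data). Which one is appropriate depends on what the measurements represent.

```r
library(zoo)

# Hypothetical vector with one single gap and one double gap
v <- c(1, NA, 2, 3, NA, NA, 5)

linear  <- na.approx(v)  # linear interpolation between the neighbouring values
carried <- na.locf(v)    # last observation carried forward
smooth  <- na.spline(v)  # cubic spline through the known points
```

Here na.approx reproduces the kind of values shown in the table above (for example 3.666667 and 4.333333 between 3 and 5), while na.locf simply repeats the previous value.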

+2

bdecaf Jul 25 15 at 8:10

akrun · Accepted Answer · 2015-07-25T07:40:35+0000

We can create another dataset with the time sequence grouped by date and join it with the original dataset. This can be done using the devel version of data.table (the original answer linked to installation instructions for the devel version).

library(data.table)
DT <- setDT(df1)[, {tmp <- as.numeric(substr(time,1,2))
  list(time=sprintf('%02d:00', min(tmp):max(tmp)))}, date]
df1[DT, on=c('date', 'time')]
# x  y   date  time
# 1:  1  2 1-1-01 15:00
# 2: NA NA 1-1-01 16:00
# 3:  2  5 1-1-01 17:00
# 4:  3  1 1-1-01 18:00
# 5: NA NA 1-1-01 19:00
# 6: NA NA 1-1-01 20:00
# 7:  5  7 1-1-01 21:00
# 8:  2  6 1-1-01 22:00
# 9:  6  3 1-1-01 23:00
#10:  9  2 2-1-01 01:00
#11: NA NA 2-1-01 02:00
#12: NA NA 2-1-01 03:00
#13:  6  1 2-1-01 04:00

Or, if we want to create "time" from 00 to 23 hours, we can then remove the rows that are NA before the first non-NA value in 'x' and 'y', and similarly the rows that are NA after the last non-NA value:

 DT <- setDT(df1)[, list(time=sprintf('%02d:00', 0:23)) , date]
 res <- df1[DT, on=c('date', 'time')
             ][,{tmp <- which(!(is.na(x) & is.na(y)))
            .SD[tmp[1L]:tmp[length(tmp)]]}]
 res 
 # x  y   date  time
 #1:  1  2 1-1-01 15:00
 #2: NA NA 1-1-01 16:00
 #3:  2  5 1-1-01 17:00
 #4:  3  1 1-1-01 18:00
 #5: NA NA 1-1-01 19:00
 #6: NA NA 1-1-01 20:00
 #7:  5  7 1-1-01 21:00
 #8:  2  6 1-1-01 22:00
 #9:  6  3 1-1-01 23:00
 #10:NA NA 2-1-01 00:00
 #11: 9  2 2-1-01 01:00
 #12:NA NA 2-1-01 02:00
 #13:NA NA 2-1-01 03:00
 #14: 6  1 2-1-01 04:00
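The trimming step can also be sketched in base R, without data.table. The data frame grid below is a made-up single-day example, not the question's data; the logic is the same: find the rows that have at least one observed value and keep everything between the first and last of them.

```r
# Hypothetical full 0-23 hour grid for one day, observed only at hours 15, 17 and 18
grid <- data.frame(hour = 0:23, x = NA_real_, y = NA_real_)
grid$x[c(16, 18, 19)] <- c(1, 2, 3)  # row index = hour + 1
grid$y[c(16, 18, 19)] <- c(2, 5, 1)

# Rows where x and y are not both NA
observed <- which(!(is.na(grid$x) & is.na(grid$y)))

# Keep the span from the first to the last observed row; inner gaps stay as NA
trimmed <- grid[min(observed):max(observed), ]
```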

I hadn't read the last part of the question. If you need to fill in the NA values as mentioned in @bdecaf's post (and as in a comment of mine that I removed earlier), you can use na.approx from library(zoo):

library(zoo)
res[, c('x', 'y') :=lapply(.SD, na.approx), .SDcols= x:y]
#           x        y   date  time
# 1: 1.000000 2.000000 1-1-01 15:00
# 2: 1.500000 3.500000 1-1-01 16:00
# 3: 2.000000 5.000000 1-1-01 17:00
# 4: 3.000000 1.000000 1-1-01 18:00
# 5: 3.666667 3.000000 1-1-01 19:00
# 6: 4.333333 5.000000 1-1-01 20:00
# 7: 5.000000 7.000000 1-1-01 21:00
# 8: 2.000000 6.000000 1-1-01 22:00
# 9: 6.000000 3.000000 1-1-01 23:00
#10: 7.500000 2.500000 2-1-01 00:00
#11: 9.000000 2.000000 2-1-01 01:00
#12: 8.000000 1.666667 2-1-01 02:00
#13: 7.000000 1.333333 2-1-01 03:00
#14: 6.000000 1.000000 2-1-01 04:00
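One point worth noting: by default na.approx drops leading and trailing NAs (na.rm = TRUE), so the call above fits back into the table only because res was already trimmed to start and end on observed rows. The other answer passes na.rm = FALSE because its table still has NA rows at both ends. A minimal sketch on a made-up vector:

```r
library(zoo)

# Hypothetical vector with NAs at both ends
v <- c(NA, 1, NA, 3, NA)

dropped <- na.approx(v)                 # default na.rm = TRUE: outer NAs are removed
kept    <- na.approx(v, na.rm = FALSE)  # outer NAs are kept, so the length is unchanged

length(dropped)  # 3
length(kept)     # 5
```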

data

df1 <- structure(list(x = c(1L, 2L, 3L, 5L, 2L, 6L, 9L, 6L), y = c(2L, 
5L, 1L, 7L, 6L, 3L, 2L, 1L), date = c("1-1-01", "1-1-01", "1-1-01", 
"1-1-01", "1-1-01", "1-1-01", "2-1-01", "2-1-01"), time = c("15:00", 
"17:00", "18:00", "21:00", "22:00", "23:00", "01:00", "04:00"
)), .Names = c("x", "y", "date", "time"), class = "data.frame",
row.names = c(NA, -8L))
