The loop takes forever to run
I have two tables covering 2012 to 2014. The first has one row every 3 hours and looks like this:
                        B     C
 1  01.06.2012 00:00   10     0
 2  01.06.2012 03:00   10     0
 3  01.06.2012 06:00   10     6
 4  01.06.2012 09:00    7,5   0
 5  01.06.2012 12:00    6     2,5
 6  01.06.2012 15:00    6     0
 7  01.06.2012 18:00    4     0
 8  01.06.2012 21:00    4     0
 9  02.06.2012 00:00    0     0
10  02.06.2012 03:00    0     0
The other table covers the same period, but is sampled every minute:
1 01.06.2012 00:00
2 01.06.2012 00:01
3 01.06.2012 00:01
4 01.06.2012 00:03
5 01.06.2012 00:03
6 01.06.2012 00:05
7 01.06.2012 00:05
8 01.06.2012 00:07
9 01.06.2012 00:08
10 01.06.2012 00:09
11 01.06.2012 00:10
Now I need to fill the 2nd and 3rd columns of the second table from the first: if a timestamp in the second table falls between timestamp(i) and timestamp(i+1) of the first table, it should take B(i) and C(i) and copy them. I have this code and I know it works, but it takes over 12 hours to run, and I have many files like this that need the same processing.
clouds <- read.csv('~/2012-2014 clouds info.csv', sep=";", header = FALSE)
cloudFull <- read.csv('~/2012-2014 clouds.csv', sep=";", header = FALSE)
for (i in 1:nrow(cloudFull)){
  dateOne <- strptime(cloudFull[i,1], '%d.%m.%Y %H:%M')
  for (j in 1:nrow(clouds)){
    bottomDate = strptime(clouds[j,1], '%d.%m.%Y %H:%M')
    upperDate = strptime(clouds[j+1,1], '%d.%m.%Y %H:%M')
    if ((dateOne >= bottomDate) && (dateOne < upperDate)) {
      cloudFull[i,2] <- clouds[j,2]
      cloudFull[i,3] <- clouds[j,3]
      break
    }
  }
}
write.csv(cloudFull, file = 'cc.csv')
Now how do I make it run faster? object.size(cloudFull) gives me 39580744 bytes and the table has 470000 rows, but other files will have even more data. I'm just starting out with R (I have only worked in it for 2 days) and would appreciate any advice in very simple language :D
It's difficult to know what your real data looks like, but something along these lines:
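## parse each timestamp column once, outside any loop
full <- as.POSIXct(strptime(cloudFull[,1], '%d.%m.%Y %H:%M'))
ref <- as.POSIXct(strptime(clouds[,1], '%d.%m.%Y %H:%M'))
## ref <- sort(ref)  # only if ref is not already in increasing order
## findInterval() gives, for each minute timestamp, the index of the 3-hour row it falls in
cloudFull[, 2:3] <- clouds[findInterval(full, ref), 2:3]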
Using findInterval() replaces the nested loops with a single vectorized lookup, so the task scales roughly linearly with the number of rows rather than quadratically.
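To see what the lookup does, here is a minimal sketch on a toy version of the data. The three reference rows are copied from the sample in the question; the minute-level timestamps, the column names time, B and C, and the fixed "UTC" time zone are assumptions made for illustration only. findInterval() returns 0 for a timestamp earlier than the first reference row, so the sketch turns those into NA instead of silently dropping them.

```r
# Toy reference table: three of the 3-hour rows from the question.
clouds <- data.frame(
  time = c("01.06.2012 06:00", "01.06.2012 09:00", "01.06.2012 12:00"),
  B = c(10, 7.5, 6),
  C = c(6, 0, 2.5)
)

# Toy minute-level table (hypothetical timestamps chosen to hit the edge cases).
cloudFull <- data.frame(
  time = c("01.06.2012 05:59", "01.06.2012 06:00", "01.06.2012 08:59",
           "01.06.2012 09:00", "01.06.2012 12:30")
)

# Parse both columns once; a fixed time zone (an assumption) avoids DST surprises.
fmt  <- "%d.%m.%Y %H:%M"
ref  <- as.numeric(as.POSIXct(clouds$time, format = fmt, tz = "UTC"))
full <- as.numeric(as.POSIXct(cloudFull$time, format = fmt, tz = "UTC"))

# For each minute timestamp: index i such that ref[i] <= t < ref[i + 1].
idx <- findInterval(full, ref)

# 0 means "before the first reference row"; map it to NA so the row gets NA values.
idx[idx == 0] <- NA

cloudFull$B <- clouds$B[idx]
cloudFull$C <- clouds$C[idx]
print(cloudFull)
```

Note that a timestamp after the last reference row (12:30 here) takes the values of the last row. That differs from the original loop, which has no upper bound to compare against for the last row, so decide which behaviour you want for the end of the series.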