The loop takes forever to run

I have two tables. The first has data from 2012 to 2014 at 3-hour intervals. It looks like this:

                    B   C
1   01.06.2012 00:00    10  0   
2   01.06.2012 03:00    10  0   
3   01.06.2012 06:00    10  6   
4   01.06.2012 09:00    7,5 0   
5   01.06.2012 12:00    6   2,5 
6   01.06.2012 15:00    6   0   
7   01.06.2012 18:00    4   0   
8   01.06.2012 21:00    4   0   
9   02.06.2012 00:00    0   0   
10  02.06.2012 03:00    0   0 

      

The other table covers the same time span, but is sampled every minute:

1   01.06.2012 00:00       
2   01.06.2012 00:01       
3   01.06.2012 00:01       
4   01.06.2012 00:03       
5   01.06.2012 00:03       
6   01.06.2012 00:05       
7   01.06.2012 00:05       
8   01.06.2012 00:07       
9   01.06.2012 00:08       
10  01.06.2012 00:09       
11  01.06.2012 00:10

      

Now I need to fill the 2nd and 3rd columns of the second table from the first: if a timestamp in the second table falls between timestamp(i) and timestamp(i+1) of the first table, it should take B(i) and C(i) and copy them. I have this code and I know it works, but it takes over 12 hours to run, and I have many files like this that I need to process in the same way.

clouds <- read.csv('~/2012-2014 clouds info.csv', sep=";", header = FALSE)
cloudFull <- read.csv('~/2012-2014 clouds.csv', sep=";", header = FALSE)

for (i in 1:nrow(cloudFull)){
  dateOne <- strptime(cloudFull[i,1], '%d.%m.%Y %H:%M')

  for (j in 1:nrow(clouds)){
    bottomDate = strptime(clouds[j,1], '%d.%m.%Y %H:%M')
    upperDate = strptime(clouds[j+1,1], '%d.%m.%Y %H:%M')
    if  ((dateOne >= bottomDate) && (dateOne < upperDate)) {
      cloudFull[i,2] <- clouds[j,2]
      cloudFull[i,3] <- clouds[j,3]
      break

    } 

  }
}

write.csv(cloudFull, file = 'cc.csv')

      

Now how do I make it run faster? object.size(cloudFull) gives me 39580744 bytes and the table has 470000 rows, but other files will have even more data. I'm just starting out with R (I've only worked with it for 2 days), so I would appreciate any advice in very simple language :D



1 answer


It's difficult to know what your real data looks like, but something along the lines of

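# Parse all timestamps in one vectorised call instead of row by row inside the loops
full <- as.POSIXct(strptime(cloudFull[,1], '%d.%m.%Y %H:%M'))
ref <- as.POSIXct(strptime(clouds[,1], '%d.%m.%Y %H:%M'))
## ref <- sort(ref)  # findInterval() requires ref to be sorted in increasing order
# For each element of full, findInterval() returns the index j with ref[j] <= full < ref[j+1]
cloudFull[, 2:3] <- clouds[findInterval(full, ref), 2:3]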

      



Using findInterval() changes the problem to one that scales linearly rather than quadratically.
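To see what findInterval() is doing, here is a minimal sketch on toy data. The column names (time, B, C), the sample values, and tz = "UTC" are assumptions made for illustration only; the real files have no header and may need a different time zone.

```r
# Toy 3-hour reference table (hypothetical values modelled on the question's sample)
clouds <- data.frame(
  time = c("01.06.2012 00:00", "01.06.2012 03:00", "01.06.2012 06:00"),
  B = c(10, 10, 7.5),
  C = c(0, 6, 0),
  stringsAsFactors = FALSE
)

# Toy fine-grained table: timestamps only
cloudFull <- data.frame(
  time = c("01.06.2012 00:00", "01.06.2012 02:59", "01.06.2012 03:00",
           "01.06.2012 05:30", "01.06.2012 06:01"),
  stringsAsFactors = FALSE
)

fmt <- "%d.%m.%Y %H:%M"
# tz = "UTC" is an assumption; it sidesteps daylight-saving gaps in this toy example
ref  <- as.POSIXct(clouds$time, format = fmt, tz = "UTC")
full <- as.POSIXct(cloudFull$time, format = fmt, tz = "UTC")

# findInterval() needs the reference vector sorted in non-decreasing order
stopifnot(!is.unsorted(ref))

# idx[i] is the largest j with ref[j] <= full[i]; 0 means "before the first reference row"
idx <- findInterval(full, ref)
idx[idx == 0] <- NA  # turn "before the first row" into NA so no row is silently dropped

# Copy B and C from the matching reference row
cloudFull$B <- clouds$B[idx]
cloudFull$C <- clouds$C[idx]
print(cloudFull)
```

One difference from the original double loop is worth knowing: timestamps at or after the last reference time are matched to the last reference row here, whereas the loop has no upper bound to compare against for that row.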
