Look up the closest value in a sorted data frame for each row of an unsorted data frame

I have two data frames in R. The first data frame is the cumulative frequency distribution (cumFreqDist) with associated time periods. The first lines of the data frame look like this:

Time        cumfreq
0         0.0000000
4         0.9009009
6         1.8018018
8         7.5075075
12       23.4234234
16       39.6396396
18       53.4534535
20       58.2582583
24       75.3753754
100     100.0000000

      

The second data frame is 10,000 draws from a uniform distribution, generated with runif using the code:

 testData <- (runif(10000))*100

      

For each row in testData, I want to search for a matching cumfreq in cumFreqDist and add the corresponding Time value to a new column in testData. Since testData is a test data frame standing in for a real data frame, I don't want to sort testData.

Since I am dealing with cumulative frequencies, if the testData value is 23.30..., the Time value that should be returned is 8. That is, I need to find the closest cumfreq value that does not exceed the testData value, and return only one value.

The data.table package is mentioned in answers to other similar questions, but my limited understanding is that it requires a key to be set in both data frames (after converting them to data tables). I cannot assume that testData meets the requirements for being used as a key, and it seems that setting a key will sort the data. That would cause me problems when I set the seed in later work.



1 answer


findInterval() is perfect for this:



set.seed(1)
cumFreqDist <- data.frame(
  Time    = c(0, 4, 6, 8, 12, 16, 18, 20, 24, 100),
  cumfreq = c(0.0000000, 0.9009009, 1.8018018, 7.5075075, 23.4234234,
              39.6396396, 53.4534535, 58.2582583, 75.3753754, 100.0000000)
)
testData <- data.frame(x = runif(10000) * 100)
# findInterval() gives, for each x, the index of the largest cumfreq <= x
testData$Time <- cumFreqDist$Time[findInterval(testData$x, cumFreqDist$cumfreq)]
head(testData, 20)
##            x Time
## 1  26.550866   12
## 2  37.212390   12
## 3  57.285336   18
## 4  90.820779   24
## 5  20.168193    8
## 6  89.838968   24
## 7  94.467527   24
## 8  66.079779   20
## 9  62.911404   20
## 10  6.178627    6
## 11 20.597457    8
## 12 17.655675    8
## 13 68.702285   20
## 14 38.410372   12
## 15 76.984142   24
## 16 49.769924   16
## 17 71.761851   20
## 18 99.190609   24
## 19 38.003518   12
## 20 77.744522   24
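
To check the boundary behaviour asked about in the question, here is a minimal sketch using the same lookup table as in the answer above. The helper name lookup_time is hypothetical and used only for illustration. It relies on findInterval() returning the index of the largest cumfreq that is less than or equal to each input, and on that index being 0 for inputs below the first cumfreq. That zero case should not occur for runif(10000) * 100, but it is handled explicitly here as an assumption about what real data might contain.

```r
# Same lookup table as in the answer above
cumFreqDist <- data.frame(
  Time    = c(0, 4, 6, 8, 12, 16, 18, 20, 24, 100),
  cumfreq = c(0.0000000, 0.9009009, 1.8018018, 7.5075075, 23.4234234,
              39.6396396, 53.4534535, 58.2582583, 75.3753754, 100.0000000)
)

# lookup_time is a hypothetical helper name, not part of the original answer
lookup_time <- function(x, dist = cumFreqDist) {
  idx <- findInterval(x, dist$cumfreq)  # index of the largest cumfreq <= x
  idx[idx == 0] <- NA                   # x below the first cumfreq: no match
  dist$Time[idx]                        # one Time per input, in input order
}

lookup_time(c(23.30, 23.4234234, 0, 100, -1))
# returns 8 12 0 100 NA
```

The first value reproduces the case from the question: 23.30 lies just below 23.4234234, so the Time returned is 8, while a value exactly equal to a cumfreq maps to that row's Time. Because the result is obtained by indexing, testData is never sorted and the output stays in the original row order.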

      
