Output the closest value from a sorted dataframe to an unsorted dataframe
I have two data frames in R. The first data frame is a cumulative frequency distribution (cumFreqDist) with associated time periods. The first lines of the data frame look like this:
Time cumfreq
0 0.0000000
4 0.9009009
6 1.8018018
8 7.5075075
12 23.4234234
16 39.6396396
18 53.4534535
20 58.2582583
24 75.3753754
100 100.0000000
The second data frame holds 10,000 draws from a uniform distribution, generated with runif using the code:
testData <- (runif(10000))*100
For each row in testData, I want to search for a matching cumfreq in cumFreqDist and add the corresponding Time value to a new column in testData. Since testData is a test data frame standing in for a real data frame, I don't want to sort testData.
Since I am dealing with cumulative frequencies, if the value in testData is 23.30..., the Time value that should be returned is 8. That is, I need to find the closest cumfreq value that does not exceed the testData value, and return only that one value.
The data.table package is mentioned in other similar questions, but my limited understanding is that it requires a key to be set in both data frames (after converting them to data tables). I cannot assume that testData meets the requirements for being assigned a key, and it seems that setting a key will sort the data. That would cause me problems when I set the seed later in my further work.
findInterval() is perfect for this:
set.seed(1)
cumFreqDist <- data.frame(
  Time = c(0, 4, 6, 8, 12, 16, 18, 20, 24, 100),
  cumfreq = c(0.0000000, 0.9009009, 1.8018018, 7.5075075, 23.4234234,
              39.6396396, 53.4534535, 58.2582583, 75.3753754, 100.0000000)
)
testData <- data.frame(x = runif(10000) * 100)
# findInterval() gives, for each x, the index of the last cumfreq that is <= x
testData$Time <- cumFreqDist$Time[findInterval(testData$x, cumFreqDist$cumfreq)]
head(testData, 20)
## x Time
## 1 26.550866 12
## 2 37.212390 12
## 3 57.285336 18
## 4 90.820779 24
## 5 20.168193 8
## 6 89.838968 24
## 7 94.467527 24
## 8 66.079779 20
## 9 62.911404 20
## 10 6.178627 6
## 11 20.597457 8
## 12 17.655675 8
## 13 68.702285 20
## 14 38.410372 12
## 15 76.984142 24
## 16 49.769924 16
## 17 71.761851 20
## 18 99.190609 24
## 19 38.003518 12
## 20 77.744522 24
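To make the boundary behaviour concrete, here is a minimal sketch using the same breakpoints as the question. The `probe` values are hypothetical, chosen only for illustration, and are deliberately left unsorted. It shows that findInterval() keeps the input order, that 23.30 maps to Time 8, and that a value exactly equal to a breakpoint maps to that breakpoint's own Time.

```r
# Breakpoints and Time labels copied from the cumFreqDist table in the question.
cumfreq <- c(0, 0.9009009, 1.8018018, 7.5075075, 23.4234234,
             39.6396396, 53.4534535, 58.2582583, 75.3753754, 100)
Time <- c(0, 4, 6, 8, 12, 16, 18, 20, 24, 100)

# Hypothetical probe values (not from the original post), unsorted on purpose.
probe <- c(90, 23.30, 23.4234234, 0.5, 100)

# Only the breakpoints (second argument) must be sorted; probe is not reordered.
idx <- findInterval(probe, cumfreq)
result <- Time[idx]
print(data.frame(probe, idx, Time = result))

# A value below the first breakpoint yields index 0.
below <- findInterval(-1, cumfreq)
print(below)
```

One thing to watch for with real data: an index of 0 is silently dropped when used for subsetting, so the looked-up vector would come out shorter than the input. That cannot happen here because cumfreq starts at 0 and the draws are non-negative, but it may be worth checking `any(idx == 0)` if the smallest breakpoint can exceed some data values.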