Reordering rows in a data frame according to the order of rows in another data frame

I am a new R user and new to StackOverflow. I will do my best to ask my question concisely and clearly, and I apologize if it is not conveyed in the best possible way.

I am working with two data frames. I want to change the row order of one data frame so that it is identical to the row order of the second, so that I can add data from one to the other with their formats matching. The column I want to reorder the rows by holds the character-string IDs of the different scopes.

The first dataframe "dfverif" looks (in brief) like

Variable Value  
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
BAARG   9       
BAARG   9       
BAARG   9   
BAARG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   

      

The second dataframe "dfmax" looks like

variable value
DALZZ   2.14
DALZZ   2.02
DALZZ   2.04
CBUCG   1.83
CBUCG   2.09
CBUCG   1.96
CBUCG   1.98
DAHEI   2.25
DAHEI   2.05
DAHEI   2.08
DAFQX   2.12
DAFQX   2.12
DAFQX   2.04
BAARG   2.12
BAARG   2.56
BAARG   2.56

      

I want to reorder the rows of the second data frame to follow the row order of the character column in the first data frame. But there are a lot of duplicate rows because it is time series data, so I cannot use match, and I cannot remove the duplicates because they contain the data I need. In addition, the second data frame is much smaller than the first (it holds the maximum values of the time series data, not the raw observations). I know the limitations of cbind and rbind, and rbind.fill and cbindX can be used if needed, although I'm not sure whether they apply here. The actual data frames have more columns, but I've included only 2 here for brevity.

Based on the question "Order the rows of the dataframe according to the target vector, which gives the desired order", I tried this code:

target <- dfverif
idx <- sapply(target,function(x){
which(dfmax$variable==x)
})
idx <- unlist(idx) ##I added this because the code gave me errors because idx is classified as a list so R couldn't do the dfmax[idx,] component
dfmax <- dfmax[idx,]
rownames(dfmax) <- NULL

      

But now when I run head(dfmax) I get

[1] V1 V2
<0 rows> (or 0-length row.names)

      

I can't figure this out, and when I run str(dfmax) I get the same character order as before; nothing has changed. Am I barking up the wrong tree? Is there another way to approach this that I am not aware of? Or am I using this function incorrectly?

Thanks for your time and help.



1 answer


I do not agree that match cannot be used. It returns possibly non-unique results, but you didn't say anything about needing a secondary sort, and if you did, one could easily be added as a second argument to order. I tested this on various subsets of the second data frame shown, including one that had only one instance of each value of variable.
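As a rough sketch of the secondary sort mentioned above: the miniature d1 and d2 below are hypothetical cut-down versions of the question's data, and breaking ties by descending value is purely an assumption for illustration, since the question does not ask for any particular tie-break.

```r
# Hypothetical miniature data frames (IDs and values borrowed from the question).
d1 <- data.frame(Variable = rep(c("DAFQX", "DAHEI", "BAARG"), each = 2),
                 Value = 9, stringsAsFactors = FALSE)
d2 <- data.frame(variable = c("DAHEI", "DAHEI", "DAFQX", "DAFQX", "BAARG", "BAARG"),
                 value = c(2.05, 2.25, 2.04, 2.12, 2.12, 2.56),
                 stringsAsFactors = FALSE)

# Primary key: first position of each ID in d1.
key <- match(d2$variable, d1$Variable)

# Secondary key (assumed): value, largest first, passed as a second argument to order.
d2_sorted <- d2[order(key, -d2$value), ]
d2_sorted
```

Without the second argument, order leaves tied rows in their original relative order, which is why the plain match version keeps each ID's rows in the sequence they already had.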

Length differences shouldn't be a problem. Here I first demonstrate ordering d2 ('dfmax', the shorter one) by d1 ('dfverif', the longer one), and then ordering d1 by d2:



d2[ order(match(d2$variable, d1$Variable)), ]
   variable value
11    DAFQX  2.12
12    DAFQX  2.12
13    DAFQX  2.04
8     DAHEI  2.25
9     DAHEI  2.05
10    DAHEI  2.08
14    BAARG  2.12
15    BAARG  2.56
16    BAARG  2.56
4     CBUCG  1.83
5     CBUCG  2.09
6     CBUCG  1.96
7     CBUCG  1.98
1     DALZZ  2.14
2     DALZZ  2.02
3     DALZZ  2.04

d1[ order(match(d1$Variable, d2$variable)), ]
   Variable Value
17    DALZZ     9
18    DALZZ     9
19    DALZZ     9
20    DALZZ     9
13    CBUCG     9
14    CBUCG     9
15    CBUCG     9
16    CBUCG     9
5     DAHEI     9
6     DAHEI     9
7     DAHEI     9
8     DAHEI     9
1     DAFQX     9
2     DAFQX     9
3     DAFQX     9
4     DAFQX     9
9     BAARG     9
10    BAARG     9
11    BAARG     9
12    BAARG     9
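
To make the above reproducible, here is a sketch that rebuilds the two example data frames from the values shown in the question (d1 and d2 are the names used in this answer; stringsAsFactors = FALSE is an assumption about how the data were read). It also includes a loop-style variant of the attempt in the question, under the assumption that the target was meant to be a vector of unique IDs rather than the whole data frame.

```r
# Rebuild the two example data frames from the question.
d1 <- data.frame(
  Variable = rep(c("DAFQX", "DAHEI", "BAARG", "CBUCG", "DALZZ"), each = 4),
  Value = 9,
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  variable = rep(c("DALZZ", "CBUCG", "DAHEI", "DAFQX", "BAARG"),
                 times = c(3, 4, 3, 3, 3)),
  value = c(2.14, 2.02, 2.04, 1.83, 2.09, 1.96, 1.98, 2.25,
            2.05, 2.08, 2.12, 2.12, 2.04, 2.12, 2.56, 2.56),
  stringsAsFactors = FALSE
)

# Approach from this answer: rank each ID by its first position in d1.
by_match <- d2[order(match(d2$variable, d1$Variable)), ]

# Loop-style variant: target is a vector of unique IDs in the desired
# order (an assumption about what the question's code intended).
target <- unique(d1$Variable)
idx <- unlist(lapply(target, function(id) which(d2$variable == id)))
by_loop <- d2[idx, ]

# Check that both variants give the same row order.
identical(rownames(by_match), rownames(by_loop))
```

Both variants keep every duplicate row; they only move whole groups of IDs into the order in which the IDs first appear in d1.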

      
