Paste elements of two vectors together alphabetically in R

Let's say I have two vectors:

a<-c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b<-c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

      

What I want to do is paste the first pair, the second pair, etc. together. However, I want to paste the two elements of each pair in alphabetical order. In the example above, the first two pairs are already in alphabetical order, but the third pair, "harry" and "chris", is not. I want to return "chris harry" for this pair.

I worked out how to do this in a two-step process, but was wondering if there is a quick way (a one-liner) to do this using just paste?

My solution:

x <- apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort)
paste(x[1,],x[2,])

      

which gives the pairs in alphabetical order, but is there a one-line way?

[1] "george harry" "harry steve"  "chris harry"  "chris harry"  "harry steve"  "george steve" "chris steve"  "george harry"

      



5 answers


Here's one approach:

apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" "))

## [1] "george harry" "harry steve"  "chris harry"  "chris harry"  
## [5] "harry steve" "george steve" "chris steve"  "george harry"

      



Building on your initial attempt, you can also do either of the following, but they both require more typing (not sure about speed):

unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b), use.names = FALSE)
mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE)
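
Both calls switch names off because mapply (and therefore Map) uses the first argument as names when it is a character vector. A minimal sketch of the difference, using a shortened version of the vectors from the question (the helper name paste_sorted is only for illustration):

```r
a <- c("george", "harry", "harry")
b <- c("harry", "steve", "chris")

# Hypothetical helper: sort one pair and paste it together
paste_sorted <- function(x, y) paste(sort(c(x, y)), collapse = " ")

# Default USE.NAMES = TRUE: the elements of `a` become the names of the result
named <- mapply(paste_sorted, a, b)
names(named)

# With USE.NAMES = FALSE the result is a plain, unnamed character vector
plain <- mapply(paste_sorted, a, b, USE.NAMES = FALSE)
names(plain)
```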

      



This is a bit redundant because each pair is compared twice, but it is vectorized:

paste(pmin(a,b), pmax(a,b))
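
This works because pmin and pmax accept character vectors and compare them element by element, so pmin(a, b) is the alphabetically first name of each pair and pmax(a, b) is the second. A small check against the apply/sort approach, using the vectors from the question:

```r
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

first  <- pmin(a, b)  # alphabetically smaller element of each pair
second <- pmax(a, b)  # alphabetically larger element of each pair

vectorised <- paste(first, second)
rowwise <- apply(cbind(a, b), 1, function(x) paste(sort(x), collapse = " "))

# Compare the vectorised result with the row-by-row result
identical(vectorised, rowwise)
```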

      



Edit: another option, with ifelse:

ifelse(a < b, paste(a, b), paste(b, a))
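
One difference between the two vectorized versions shows up with missing values: ifelse returns NA when the comparison is NA, whereas paste turns a missing value into the string "NA". A small sketch with made-up input (the vectors a2 and b2 are not from the question):

```r
a2 <- c("harry", NA, "steve")
b2 <- c("chris", "steve", "steve")

res_ifelse  <- ifelse(a2 < b2, paste(a2, b2), paste(b2, a2))
res_pminmax <- paste(pmin(a2, b2), pmax(a2, b2))

is.na(res_ifelse)  # the pair containing NA stays missing
res_pminmax        # here the same pair becomes the string "NA NA"
```

Which behaviour is preferable depends on how missing names should be treated downstream.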

      



Here's a method similar to Tyler's, but with Map. Technically it's a one-liner ...

unlist(Map(function(x,y) {
    paste(sort(c(x,y)), collapse = " ")
    }, a, b, USE.NAMES = FALSE))
# [1] "george harry" "harry steve"  "chris harry"  "chris harry" 
# [5] "harry steve"  "george steve" "chris steve"  "george harry"

      



One-liners built from your own code. Note that this first one pastes each pair in its original order, without sorting:

apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2]))
[1] "george harry" "harry steve"  "harry chris"  "chris harry"  "steve harry"  "steve george" "steve chris"  "harry george"


This one sorts each pair, but returns a two-column matrix rather than pasted strings:

apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste)

     [,1]     [,2]   
[1,] "george" "harry"
[2,] "harry"  "steve"
[3,] "chris"  "harry"
[4,] "chris"  "harry"
[5,] "harry"  "steve"
[6,] "george" "steve"
[7,] "chris"  "steve"
[8,] "george" "harry"
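
The matrix above still has to be collapsed into one string per pair. One possible way to do that, sketched here as an assumption rather than as part of the original answer, is to paste down the columns of the sorted matrix with collapse:

```r
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# 2 x 8 matrix: each column holds one pair, sorted alphabetically
sorted <- apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort)

# Collapse each column into a single "first second" string
pasted <- apply(sorted, 2, paste, collapse = " ")
pasted
```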

      



Here is a speed comparison of the above answers ...

I took data from my own dataset of all English football matches played in the top four football league divisions, which is available here: https://github.com/jalapic/engsoccerdata

The dataset is "engsoccerdata" and I used the 3rd and 4th columns (home and visiting team) as the two vectors to paste together. I converted each column to a character vector. Each vector has 188,060 elements, because 188,060 matches were played in the top four levels of English football between 1888 and 2014.

Here's a comparison:

df<-engsoccerdata

a<-as.character(df[,3])
b<-as.character(df[,4])

#tyler1
system.time(apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" ")))

#tyler2
system.time(unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b), use.names = FALSE))

#tyler3
system.time(mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE))

#baptiste1
system.time(paste(pmin(a,b), pmax(a,b)))

#baptiste2
system.time(ifelse(a < b, paste(a, b), paste(b, a)))

#RichardS
system.time(unlist(Map(function(x,y) {
  paste(sort(c(x,y)), collapse = " ")
}, a, b, USE.NAMES = FALSE)))


#rnso1
system.time(apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2])))

#rnso2
system.time(apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste))

      

system.time() results:

#              user  system elapsed 
#tyler1       42.92    0.02   43.73 
#tyler2       14.68    0.03   15.04
#tyler3       14.78    0.00   14.88 
#baptiste1     0.79    0.00    0.84 
#baptiste2     1.25    0.00    1.28 
#RichardS     15.40    0.01   15.64
#rnso1         6.22    0.10    6.41
#rnso2        13.07    0.00   13.15 

      

Very interesting. baptiste's methods were lightning fast!
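
For anyone without the dataset at hand, a rough version of this comparison can be run on synthetic data. The sketch below uses an arbitrary pool of names and a much smaller vector length than the real data; both are assumptions chosen so that it runs quickly, and the timings will differ from the table above.

```r
set.seed(1)
teams <- c("arsenal", "burnley", "chelsea", "everton", "fulham", "watford")  # arbitrary pool of names
n <- 20000  # assumption: far fewer than the 188,060 matches in the real data
a <- sample(teams, n, replace = TRUE)
b <- sample(teams, n, replace = TRUE)

# Time three of the approaches and keep their results for comparison
t_apply <- system.time(
  r_apply <- apply(cbind(a, b), 1, function(x) paste(sort(x), collapse = " "))
)
t_pminmax <- system.time(r_pminmax <- paste(pmin(a, b), pmax(a, b)))
t_ifelse  <- system.time(r_ifelse <- ifelse(a < b, paste(a, b), paste(b, a)))

# All three should agree before the timings are compared
stopifnot(identical(r_apply, r_pminmax), identical(r_pminmax, r_ifelse))

# user, system and elapsed time for each approach
rbind(apply = t_apply, pminmax = t_pminmax, ifelse = t_ifelse)[, 1:3]
```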
