Paste elements of two vectors alphabetically in R
Let's say I have two vectors:
a<-c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b<-c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")
What I want to do is paste the first pair, the second pair, etc. together. However, I want to paste the two elements of each pair in alphabetical order. In the example above, the first two pairs are already in alphabetical order, but the third pair, "harry" and "chris", is not. I want to return "chris harry" for this pair.
I worked out how to do this in a two-step process, but was wondering if there is a quick way (a one-liner) to do this using just paste?
My solution:
x <- apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort)
paste(x[1,],x[2,])
which gives the pairs in alphabetical order ... but is there a one-liner?
[1] "george harry" "harry steve" "chris harry" "chris harry" "harry steve" "george steve" "chris steve" "george harry"
Here's one approach:
apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" "))
## [1] "george harry" "harry steve" "chris harry" "chris harry"
## [5] "harry steve" "george steve" "chris steve" "george harry"
Building on your initial attempt, you can also do either of the following, but both require more typing (not sure about speed):
unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b),,FALSE)
mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE)
Here's a method similar to Tyler's, but with Map. Technically it's a one-liner ...
unlist(Map(function(x,y) {
paste(sort(c(x,y)), collapse = " ")
}, a, b, USE.NAMES = FALSE))
# [1] "george harry" "harry steve" "chris harry" "chris harry"
# [5] "harry steve" "george steve" "chris steve" "george harry"
One-liners based on your own code:
apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2]))
[1] "george harry" "harry steve" "harry chris" "chris harry" "steve harry" "steve george" "steve chris" "harry george"
apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste)
[,1] [,2]
[1,] "george" "harry"
[2,] "harry" "steve"
[3,] "chris" "harry"
[4,] "chris" "harry"
[5,] "harry" "steve"
[6,] "george" "steve"
[7,] "chris" "steve"
[8,] "george" "harry"
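Note that the first expression above pastes each pair without sorting it, and the second returns a sorted matrix rather than pasted strings. A minimal sketch of a variant that does both in one line, assuming the a and b vectors from the question (the name sorted_pairs is only illustrative), might look like this:

```r
# Example vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# mapply(c, a, b) builds a 2 x 8 matrix with one pair per column;
# the inner apply sorts each column, the outer apply collapses each
# sorted column into a single string.
sorted_pairs <- apply(apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort),
                      2, paste, collapse = " ")
sorted_pairs
```

The only changes from the second expression are the margin of the outer apply (2 instead of 1) and the collapse argument passed through to paste.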
Here is a speed comparison of the above answers ...
I took data from my own dataset of all English football matches played in the top four football league divisions, which is available here: https://github.com/jalapic/engsoccerdata
The dataset is "engsoccerdata" and I used the 3rd and 4th columns (home and visiting team) as the two vectors to combine. I converted each column to a character vector. Each vector has 188,060 elements, one for each match played at the top four levels of English football between 1888 and 2014.
Here's a comparison:
df<-engsoccerdata
a<-as.character(df[,3])
b<-as.character(df[,4])
#tyler1
system.time(apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" ")))
#tyler2
system.time(unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b),,FALSE))
#tyler3
system.time(mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE))
#baptiste1
system.time(paste(pmin(a,b), pmax(a,b)))
#baptiste2
system.time(ifelse(a < b, paste(a, b), paste(b, a)))
#RichardS
system.time(unlist(Map(function(x,y) {
paste(sort(c(x,y)), collapse = " ")
}, a, b, USE.NAMES = FALSE)))
#rnso1
system.time(apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2])))
#rnso2
system.time(apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste))
system.time() results:
#            user  system  elapsed
#tyler1     42.92    0.02    43.73
#tyler2     14.68    0.03    15.04
#tyler3     14.78    0.00    14.88
#baptiste1   0.79    0.00     0.84
#baptiste2   1.25    0.00     1.28
#RichardS   15.40    0.01    15.64
#rnso1       6.22    0.10     6.41
#rnso2      13.07    0.00    13.15
Very interesting. baptiste's methods were lightning fast!
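Since the baptiste1 and baptiste2 expressions are not shown in the answers above, here is a small sketch of them applied to the example vectors from the question and checked against the sort-based approach. The variable names are only illustrative, and the check assumes plain lowercase strings, so locale-specific collation should not affect the comparison.

```r
# Example vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# Vectorised: take the element-wise smaller and larger string, then paste
fast1 <- paste(pmin(a, b), pmax(a, b))

# Vectorised: pick the pasting order with an element-wise comparison
fast2 <- ifelse(a < b, paste(a, b), paste(b, a))

# Reference: sort each pair individually (one function call per pair)
slow <- mapply(function(x, y) paste(sort(c(x, y)), collapse = " "),
               a, b, USE.NAMES = FALSE)

identical(fast1, slow)
identical(fast2, slow)
```

Both vectorised versions avoid calling sort once per pair, which is a plausible reason for the gap in the timings.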