Is there an easy way to connect unique data points in a data frame?
I want to extract pairs of values from a data frame, pairing each value only with values outside its own column. Each number in column 1 is paired with every number in the columns to its right. Likewise, numbers in column 2 are paired only with numbers in column 3 or higher.
I've written a script that does this with a bird's nest of 'for' loops, but I believe there should be a more elegant way to do it.
Sample data:
structure(list(A = 1:3, B = 4:6, C = 7:9), .Names = c("A", "B",
"C"), class = "data.frame", row.names = c(NA, -3L))
Desired output:
structure(list(X1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6), X2 = c(4, 5, 6, 7,
8, 9, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7,
8, 9)), .Names = c("X1", "X2"), row.names = c(NA, 27L), class = "data.frame")
Here's an approach using the data.table package and its very efficient functions CJ and rbindlist (assuming your dataset is named df):
library(data.table)
res <- rbindlist(lapply(seq_len(length(df) - 1),
function(i) CJ(df[, i], unlist(df[, -(seq_len(i))]))))
Then you can set the column names by reference (if you insist on "X1" and "X2") using setnames:
setnames(res, 1:2, c("X1", "X2"))
You can also convert back to a data.frame by reference (if you want to match the desired output exactly) with setDF():
setDF(res)
Another approach:
res <- do.call(rbind, unlist(lapply(seq(ncol(dat) - 1), function(x)
lapply(seq(x + 1, ncol(dat)), function(y)
"names<-"(expand.grid(dat[c(x, y)]), c("X1", "X2")))),
recursive = FALSE))
where dat is the name of your data frame.
You can sort the result with this command:
res[order(res[[1]], res[[2]]), ]
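For a quick check against the sample data, here is a minimal base R sketch of the same idea. It is a variation on the code above, not the answerer's exact code: it assumes combn can stand in for the nested lapply calls to enumerate the column pairs, and the names col_pairs and pair_list are made up for illustration.

```r
# Sample data from the question
dat <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Every pair of column indices (left, right) with left < right;
# assumes dat has at least two columns
col_pairs <- combn(ncol(dat), 2, simplify = FALSE)

# Cross every value of the left column with every value of the right column
pair_list <- lapply(col_pairs, function(p) {
  setNames(expand.grid(dat[p]), c("X1", "X2"))
})

# Stack the per-pair results and sort them as in the desired output
res <- do.call(rbind, pair_list)
res <- res[order(res$X1, res$X2), ]
rownames(res) <- NULL  # reset row names after sorting

nrow(res)  # 27: three column pairs, nine combinations each
```

The values are integers here, whereas the desired output in the question stores doubles, so compare with == or all.equal rather than identical.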