How to prepend the variable (column) name to every value in a column
I would like to prepend the column name to each character string in that column. Here's a small data frame to use.
df <-structure(list(CoA = c("Baton Rouge", "Birmingham", "Chattanooga",
"Columbia", "Houston"), CoB = c("Haddonfield, NJ", "Haddonfield, NJ",
"Philadelphia, PA", "Hackensack, NJ", "Princeton, NJ"), CoC = c("St. Louis, Missouri",
"Kansas City, Missouri", "Jefferson City, Missouri", "Belleville, Illinois",
"Overland Park, Kansas")), .Names = c("CoA", "CoB", "CoC"), row.names = c(NA,
-5L), class = "data.frame")
I tried the following, but R recycles the company vector along the rows of each column of df instead of using one name per column.
company <- colnames(df)
new <- sapply(df, function(x) paste(company, x, sep = ", "))
This is what I want, but for all columns:
paste(colnames(df[1]), df$CoA, sep = ", ")
[1] "CoA, Baton Rouge" "CoA, Birmingham" "CoA, Chattanooga" "CoA, Columbia" "CoA, Houston"
I tried various regexes and didn't get anywhere. How do I get sapply to perform this insert operation on each column?
Thank you for your help.
Here's a possible solution:
mx <- sapply(colnames(df),function(name){ paste(name,df[,name],sep=", ")})
> mx
CoA CoB CoC
[1,] "CoA, Baton Rouge" "CoB, Haddonfield, NJ" "CoC, St. Louis, Missouri"
[2,] "CoA, Birmingham" "CoB, Haddonfield, NJ" "CoC, Kansas City, Missouri"
[3,] "CoA, Chattanooga" "CoB, Philadelphia, PA" "CoC, Jefferson City, Missouri"
[4,] "CoA, Columbia" "CoB, Hackensack, NJ" "CoC, Belleville, Illinois"
[5,] "CoA, Houston" "CoB, Princeton, NJ" "CoC, Overland Park, Kansas"
Note that sapply returns a matrix; if you want a data.frame, just call as.data.frame(mx).
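As a minimal sketch of that conversion, the snippet below uses a smaller stand-in for df (two columns, three rows) and assumes you want the result to stay character rather than become factors, hence the explicit stringsAsFactors = FALSE. The name df_new is only illustrative.

```r
# Smaller stand-in for the df from the question (assumption: same shape of data).
df <- data.frame(
  CoA = c("Baton Rouge", "Birmingham", "Chattanooga"),
  CoB = c("Haddonfield, NJ", "Haddonfield, NJ", "Philadelphia, PA"),
  stringsAsFactors = FALSE
)

# Same call as in the answer: one column of the result per column name.
mx <- sapply(colnames(df), function(name) paste(name, df[, name], sep = ", "))

# Convert the character matrix to a data.frame; the column names carry over.
df_new <- as.data.frame(mx, stringsAsFactors = FALSE)

is.matrix(mx)          # TRUE
is.data.frame(df_new)  # TRUE
df_new$CoA[1]          # "CoA, Baton Rouge"
```

On R versions before 4.0.0, leaving out stringsAsFactors = FALSE would turn the columns into factors.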
Explanation:
sapply applies a function to every element of the vector / list passed as the first argument X (in this case we are passing colnames(df)).
The function to be applied to each element is passed as the argument FUN.
In this case, we pass the following function as FUN:
function(name){
paste(name,df[,name],sep=", ")
# equivalent to return(paste(name,df[,name],sep=", "))
}
This function is called once for every element of colnames(df), and each element is passed as the first argument (i.e. the argument name).
So, using name (remember, it holds a single column name), we select that column of df, prepend the column name with paste, and return the resulting character vector.
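To make this concrete, here is a sketch that calls the same function by hand for a single column name, and contrasts it with the attempt from the question, where the whole vector of column names is pasted onto one column. The name add_name is hypothetical; it simply gives the anonymous function a name so it can be called directly.

```r
# The data frame from the question, rebuilt with data.frame().
df <- data.frame(
  CoA = c("Baton Rouge", "Birmingham", "Chattanooga", "Columbia", "Houston"),
  CoB = c("Haddonfield, NJ", "Haddonfield, NJ", "Philadelphia, PA",
          "Hackensack, NJ", "Princeton, NJ"),
  CoC = c("St. Louis, Missouri", "Kansas City, Missouri",
          "Jefferson City, Missouri", "Belleville, Illinois",
          "Overland Park, Kansas"),
  stringsAsFactors = FALSE
)

# Hypothetical name for the function passed to sapply in the answer.
add_name <- function(name) paste(name, df[, name], sep = ", ")

# One call, one column name: the single name is repeated for all five values.
one <- add_name("CoA")
one[1:2]       # "CoA, Baton Rouge" "CoA, Birmingham"

# The question's attempt: three names pasted onto five values, so paste
# recycles the names down the rows (CoA, CoB, CoC, CoA, CoB).
recycled <- paste(colnames(df), df$CoA, sep = ", ")
recycled[1:3]  # "CoA, Baton Rouge" "CoB, Birmingham" "CoC, Chattanooga"
```

Iterating over the column names, rather than over the columns, is what guarantees that each call sees exactly one name.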
The rest of the work is done by sapply, which automatically binds the resulting vectors into one matrix (since simplify=TRUE by default; otherwise a list of vectors would be returned, as lapply does).
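The difference between the simplified and unsimplified results can be seen in a short sketch. It uses a smaller stand-in for df, and the names f, as_matrix, as_list and via_lapply are only illustrative.

```r
# Smaller stand-in for the df from the question.
df <- data.frame(
  CoA = c("Baton Rouge", "Birmingham"),
  CoB = c("Haddonfield, NJ", "Princeton, NJ"),
  stringsAsFactors = FALSE
)

f <- function(name) paste(name, df[, name], sep = ", ")

as_matrix  <- sapply(colnames(df), f)                    # simplify = TRUE by default
as_list    <- sapply(colnames(df), f, simplify = FALSE)  # list, named by column
via_lapply <- lapply(colnames(df), f)                    # list, unnamed

is.matrix(as_matrix)  # TRUE
names(as_list)        # "CoA" "CoB"
names(via_lapply)     # NULL
```

Apart from the names, the two lists hold the same vectors; sapply adds the names because the input is a character vector and USE.NAMES is TRUE by default.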
EDIT:
As @hadley correctly pointed out, the result of sapply with simplify=TRUE is not always of the same type (for example, it can change if you have only one row or one column).
So this is a safer solution:
df2 <- as.data.frame(sapply(colnames(df),
function(name){ paste(name,df[,name],sep=", ")},
simplify=FALSE))
> df2
CoA CoB CoC
1 CoA, Baton Rouge CoB, Haddonfield, NJ CoC, St. Louis, Missouri
2 CoA, Birmingham CoB, Haddonfield, NJ CoC, Kansas City, Missouri
3 CoA, Chattanooga CoB, Philadelphia, PA CoC, Jefferson City, Missouri
4 CoA, Columbia CoB, Hackensack, NJ CoC, Belleville, Illinois
5 CoA, Houston CoB, Princeton, NJ CoC, Overland Park, Kansas
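A sketch of the one-row case shows why the unsimplified version is safer. The data frame one_row and the other names below are made up for illustration.

```r
# A data frame with a single row (illustrative data taken from the question).
one_row <- data.frame(
  CoA = "Baton Rouge",
  CoB = "Haddonfield, NJ",
  CoC = "St. Louis, Missouri",
  stringsAsFactors = FALSE
)

f <- function(name) paste(name, one_row[, name], sep = ", ")

# With simplify = TRUE, three length-1 results collapse to a named vector,
# not a 1 x 3 matrix, so as.data.frame() would give 3 rows and 1 column.
simplified <- sapply(colnames(one_row), f)
is.matrix(simplified)            # FALSE
dim(as.data.frame(simplified))   # 3 1

# With simplify = FALSE, the list always converts to one column per name.
safe <- as.data.frame(sapply(colnames(one_row), f, simplify = FALSE),
                      stringsAsFactors = FALSE)
dim(safe)                        # 1 3
```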