How to loop the dcast function in the reshape package in R
As a relatively new user of R, I am having trouble with any of the looping functions. I've looked at many tutorials, but the examples are usually very simple and therefore easy to follow. However, I need to create slightly more complex loops and it is very difficult for me to figure out how to do this. There are several loop related questions in this and other forums, but none of them match exactly what I need, and while I tried to adapt other answers for my current problem, I keep running into errors.
I have 2000 .csv files with data in long format (simplified example):
> sol1 sol2 Istat
> s1 s2 0.435
> s1 s3 0.456
> s1 s4 0.845
> s1 s5 0.234
This is basically a summary of the pairwise comparisons of the 2000 individual solutions I have, with the similarity of the solutions summed up in the "Istat" value.
I am trying to dcast each of these 2000 CSV files into a wide spreadsheet (using the reshape package in R) so that they look like this:
s1 s2 s3 s4 s5
s1 NA 0.435 0.456 0.845 0.234
I know how to do this only once with a single CSV file:
stat.cast <- dcast(solution1, sol2 ~ sol1, value.var="Istat")
But I cannot work out how to do this in a for loop, or even with lapply, which seems to be a possible solution as well.
The closest I was able to get with a for loop:
# Get files from directory
loopout = "/Users/jc219806/Documents/Chapter 1/ANALYSES/R work/Istat/last_LoopOut/"
# List of file names inside folder
solutions <- list.files(loopout)
# Read in all 2000 files
<- lapply(solutions, read.csv, header=TRUE)
# Loop for performing reshape cast function to each listed dataframe
for (i in 1:length(
all.cast <- dcast(, sol2 ~ sol1, value.var="Istat")
But it keeps giving me the error that it cannot find the "Istat" value in the input, even though it is present in every data frame in the list (the list of data frames read in the code above).
And using lapply:
lapply(solutions, dcast(, sol2 ~ sol1, value.var="Istat"))
I am getting the same type of error:
Error: value.var (Istat) not found in input
I don't understand why, because Istat is listed as one of the variables in each of the 2000 data frames. It looks like my code is not looping through each of the 2000 .csv files, but I don't know how to fix that. I also wondered whether the code could be written so that it iterates over all 2000 outputs according to the column names, i.e. as a loop.
Hopefully this is not as difficult a problem as it seems to me. Any help (along with some detailed explanations) or helpful direction would be greatly and sincerely appreciated. Thanks.
I would melt your list of data frames and then dcast it into wide form. Something like:
## Sample data
set1 <- set2 <- data.frame(sol1 = c("s1", "s1", "s1", "s1"),
sol2 = c("s2", "s3", "s4", "s5"),
Istat = c(0.435, 0.456, 0.845, 0.234))
set2$Istat <- set2$Istat + 1 ## Just to see some different data
<- mget(ls(pattern = "set\\d+")) ## use your actual object
## The reshaping
dcast(melt(, id.vars = c("sol1", "sol2")),
L1 + sol1 ~ sol2, value.var = "value")
# L1 sol1 s2 s3 s4 s5
# 1 set1 s1 0.435 0.456 0.845 0.234
# 2 set2 s1 1.435 1.456 1.845 1.234
If your list has names, the "L1" column will show those names, which can be very convenient in the long run.
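To illustrate the point about names, here is a minimal sketch that assumes the reshape2 package is installed. The list name `solutions_list` and its element names `solution1` and `solution2` are hypothetical stand-ins for the list of data frames read from the CSV files.

```r
library(reshape2)

# Hypothetical stand-in for the list of data frames read from the CSV files
solutions_list <- list(
  solution1 = data.frame(sol1 = "s1", sol2 = c("s2", "s3", "s4", "s5"),
                         Istat = c(0.435, 0.456, 0.845, 0.234)),
  solution2 = data.frame(sol1 = "s1", sol2 = c("s2", "s3", "s4", "s5"),
                         Istat = c(0.42, 0.536, 0.945, 0.324))
)

# melt() on a list adds an L1 column that records which list element
# each row came from (the element name when the list is named)
long <- melt(solutions_list, id.vars = c("sol1", "sol2"))

# One row per list element, one column per sol2 value
wide <- dcast(long, L1 + sol1 ~ sol2, value.var = "value")
print(wide)
```

With an unnamed list, L1 would hold the position of each element instead, so naming the list after the source files makes the rows easier to trace back.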
You wrote:
for (i in 1:length(
all.cast <- dcast(, sol2 ~ sol1, value.var="Istat")
What you should have written:
all.cast <- list()
for (i in 1:length( {
all.cast[[i]] <- dcast([[i]], sol2 ~ sol1, value.var = "Istat")
}
But a more "R-esque" solution would be:
all.cast <- lapply(, dcast, sol2 ~ sol1, value.var = "Istat")
Hopefully this makes it clear what you did wrong.
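For completeness, the whole workflow can be sketched end to end. This is only an illustration under stated assumptions: reshape2 is installed, two small files written to a temporary folder stand in for the 2000 CSVs, and the names `loopout_demo`, `files`, `solutions_list` and `all_cast` are hypothetical. The key points are that lapply has to iterate over the list of data frames (not over the vector of file names), and that it needs the function `dcast` itself rather than the result of calling it.

```r
library(reshape2)

# Write two small example files to a temporary folder (stand-ins for the 2000 CSVs)
loopout_demo <- file.path(tempdir(), "istat_demo")
dir.create(loopout_demo, showWarnings = FALSE)
example <- data.frame(sol1 = "s1", sol2 = c("s2", "s3", "s4", "s5"),
                      Istat = c(0.435, 0.456, 0.845, 0.234))
write.csv(example, file.path(loopout_demo, "solution1.csv"), row.names = FALSE)
write.csv(transform(example, Istat = Istat + 1),
          file.path(loopout_demo, "solution2.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so read.csv works from any working directory
files <- list.files(loopout_demo, pattern = "\\.csv$", full.names = TRUE)
solutions_list <- lapply(files, read.csv, header = TRUE)
names(solutions_list) <- basename(files)

# Pass the function itself; lapply supplies each data frame as its first argument
all_cast <- lapply(solutions_list, dcast, sol2 ~ sol1, value.var = "Istat")
all_cast[[1]]
```

The result is a list with one wide data frame per input file, named after the file it came from.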
Your object is a list of data frames. To iterate over a list, you can use lapply with an anonymous function (just to be clear) and apply dcast to each element:
lapply(, function(x) dcast(x, sol1 ~ sol2, value.var="Istat"))
Or, instead of keeping a list of separate dcast results, you could rbind the list into a single data frame with a grouping variable for each list element, and then either dcast or spread it:
unnest(, group) %>%
spread(sol2, Istat)
Or using data.table:
dcast(rbindlist(Map(cbind,, group=seq_along(,
group + sol1 ~sol2, value.var='Istat')
data <- structure(list(solution1 = structure(list(sol1 = c("s1",
"s1", "s1", "s1"), sol2 = c("s2", "s3", "s4", "s5"), Istat = c(0.435,
0.456, 0.845, 0.234)), .Names = c("sol1", "sol2", "Istat"),
class = "data.frame", row.names = c(NA,
-4L)), solution2 = structure(list(sol1 = c("s1", "s1", "s1",
"s1"), sol2 = c("s2", "s3", "s4", "s5"), Istat = c(0.42, 0.536,
0.945, 0.324)), .Names = c("sol1", "sol2", "Istat"),
class = "data.frame", row.names = c(NA,
-4L))), .Names = c("solution1", "solution2"))
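The "bind first, then cast" idea can also be sketched with base R plus reshape2 only. This is an illustrative variant, not the code from the answer above: it assumes reshape2 is installed, and the names `solutions_list`, `stacked` and `wide` are hypothetical.

```r
library(reshape2)

# Hypothetical stand-in for the list of data frames read from the CSV files
solutions_list <- list(
  solution1 = data.frame(sol1 = "s1", sol2 = c("s2", "s3", "s4", "s5"),
                         Istat = c(0.435, 0.456, 0.845, 0.234)),
  solution2 = data.frame(sol1 = "s1", sol2 = c("s2", "s3", "s4", "s5"),
                         Istat = c(0.42, 0.536, 0.945, 0.324))
)

# Add a 'group' column to each data frame, then stack them into one long table
stacked <- do.call(rbind, Map(cbind, solutions_list,
                              group = names(solutions_list)))

# One row per group, one column per sol2 value
wide <- dcast(stacked, group + sol1 ~ sol2, value.var = "Istat")
print(wide)
```

Compared with a list of separate wide tables, a single table like this is easier to write out to one file or to compare across solutions.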