How to read multiple lines of a file into one line of a data block
I have a data file where the individual samples are separated by a blank line, and each field is on its own line:
age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58
...
How can I read this file into a dataframe so that each row represents a sample with age, weight, height columns?
age weight height
1 20 185 72
2 87 109 60
3 15 109 58
...
source to share
@ user1317221_G showed the approach I would take, but resorted to downloading an additional package and generating the groups explicitly. Groups (id variable) are the key to getting the type response reshape
to work. Matrix answers do not have this limitation.
Here's a closely related approach in R base:
mydf <- read.table(header = FALSE, stringsAsFactors=FALSE,
text = "age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58
")
# Create your id variable
mydf <- within(mydf, {
id <- ave(V1, V1, FUN = seq_along)
})
With an id variable, your transformation is simple:
reshape(mydf, direction = "wide",
idvar = "id", timevar="V1")
# id V2.age V2.weight V2.height
# 1 1 20 185 72
# 4 2 87 109 60
# 7 3 15 109 58
Or:
# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
# age height weight
# 1 20 72 185
# 2 87 60 109
# 3 15 58 109
source to share
To expand on @ BlueMagister's answer, you can use a scan with some parameters to read this directly into a list, then convert the list to a dataframe:
tmp <- scan(text = "
age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58", multi.line=TRUE,
what=list('',0,'',0,'',0),
blank.lines.skip=TRUE)
mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )
This assumes that the variables in the record are always in the same order.
source to share
Here's something I tried with scan
:
##substitute text with file depending on your input
##read in three strings separated by spaces, multi-line input
y <- scan(text=x,what=list(character(),character(),character())
,sep="\n",multi.line=TRUE)
##combine into a matrix of strings
y <- do.call(cbind,y)
# [,1] [,2] [,3]
#[1,] "age 20" "weight 185" "height 72"
#[2,] "age 87" "weight 109" "height 60"
#[3,] "age 15" "weight 109" "height 58"
##set column names based on text from the first row
colnames(y) <- regmatches(y[1,],regexpr("^\\w+",y[1,]))
##remove non-numeric characters
y <- gsub("\\D+","",y)
##convert to number format, preserving matrix structure
y <- apply(y,2,as.numeric)
##convert to data frame (if necessary)
y <- data.frame(y)
source to share
If your original file has these three variables all the way down, one easy way is to just read the file as two columns (names first, numbers in second) and then translate the second column into a matrix. If I turn off the answer df
from user1317221_G,
matrix(df$V2,ncol=3,byrow=TRUE)
[,1] [,2] [,3]
[1,] 1 1 6
[2,] 2 7 2
[3,] 4 8 9
Adding row and / or column names is trivial. Sorry for getting the column order "age, weight, height" :-)
source to share
Another solution
data <- readLines('c:\\relatorios\\bla.txt') # Read the data
data <- data[data != ''] # Remove the white lines
names <- unique(gsub('[0-9]*','',data)) # Get the names
data <- matrix(as.real(gsub('[^0-9]*','',data)),ncol=3,byrow=T) # Create matrix
colnames(data) <- names # Set the names
source to share