How to read multiple lines of a file into one line of a data block

I have a data file where the individual samples are separated by a blank line, and each field is on its own line:

age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58

...

      

How can I read this file into a dataframe so that each row represents a sample with age, weight, height columns?

    age    weight    height

1   20      185        72  
2   87      109        60
3   15      109        58
...

      

+3


source to share


6 answers


@ user1317221_G showed the approach I would take, but resorted to downloading an additional package and generating the groups explicitly. Groups (id variable) are the key to getting the type response reshape

to work. Matrix answers do not have this limitation.

Here's a closely related approach in R base:

mydf <- read.table(header = FALSE, stringsAsFactors=FALSE, 
                   text = "age 20
                   weight 185
                   height 72

                   age 87
                   weight 109
                   height 60

                   age 15
                   weight 109
                   height 58
                   ")

# Create your id variable
mydf <- within(mydf, {
  id <- ave(V1, V1, FUN = seq_along)
})

      

With an id variable, your transformation is simple:



reshape(mydf, direction = "wide", 
        idvar = "id", timevar="V1")
#   id V2.age V2.weight V2.height
# 1  1     20       185        72
# 4  2     87       109        60
# 7  3     15       109        58

      

Or:

# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
#   age height weight
# 1  20     72    185
# 2  87     60    109
# 3  15     58    109

      

+3


source


To expand on @ BlueMagister's answer, you can use a scan with some parameters to read this directly into a list, then convert the list to a dataframe:

tmp <- scan(text = "
age     20
weight  185
height  72

age     87
weight  109
height  60

age     15
weight  109
height  58", multi.line=TRUE, 
  what=list('',0,'',0,'',0), 
  blank.lines.skip=TRUE)

mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )

      



This assumes that the variables in the record are always in the same order.

+2


source


df <- read.table(text ="
age     1
weight  1
height  6

age     2
weight  7
height  2

age     4
weight  8
height  9", header=FALSE) 

df$ID <- rep(1:3, each=3)
library(reshape2)
newdf <- dcast(df, ID~V1, value.var="V2")

#     ID age height weight
#1  1   1      6      1
#2  2   2      2      7
#3  3   4      9      8

      

+1


source


Here's something I tried with scan

:

##substitute text with file depending on your input
##read in three strings separated by spaces, multi-line input
y <- scan(text=x,what=list(character(),character(),character())
  ,sep="\n",multi.line=TRUE)
##combine into a matrix of strings
y <- do.call(cbind,y)
#     [,1]     [,2]         [,3]       
#[1,] "age 20" "weight 185" "height 72"
#[2,] "age 87" "weight 109" "height 60"
#[3,] "age 15" "weight 109" "height 58"
##set column names based on text from the first row
colnames(y) <- regmatches(y[1,],regexpr("^\\w+",y[1,]))
##remove non-numeric characters
y <- gsub("\\D+","",y)
##convert to number format, preserving matrix structure
y <- apply(y,2,as.numeric)
##convert to data frame (if necessary)
y <- data.frame(y)

      

+1


source


If your original file has these three variables all the way down, one easy way is to just read the file as two columns (names first, numbers in second) and then translate the second column into a matrix. If I turn off the answer df

from user1317221_G,

matrix(df$V2,ncol=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    1    6
[2,]    2    7    2
[3,]    4    8    9

      

Adding row and / or column names is trivial. Sorry for getting the column order "age, weight, height" :-)

0


source


Another solution

data <- readLines('c:\\relatorios\\bla.txt') # Read the data
data <- data[data != ''] # Remove the white lines
names <- unique(gsub('[0-9]*','',data)) # Get the names
data <- matrix(as.real(gsub('[^0-9]*','',data)),ncol=3,byrow=T) # Create matrix
colnames(data) <- names # Set the names

      

0


source







All Articles