Reading text in data.frame where string values contain spaces
What's the easiest way to read text from a printed data.frame back into a data.frame when string values containing spaces get in the way of read.table? For example, this printed data.frame causes no problem:
candname party elecVotes
1 BarackObama D 365
2 JohnMcCain R 173
I can paste it into a read.table call without any problem:
dat <- read.table(text = " candname party elecVotes
1 BarackObama D 365
2 JohnMcCain R 173", header = TRUE)
But if the values themselves contain spaces:
candname party elecVotes
1 Barack Obama D 365
2 John McCain R 173
then read.table throws an error because it interprets "Barack" and "Obama" as two separate fields.
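A quick way to see the mismatch is to count the whitespace-separated fields per line. This is only a sketch using count.fields from base R; the variable names txt, con and n are made up for illustration:

```r
# The problematic printed data.frame, as a single string
txt <- " candname party elecVotes
1 Barack Obama D 365
2 John McCain R 173"

# count.fields() reports how many whitespace-separated fields each line has
con <- textConnection(txt)
n <- count.fields(con)
close(con)

print(n)
# 3 5 5: three column names, but five fields in each data row
```

In the working example each data row has four fields, one more than the header, so read.table treats the first field as row names. Here the rows have two more fields than the header, which is what triggers the error.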
Read the file into L, remove the row numbers, and use sub with the regexp shown to insert commas between the remaining fields. (Note that "\\d" matches any digit and "\\S" matches any non-whitespace character.) Then re-read the result with read.csv:
Lines <- " candname party elecVotes
1 Barack Obama D 365
2 John McCain R 173"
# L <- readLines("myfile") # read file; for demonstration use next line instead
L <- readLines(textConnection(Lines))
L2 <- sub("^ *\\d+ *", "", L) # remove row numbers
read.csv(text = sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2), as.is = TRUE)
giving:
candname party elecVotes
1 Barack Obama D 365
2 John McCain R 173
For reference, here is the regex by itself:
^ *(.*\S) +(\S+) +(\S+)$
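To see what each capture group picks up, the same pattern can be applied to a single line with the row number already removed. This is only an illustrative sketch; pat, x and groups are names invented here:

```r
# Same pattern as in the answer, with backslashes doubled for an R string
pat <- "^ *(.*\\S) +(\\S+) +(\\S+)$"

# One data line after the row number has been stripped
x <- "Barack Obama D 365"

# Pull out each capture group in turn using sub() with a backreference
groups <- sapply(c("\\1", "\\2", "\\3"),
                 function(ref) sub(pat, ref, x),
                 USE.NAMES = FALSE)

print(groups)
# "Barack Obama" "D" "365"
```

The last two groups each take one run of non-space characters from the end of the line, so everything before them, spaces included, ends up in the first group. This relies on only the first column containing spaces.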