Reading text into a data.frame when string values contain spaces

What is the easiest way to read the text of a printed data.frame back into a data.frame when some string values contain spaces that trip up read.table? For example, this printed data.frame causes no problem:

     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173

      

I can paste it into a call to read.table without any problem:

dat <- read.table(text = "     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173", header = TRUE)

      

But if some of the values contain spaces, like this:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173

      

then read.table throws an error because it interprets "Barack" and "Obama" as two separate variables.
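
The failure is easy to reproduce. The sketch below wraps the call in tryCatch so that the error message is captured as a string instead of stopping the script; the names txt and res are illustrative only.

```r
# Same printed data.frame as above, with spaces inside candname
txt <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# The header line has 3 fields but each data line has 5, so read.table
# signals an error; capture its message rather than aborting.
res <- tryCatch(
  read.table(text = txt, header = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```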


1 answer


Read the file into L, remove the row numbers, and use sub with the regular expression shown to insert commas between the remaining fields. (Note that "\\d" matches any digit and "\\S" matches any non-whitespace character.) Then re-read the result with read.csv:

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# L <- readLines("myfile")  # read file; for demonstration use next line instead
L <- readLines(textConnection(Lines))

L2 <- sub("^ *\\d+ *", "", L)  # remove row numbers
read.csv(text = sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2), as.is = TRUE)

      

giving:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173
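
If this comes up often, the same steps can be wrapped in a small function. The following is only a sketch: read_printed_df and its n_trailing argument are hypothetical names, and it assumes that only the first column may contain spaces, that no value contains a comma, and that there are at most 8 trailing columns (sub supports back-references \1 to \9).

```r
# Hypothetical helper: parse a printed data.frame whose FIRST column may
# contain spaces, while the last `n_trailing` columns do not.
read_printed_df <- function(lines, n_trailing = 2) {
  body <- sub("^ *\\d+ *", "", lines)           # strip row numbers, as above
  trailing <- strrep(" +(\\S+)", n_trailing)    # one group per trailing column
  pattern <- paste0("^ *(.*\\S)", trailing, "$")
  # Build "\\1,\\2,...": one back-reference per captured field
  replacement <- paste0("\\", seq_len(n_trailing + 1), collapse = ",")
  read.csv(text = sub(pattern, replacement, body), as.is = TRUE)
}

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

dat <- read_printed_df(readLines(textConnection(Lines)))
str(dat)
```

With n_trailing = 2 the generated pattern is identical to the one used above, so the result should match the output shown.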

      



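Here is the regex on its own:

^ *(.*\S) +(\S+) +(\S+)$

The first group is greedy, so it captures everything, spaces included, up to the last two whitespace-separated fields; the second and third groups capture those two fields.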
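
To check what each capture group in the regex above picks up, one option is to run it on a single line with regexec and regmatches. This is just an illustrative sketch; pattern, x and m are made-up names.

```r
pattern <- "^ *(.*\\S) +(\\S+) +(\\S+)$"
x <- "Barack Obama     D       365"   # a data line with its row number removed

# regexec returns the positions of the full match and of each group;
# regmatches extracts the corresponding substrings.
m <- regmatches(x, regexec(pattern, x))[[1]]
groups <- m[-1]   # drop the full match, keep the three captured groups
print(groups)
```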
