Reading text in data.frame where string values contain spaces

Question


What's the easiest way to read text from a printed data.frame back into a data.frame when string values containing spaces get in the way of read.table? For example, this excerpt of a data.frame does not create a problem:

     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173

I can pass it to read.table without any problem:

dat <- read.table(text = "     candname party elecVotes
1 BarackObama     D       365
2  JohnMcCain     R       173", header = TRUE)

But if string values in the data contain spaces:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173

then read.table throws an error because it interprets "Barack" and "Obama" as two separate fields.
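To make the failure concrete, here is a minimal sketch that counts the whitespace-separated fields on each line and captures the error instead of halting. The variable names (txt, n_fields, res) are only illustrative.

```r
# Printed data.frame whose candname values contain spaces
txt <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# Count whitespace-separated fields per line:
# the header line and the data rows do not agree
n_fields <- count.fields(textConnection(txt))
print(n_fields)

# Capture the error rather than stopping the script
res <- tryCatch(
  read.table(text = txt, header = TRUE),
  error = function(e) e
)
print(inherits(res, "error"))
```

The header has three fields, while each data row has five (row number, two name parts, party, votes), so read.table cannot line the columns up with the names.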


Sam firke 28 May '15 at 3:00


1 answer

G. Grothendieck · Accepted Answer · 2015-05-28T10:28:48+0000

Read the file into L, remove the row numbers, and use sub with the regular expression shown below to insert commas between the remaining fields. (Note that "\\d" matches any digit and "\\S" matches any non-whitespace character.) Then re-read the result with read.csv:

Lines <- "      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173"

# L <- readLines("myfile")  # read file; for demonstration use next line instead
L <- readLines(textConnection(Lines))

L2 <- sub("^ *\\d+ *", "", L)  # remove row numbers
read.csv(text = sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", L2), as.is = TRUE)

giving:

      candname party elecVotes
1 Barack Obama     D       365
2  John McCain     R       173
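If this comes up often, the same two steps could be wrapped in a small helper. The sketch below is one possible packaging, not part of the original answer; the name read_printed_df is hypothetical, and it assumes exactly three columns where only the first may contain spaces and no value contains a comma.

```r
# Hypothetical helper wrapping the two sub() steps from the answer.
# Assumes three columns; only the first column may contain spaces.
read_printed_df <- function(lines) {
  # Drop leading row numbers; the header line has none and is left unchanged
  body <- sub("^ *\\d+ *", "", lines)
  # Everything before the last two space-free fields becomes the first column
  csv <- sub("^ *(.*\\S) +(\\S+) +(\\S+)$", "\\1,\\2,\\3", body)
  read.csv(text = csv, as.is = TRUE)
}

printed <- c("      candname party elecVotes",
             "1 Barack Obama     D       365",
             "2  John McCain     R       173")

dat <- read_printed_df(printed)
str(dat)
```

A data.frame with more trailing columns would need additional " +(\\S+)" groups in the pattern and matching backreferences in the replacement.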

Here is the regex on its own, without the doubled backslashes that R string literals require:

^ *(.*\S) +(\S+) +(\S+)$
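To see what each capture group picks up, the pattern can be applied to a single row (with the row number already removed) using regexec and regmatches. This is only an illustrative sketch; the variable names are made up for the example.

```r
pattern <- "^ *(.*\\S) +(\\S+) +(\\S+)$"
x <- "Barack Obama     D       365"

# regexec returns the position of the full match and of each group;
# regmatches extracts the corresponding substrings
m <- regmatches(x, regexec(pattern, x))[[1]]

# Drop the first element (the full match) to keep only the three groups
groups <- m[-1]
print(groups)
```

Because the last two groups must be space-free and the pattern is anchored at the end of the line, the first group is left with everything before them, including the space inside the name.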

