Python csv newline character in field

Question

I have a problem reading a thorn-delimited csv file that I think has a newline character in one of its fields. The newline forces the record over two lines, so I can't read the values in the last fields of that line. I tried opening the file in universal newline mode ('rU'), but I'm not sure what the best way to do this is.

This is how I am trying to read the file into Python:

import csv

csv.register_dialect('BB', delimiter='\xfe')  # thorn (þ) as the delimiter
with open(file, 'rU') as file_in:
    log = csv.reader(file_in, dialect='BB')
    for row in log:
        print(row)

This works for most of the file, but there is one line that I assume has a newline character in one of its fields; I'm not sure how best to diagnose it. In Notepad (the screenshot is not preserved here) that record is broken across two lines, when it should be a single line like the ones below it.

When the file is read with csv.reader, that line comes out like this:

['06-13-2015-10:13:41', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '142', '', '5', '7.0', '2', '', 'cmhkl966', 'amex_674', '1', '0.00', '', '', "'"]

That is, the row is cut off right at the first apostrophe in the last field.
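One way to diagnose which records are affected (a sketch, not something from the thread) is to count the delimiters on each physical line: a record split by an embedded newline shows up as two consecutive lines with fewer delimiters than usual. The helper name find_short_lines, the in-memory sample, and the assumption that the most common delimiter count is the correct one are all hypothetical.

```python
import io
from collections import Counter

DELIM = '\xfe'  # thorn, same delimiter as the 'BB' dialect

def find_short_lines(lines):
    """Hypothetical helper: return (line_number, delimiter_count) for every
    physical line whose delimiter count differs from the most common one."""
    counts = [line.count(DELIM) for line in lines]
    expected, _ = Counter(counts).most_common(1)[0]
    return [(n, c) for n, c in enumerate(counts, start=1) if c != expected]

# Made-up sample standing in for the real file: record 2 is split in two
sample = io.StringIO("aþbþc\nxþ'broken\nfield'þz\ngþhþi\njþkþl\n")
print(find_short_lines(sample.read().splitlines()))  # [(2, 1), (3, 1)]
```

For the real file, the lines would come from the opened file instead of the StringIO sample; the reported line numbers point at the halves of each broken record.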

python csv

Tim S_ · Jul 29 '15 at 12:41

2 answers

You can also check whether the first element of the next line starts with a timestamp and, if not, use the list method extend to add it to the current line before printing.

Disclaimer: Not Verified

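import csv
import re

TIMESTAMP = re.compile(r'\d+-\d+-\d+-\d+:\d+:\d+')
csv.register_dialect('BB', delimiter='\xfe')
with open(file, newline='') as file_in:
    rows = list(csv.reader(file_in, dialect='BB'))  # a reader cannot be indexed

merged = []
for row in rows:
    if merged and merged[-1] and row and not TIMESTAMP.match(row[0]):
        # continuation line: rejoin the field the newline split, then extend
        merged[-1][-1] += '\n' + row[0]
        merged[-1].extend(row[1:])
    else:
        merged.append(row)
for row in merged:
    print(row)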
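To try the timestamp idea without the real file, here is a self-contained sketch that runs the same kind of merge on an in-memory sample. The function name merge_continuations and the sample rows are made up for illustration; the timestamp pattern is the one from this answer, anchored at the start of the first field.

```python
import csv
import io
import re

# Timestamp pattern from the answer; match() anchors it at the start of the field
TIMESTAMP = re.compile(r'\d+-\d+-\d+-\d+:\d+:\d+')

def merge_continuations(rows):
    """Hypothetical helper: glue rows that do not start with a timestamp onto
    the previous row, rejoining the field that the newline split in two."""
    merged = []
    for row in rows:
        if merged and merged[-1] and row and not TIMESTAMP.match(row[0]):
            merged[-1][-1] += '\n' + row[0]
            merged[-1].extend(row[1:])
        else:
            merged.append(list(row))
    return merged

# Made-up sample: the second record has a newline inside its third field
sample = io.StringIO(
    "06-13-2015-10:13:41þ0þfine\n"
    "06-13-2015-10:14:02þ1þbroken\n"
    "fieldþ142\n"
)
for row in merge_continuations(csv.reader(sample, delimiter='\xfe')):
    print(row)
```

This relies on every real record starting with a timestamp; a continuation line that happens to begin with something timestamp-like would not be merged.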

ILostMySpoon · Jul 29 '15 at 13:40

hiro protagonist · Accepted Answer · 2015-07-29T13:01:59+0000

I have reduced your problem to a minimal example (I hope I have understood its cause correctly):

import io
import csv

# the backslash after ''' avoids a leading blank line, which csv would return as []
file_in = io.StringIO('''\
aþbþ'hello
world'
''')

log = csv.reader(file_in, delimiter='\xfe', quotechar="'")
for row in log:
    print(row)

output:

['a', 'b', 'hello\nworld']
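The important part is quotechar="'": this answer assumes the fields are wrapped in apostrophes, and csv only keeps a newline inside a field it recognises as quoted. As a quick contrast (a sketch using the same sample text), parsing with the default quote character ('"') splits the record in two, which matches the symptom in the question where the row stops at the first apostrophe.

```python
import csv
import io

DATA = "aþbþ'hello\nworld'\n"

def parse(quotechar):
    """Parse DATA with the thorn delimiter and the given quote character."""
    return list(csv.reader(io.StringIO(DATA), delimiter='\xfe', quotechar=quotechar))

print(parse('"'))  # default quote character: [['a', 'b', "'hello"], ["world'"]]
print(parse("'"))  # apostrophe as quote character: [['a', 'b', 'hello\nworld']]
```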

UPDATE:

As pointed out in the comments, here is a version that reads the data from a .csv file. Contents of test.csv:

aþbþ'hello
world'þc
dþeþ'hello
other
things'þf
gþhþiþj

And the Python code:

import csv
from pathlib import Path

HERE = Path(__file__).parent
DATA_PATH = HERE / '../data/test.csv'

# newline='' is what the csv docs recommend; the 'U' mode was removed in Python 3.11
with DATA_PATH.open(newline='') as file_in:
    log = csv.reader(file_in, delimiter='\xfe', quotechar="'")
    for row in log:
        print(row)

which outputs:

['a', 'b', 'hello\nworld', 'c']
['d', 'e', 'hello\nother\nthings', 'f']
['g', 'h', 'i', 'j']
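The csv module documentation recommends opening files with newline='' so that newlines inside quoted fields are handled correctly (the 'rU' mode used in the question is deprecated in Python 3 and was removed in 3.11). Below is a self-contained sketch of a full round trip: it writes a test file into a temporary directory with csv.writer and reads it back. The file name, the sample rows, and the utf-8 encoding are assumptions; the real file may use a different encoding.

```python
import csv
import tempfile
from pathlib import Path

ROWS = [
    ['a', 'b', 'hello\nworld', 'c'],
    ['g', 'h', 'i', 'j'],
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'test.csv'  # hypothetical file name

    # newline='' leaves line endings to csv; the field containing '\n' gets quoted
    with path.open('w', newline='', encoding='utf-8') as f:
        csv.writer(f, delimiter='\xfe', quotechar="'").writerows(ROWS)

    # Read it back with the same dialect settings
    with path.open(newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f, delimiter='\xfe', quotechar="'"))

print(rows == ROWS)  # True
```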
