Python csv newline character in field
I have a problem reading a thorn-delimited (þ) CSV file that I think has a newline character in one of its fields. The newline splits the record over two lines, so I can't read the values in the last fields of that record. I tried opening the file in universal newline mode, but I'm not sure what the best way to handle this is.
This is how I am reading the file into Python:
import csv

csv.register_dialect('BB', delimiter='\xfe')
with open(file, 'rU') as file_in:
    log = csv.reader(file_in, dialect='BB')
    for row in log:
        print row
This works for most of the file, but there is one line that I assume has a newline character in one of its fields; I'm not sure how best to diagnose it. Viewed in Notepad, the record is forced onto two lines, when it should look like the two records below it.
When read with csv.reader, the line looks like this:
['06-13-2015-10:13:41', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '142', '', '5', '7.0', '2', '', 'cmhkl966', 'amex_674', '1', '0.00', '', '', "'"]
That is, the row is truncated at the first apostrophe.
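I have reduced your problem to a small example (I hope I understood the cause correctly). If the field containing the newline is wrapped in apostrophes, passing quotechar="'" tells csv.reader to treat it as one quoted field: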
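import io
import csv

# The string starts right after the opening quotes so that no blank
# first line (which csv.reader would return as an empty row) is parsed.
file_in = io.StringIO('''aþbþ'hello
world'
''')
# quotechar="'" keeps the newline between the apostrophes inside the field
log = csv.reader(file_in, delimiter='\xfe', quotechar="'")
for row in log:
    print(row)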
output:
['a', 'b', 'hello\nworld']
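To see why the quotechar argument matters here, the following sketch parses the same text twice: once with the csv module's default quote character (a double quote) and once with an apostrophe. The sample string and the helper name parse are made up for illustration; they are not taken from the file in the question.

```python
import csv
import io

# Hypothetical sample: one record whose third field contains a newline.
SAMPLE = "aþbþ'hello\nworld'þc\n"


def parse(text, **kwargs):
    """Parse thorn-delimited text and return the rows as a list."""
    stream = io.StringIO(text, newline='')
    return list(csv.reader(stream, delimiter='\xfe', **kwargs))


# The default quotechar is '"', so the apostrophe is ordinary data and
# the embedded newline ends the record early.
default_rows = parse(SAMPLE)

# With quotechar="'", the newline between the apostrophes stays in the field.
quoted_rows = parse(SAMPLE, quotechar="'")

print(default_rows)
print(quoted_rows)
```

With the default settings the record comes back as two short rows, which matches the truncation described in the question; with the apostrophe as quote character it comes back as a single row.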
UPDATE: as pointed out in the comments, here is a version that reads from a .csv file. Contents of test.csv:
aþbþ'hello
world'þc
dþeþ'hello
other
things'þf
gþhþiþj
and python code:
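import csv
from pathlib import Path

HERE = Path(__file__).parent
DATA_PATH = HERE / '../data/test.csv'

# The 'U' mode flag was removed in Python 3.11; newline='' is what the
# csv documentation recommends so the reader handles line endings itself.
with DATA_PATH.open(newline='') as file_in:
    log = csv.reader(file_in, delimiter='\xfe', quotechar="'")
    for row in log:
        print(row)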
which outputs:
['a', 'b', 'hello\nworld', 'c']
['d', 'e', 'hello\nother\nthings', 'f']
['g', 'h', 'i', 'j']
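If the real file is not UTF-8, Python 3 also needs to be told how to decode it. The sketch below assumes, purely for illustration, that the thorn delimiter is stored as the single byte 0xFE (as it would be in Latin-1). The temporary file and its contents are invented, and the actual encoding of the file in the question is not known.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample: thorn stored as the single byte 0xFE, with a
# newline inside an apostrophe-quoted field.
raw = b"a\xfeb\xfe'hello\nworld'\xfec\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample.csv'
    path.write_bytes(raw)
    # encoding='latin-1' is an assumption about how the file was written;
    # newline='' leaves line-ending handling to the csv module.
    with path.open(newline='', encoding='latin-1') as file_in:
        rows = list(csv.reader(file_in, delimiter='\xfe', quotechar="'"))

print(rows)
```

If decoding fails or the delimiter is not recognised, the encoding guess is the first thing to revisit.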
You can also check whether the first element of the next row starts with a timestamp and, if it does not, use the list method extend to append that row to the current one before printing.
Disclaimer: Not Verified
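import csv
import re
csv.register_dialect('BB', delimiter='\xfe')
timestamp = re.compile(r'\d+-\d+-\d+-\d+:\d+:\d+')
with open(file, 'rU') as file_in:
    rows = []
    # A csv.reader has no len() and cannot be indexed, so iterate over it.
    for row in csv.reader(file_in, dialect='BB'):
        # A row that does not start with a timestamp continues the previous one.
        if rows and (not row or timestamp.match(row[0]) is None):
            rows[-1].extend(row)
        else:
            rows.append(row)
for row in rows:
    print row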
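Note that extend appends the continuation as separate fields, so the field that was split by the newline stays in two pieces. A variant that glues the two halves back together might look like the sketch below. The function name merge_continuations, the sample rows, and the choice of '\n' as the joining string are all assumptions, and the approach is only as reliable as the rule that every real record starts with a timestamp.

```python
import re

TIMESTAMP = re.compile(r'\d+-\d+-\d+-\d+:\d+:\d+')


def merge_continuations(rows):
    """Merge rows that do not start with a timestamp into the previous row."""
    merged = []
    for row in rows:
        if not row:
            # Blank physical line: skipped in this sketch.
            continue
        if TIMESTAMP.match(row[0]) or not merged:
            merged.append(list(row))
        else:
            # Rejoin the field that was split by the stray newline,
            # then append the remaining fields of the continuation.
            merged[-1][-1] += '\n' + row[0]
            merged[-1].extend(row[1:])
    return merged


# Hypothetical rows, shaped like the output of csv.reader on a broken file.
rows = [
    ['06-13-2015-10:13:41', '0', "'note"],
    ["continued'", '7'],
    ['06-13-2015-10:14:02', '1', 'ok'],
]
result = merge_continuations(rows)
print(result)
```

Working on a list of rows rather than on the open file keeps the merging logic easy to test separately from the file reading.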