Using csv module to read ascii delimited text?
You may or may not know about ASCII delimited text, which has the pleasant advantage of using non-keyboard characters to separate fields and lines.
Writing this is pretty simple:
import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))
However, when reading, lineterminator does nothing, and if I try to do:
open('ascii_delim.adt', newline=chr(30))
it raises ValueError: illegal newline value:
So how can I read in my ASCII delimited file? Am I reduced to doing line.split(chr(30))?
You can do this by effectively translating the end-of-line characters in the file into the newline characters that csv.reader is hard-coded to recognize:
import csv

with open('ascii_delim.adt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

def readlines(f, newline='\n'):
    """Yield the lines of f, treating `newline` as the line terminator and
    replacing it with the '\n' that csv.reader recognizes."""
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                if line:  # don't drop a final, unterminated line
                    yield ''.join(line)
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)

# Text mode with newline='' so f.read(1) returns str and nothing is translated.
with open('ascii_delim.adt', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)
Output:
['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
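Reading one character at a time gets slow on large files. Below is a minimal sketch of a chunked variant, assuming (like the version above) that no field contains the record separator or a newline. readlines_chunked, chunk_size, US and RS are illustrative names I've made up, not part of the csv module:

```python
import csv

US = chr(31)  # ASCII Unit Separator: between fields
RS = chr(30)  # ASCII Record Separator: after each record


def readlines_chunked(f, newline=RS, chunk_size=64 * 1024):
    """Yield records from text file f, each ending in '\n' for csv.reader.

    Reads chunk_size characters at a time instead of one, keeping any
    partial record in a buffer until its terminator arrives.
    """
    buffer = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buffer += chunk
        # Every piece but the last is a complete record; the last may be partial.
        *complete, buffer = buffer.split(newline)
        for record in complete:
            yield record + '\n'
    if buffer:  # final record without a trailing terminator
        yield buffer + '\n'


with open('ascii_delim.adt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=US, lineterminator=RS)
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

with open('ascii_delim.adt', newline='') as f:
    rows = list(csv.reader(readlines_chunked(f), delimiter=US))

for row in rows:
    print(row)
```

The buffer only ever holds the unfinished tail of the previous chunk, so memory use stays around chunk_size regardless of file size.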
The documentation says:
The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
Thus, the csv module cannot directly read files that use their own line terminators.
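A small demonstration of that behavior: passing lineterminator=chr(30) to csv.reader is accepted, but it has no effect, so the record separator just ends up inside the fields. US and RS are illustrative names for chr(31) and chr(30):

```python
import csv

US, RS = chr(31), chr(30)  # Unit Separator, Record Separator

data = 'a' + US + 'b' + RS + 'c' + US + 'd' + RS
# lineterminator is ignored by csv.reader: RS is treated as an ordinary
# character, so both "records" come back as a single row.
rows = list(csv.reader([data], delimiter=US, lineterminator=RS))
print(rows)  # [['a', 'b\x1ec', 'd\x1e']]
```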
Hey, I've been struggling with a similar problem all day. I wrote a function heavily inspired by @martineau that should solve it for you. My function is slower, but it can parse files delimited by any strings, including multi-character ones. Hope this helps!
def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
    """Parse csv_file into a list of rows: records end at row_delimiter and
    fields end at col_delimiter (either may be multi-character).
    No quoting is supported, so fields must not contain either delimiter."""
    result = []
    row = []
    field = ''
    # Text mode with newline='' so characters come back as str, untranslated.
    with open(csv_file, newline='') as f:
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch
            if field.endswith(row_delimiter):  # end of record
                row.append(field[:-len(row_delimiter)])
                result.append(row)
                row = []
                field = ''
            elif field.endswith(col_delimiter):  # end of field
                row.append(field[:-len(col_delimiter)])
                field = ''
    if field or row:  # keep a final record that has no trailing row_delimiter
        row.append(field)
        result.append(row)
    return result
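If the whole file fits in memory and you don't need csv-style quoting, plain str.split handles arbitrary (even multi-character) delimiters more simply. split_delimited is a hypothetical helper, sketched under the assumption that no field contains either delimiter; you would pass it the file's contents, e.g. as read via open(path, newline=''):

```python
def split_delimited(text, row_delimiter, col_delimiter):
    """Split text into rows of fields; both delimiters may be multi-character.

    Hypothetical helper: no quoting or escaping, so fields must not
    contain either delimiter.
    """
    records = text.split(row_delimiter)
    if records and records[-1] == '':  # trailing row_delimiter leaves an empty piece
        records.pop()
    return [record.split(col_delimiter) for record in records]


print(split_delimited('a||b##c||d##', '##', '||'))  # [['a', 'b'], ['c', 'd']]
```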
Per the docs for open:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.
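A quick check of that restriction: any other value, such as chr(30), is rejected with ValueError. The temporary file is only there so that the newline argument is the only thing that can fail:

```python
import os
import tempfile

# Use an existing (temporary) file so only the newline argument can fail.
fd, path = tempfile.mkstemp(suffix='.adt')
os.close(fd)

error = None
try:
    with open(path, newline=chr(30)) as f:
        pass
except ValueError as exc:
    error = exc  # only None, '', '\n', '\r' and '\r\n' are accepted
finally:
    os.remove(path)

print(repr(error))
```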
So open won't handle your file. Per the csv docs:
Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.
So csv won't do it either. I also looked to see whether str.splitlines could be configured, but it uses a fixed set of line boundaries.
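For what it's worth, chr(30) ('\x1e', the Record Separator) is actually one of the fixed boundaries str.splitlines() uses, alongside '\x1c' and '\x1d'. The catch is that '\n', '\r' and several others are boundaries too, so a field containing a line break gets cut, which is why an explicit split(chr(30)) is safer. A small comparison, with US and RS as illustrative names:

```python
US, RS = chr(31), chr(30)  # Unit Separator, Record Separator

text = 'a' + US + 'b' + RS + 'multi\nline' + US + 'd' + RS

# '\x1e' is a splitlines() boundary, but so is '\n', which cuts the second record.
print(text.splitlines())  # ['a\x1fb', 'multi', 'line\x1fd']
# Splitting on RS only keeps the embedded newline inside its field.
print(text.split(RS))     # ['a\x1fb', 'multi\nline\x1fd', '']
```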
Am I reduced to doing line.split(chr(30))?
It looks like it, sorry!
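If you do fall back to splitting, here is a minimal sketch, assuming the whole file fits in memory and that no field contains chr(30) (csv.writer would quote such a field, but a plain split cannot honor that quoting). US and RS are illustrative names; csv.reader still splits the fields and handles any quoting within a record:

```python
import csv

US, RS = chr(31), chr(30)  # Unit Separator, Record Separator

# Write the sample file, as in the question.
with open('ascii_delim.adt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=US, lineterminator=RS)
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

# Read everything, split records on RS, and let csv.reader split the fields.
with open('ascii_delim.adt', newline='') as f:
    records = f.read().split(RS)
if records and records[-1] == '':  # trailing RS leaves an empty last piece
    records.pop()
rows = list(csv.reader(records, delimiter=US))
print(rows)
```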