Negative lookahead after a newline?
I have a CSV-like text file with about 1000 lines. A long series of dashes separates each record in the file. Records usually end with \n, but sometimes there is an extra \n before the end of the record. Simplified example:
"1x", "1y", "Hi there"
-------------------------------
"2x", "2y", "Hello - I'm lost"
-------------------------------
"3x", "3y", "How ya
doing?"
-------------------------------
I want to replace the extra \n with a space, i.e. concatenate the lines between the dashes. I thought I could do this (Python 2.5):
import re
text = open("thefile.txt", "r").read()
better_text = re.sub(r'\n(?!\-)', ' ', text)
but that seems to replace every \n, not just the ones that aren't followed by a dash. What am I doing wrong?
I am asking this question trying to improve my own regex skills and understand the mistakes I have made. The end goal is to create a text file in a format that can be used with a custom VBA macro for Word that generates a Word document in a style that will then be digested by the Word-friendly CMS.
You need to exclude line breaks at the end of the separator lines. Try the following:
\n(?<!-\n)(?!-)
This regex uses a negative look-behind assertion to exclude any \n that is preceded by a -.
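To see the difference between the two patterns, here is a small sketch run against the simplified sample from the question. The sample string and the variable names (`naive`, `fixed`) are made up for illustration, and the real file may of course contain cases this sample does not.

```python
import re

# Sample text mirroring the simplified example from the question.
text = (
    '"1x", "1y", "Hi there"\n'
    '-------------------------------\n'
    '"2x", "2y", "Hello - I\'m lost"\n'
    '-------------------------------\n'
    '"3x", "3y", "How ya\n'
    'doing?"\n'
    '-------------------------------\n'
)

# Pattern from the question: it only checks what FOLLOWS the newline, so the
# newline that ends each dash line is replaced too.
naive = re.sub(r'\n(?!-)', ' ', text)

# Pattern from this answer: a newline is replaced only if it is neither
# followed by a dash nor preceded by one.
fixed = re.sub(r'\n(?<!-\n)(?!-)', ' ', text)

print(naive)
print(fixed)
```

With the question's pattern, each separator line gets glued to the record after it, which is why the result looks as if nearly every \n was replaced. Note that both patterns assume no record line starts or ends with a dash.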
This is a good place to use a generator function to skip the ---- lines and get something the csv module can read.
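import csv

def readCleanLines( someFile ):
    # Yield every line except the separator lines made up entirely of dashes.
    for line in someFile:
        if line.strip() == len(line.strip())*'-':
            continue
        yield line

# someFile is an already-open file object.
# skipinitialspace=True is needed because the sample has a space after each
# comma; without it the csv module does not treat the following " as a quote.
reader= csv.reader( readCleanLines( someFile ), skipinitialspace=True )
for row in reader:
    print(row)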
This should handle line breaks inside quotes easily and quietly.
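As a rough check of that claim, the sketch below feeds the question's sample through the same kind of generator. It assumes Python 3 (`io.StringIO` stands in for the real file), and `read_clean_lines`, `sample`, `rows`, and `joined` are illustrative names, not part of the original code. The `skipinitialspace=True` argument is also an assumption based on the sample, which has a space after each comma.

```python
import csv
import io

def read_clean_lines(lines):
    """Yield every line except those consisting only of dashes."""
    for line in lines:
        stripped = line.strip()
        if stripped == len(stripped) * '-':
            continue
        yield line

# In-memory stand-in for the real file, using the question's sample.
sample = io.StringIO(
    '"1x", "1y", "Hi there"\n'
    '-------------------------------\n'
    '"2x", "2y", "Hello - I\'m lost"\n'
    '-------------------------------\n'
    '"3x", "3y", "How ya\n'
    'doing?"\n'
    '-------------------------------\n',
    newline='',
)

# The csv module keeps the embedded newline inside the quoted field.
rows = list(csv.reader(read_clean_lines(sample), skipinitialspace=True))

# Replace the embedded newlines with spaces, as the question asks.
joined = [[field.replace('\n', ' ') for field in row] for row in rows]

for row in joined:
    print(row)
```

One caveat of this approach: a line inside a quoted field that is empty or made up only of dashes would be skipped as well, since the generator looks at lines before the csv module sees them.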
If you want to do other things with this file, like save a copy with the ---- lines removed, you can do that.
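from __future__ import with_statement  # only needed on Python 2.5

with open( "source", "r" ) as someFile:
    with open( "destination", "w" ) as anotherFile:
        # Copy every line except the dash separators.
        for line in readCleanLines( someFile ):
            anotherFile.write( line )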
This will make a copy with the ---- lines removed. It isn't really worth the effort, though, because reading the file and skipping those lines on the fly is very fast and doesn't require any additional storage.