Count the number of lines that do not contain certain strings
I have a very large text file containing 900,000 lines. I need to count the lines that contain neither "year1995" nor "year1996". I did the following:
fname = r"data.txt"
with open(fname,'r') as fi:
    lines = fi.read().splitlines()
print len(lines)
test = [l for l in lines if 'year1995' or 'year1996' not in l]
print len(test)
But my code does not produce the expected result.
Any ideas?
3 answers
There is no point in building a list only to throw it away; just use sum:
with open(fname, 'r') as fi:
    print sum(not any(x in line for x in ('year1995', 'year1996')) for line in fi)
The call to fi.read().splitlines() is also not required: iterating over the file object yields one line at a time.
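For completeness, here is a small sketch of why the original condition lets every line through, next to the sum-based count from above. Python parses 'year1995' or 'year1996' not in l as 'year1995' or ('year1996' not in l), and a non-empty string is always truthy, so the filter never rejects anything. The helper name count_lines_without and the sample lines are made up for illustration, and the sketch assumes "don't have year1995 and year1996" means lines containing neither string. The parenthesized print calls run on both Python 2 and Python 3.

```python
def count_lines_without(lines, needles=('year1995', 'year1996')):
    """Count the lines that contain none of the given substrings.

    `lines` can be any iterable of strings, including an open file object.
    (Hypothetical helper, not part of the original question or answer.)
    """
    return sum(not any(n in line for n in needles) for line in lines)


# Made-up sample data standing in for the contents of data.txt.
sample = [
    "a year1995 b",
    "c year1996 d",
    "e year1997 f",
    "plain line",
]

# The question's condition is evaluated as
#     'year1995' or ('year1996' not in l)
# The left operand is a non-empty string, so the whole expression is
# always truthy and no line is ever filtered out.
buggy = [l for l in sample if 'year1995' or 'year1996' not in l]
print(len(buggy))

# The corrected count checks each substring against the line explicitly.
print(count_lines_without(sample))
```

With a real file, the same helper could be called as count_lines_without(fi) inside the with block, which keeps memory use flat because the file is read one line at a time.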