Count the number of lines that do not contain certain strings
I have a very large text file containing 900,000 lines. I have to count lines that don't have "year1995" and "year1996" in the line. I did the following:
fname = r"data.txt"
with open(fname,'r') as fi:
    lines = fi.read().splitlines()
print len(lines)
test = [l for l in lines if 'year1995' or 'year1996' not in l]
print len(test)
But my code is not producing the expected result.
Any ideas?
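The code you have there will put every line in test. This is because Python reads the condition as 'year1995' or ('year1996' not in l), and the first operand, 'year1995', is a non-empty string, which always evaluates as True. The second part is never even checked. Change the list comprehension to: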
[l for l in lines if not ('year1995' in l or 'year1996' in l)]
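To see the difference, it helps to run both conditions on a few lines. This is only a small sketch; the sample lines are made up for illustration and are not from the real data.txt:

```python
# Hypothetical sample lines, not taken from the real data.txt.
lines = ["a year1995 b", "c year1996 d", "e year2001 f"]

# Python parses the original test as: 'year1995' or ('year1996' not in l).
# 'year1995' is a non-empty string, so the whole expression is always truthy
# and the right-hand side is never evaluated.
buggy = [l for l in lines if 'year1995' or 'year1996' not in l]

# Here each string gets its own membership test.
fixed = [l for l in lines if not ('year1995' in l or 'year1996' in l)]

print(len(buggy))  # 3
print(len(fixed))  # 1
```

The buggy version keeps all three lines, while the fixed one keeps only the line that mentions neither year.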
It's pointless to build a list only to throw it away; just use sum:
with open(fname,'r') as fi:
    print sum(not any(x in line for x in ('year1995', 'year1996')) for line in fi)
lines = fi.read().splitlines() is also not required; just iterate over the file object, which yields one line at a time.
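The same idea can be wrapped in a small function so it is easy to try without the real file. This is a sketch, not the only way to do it: the name count_lines_without and its parameters are made up here, and the sample lines stand in for the file.

```python
def count_lines_without(line_iter, needles=('year1995', 'year1996')):
    """Count the lines that contain none of the strings in needles."""
    # True counts as 1 and False as 0, so sum() tallies the matching lines
    # without ever building a list.
    return sum(not any(n in line for n in needles) for line in line_iter)

# Any iterable of strings works, including an open file object:
#     with open('data.txt') as fi:
#         total = count_lines_without(fi)
sample = ["x year1995\n", "y year1996\n", "z year2001\n", "\n"]
total = count_lines_without(sample)
print(total)  # 2
```

Because the generator expression consumes one line at a time, memory use should stay flat even for a file with 900,000 lines.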
You need to change the if condition to:
if 'year1995' not in l and 'year1996' not in l
or, equivalently:
if not ('year1995' in l or 'year1996' in l)
Note: not applies to a complete condition, so each string needs its own in test!
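When negating a compound test, where the not goes and whether the parts are joined with and or with or both matter. Here is a quick check on a made-up sample line that contains only one of the two strings (the variable names are just for illustration):

```python
# Hypothetical sample line that contains only one of the two strings.
l = "report year1995 final"

# "Contains neither string", written two equivalent ways (De Morgan's law).
neither_a = not ('year1995' in l or 'year1996' in l)
neither_b = 'year1995' not in l and 'year1996' not in l

# Joining the two negations with "or" asks a different question:
# "is at least one of the strings missing?"
not_both = 'year1995' not in l or 'year1996' not in l

print(neither_a)  # False
print(neither_b)  # False
print(not_both)   # True
```

So a line with just one of the years is excluded by the first two forms but would still be counted by the last one.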