Count the lines in a file that do not contain certain strings

I have a very large text file containing 900,000 lines. I need to count the lines that contain neither "year1995" nor "year1996". I did the following:

fname = r"data.txt"
with open(fname,'r') as fi:
    lines = fi.read().splitlines()
    print len(lines)
    test = [l for l in lines if 'year1995' or 'year1996' not in l]
    print len(test)

      

But my code does not produce the expected result.

Any ideas?



3 answers


The code you have there will put every line in test. This is because the first operand of the if condition, 'year1995', always evaluates to True, since non-empty strings are truthy. Change the list comprehension to:



[l for l in lines if not ('year1995' in l or 'year1996' in l)]
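To make the difference concrete, here is a minimal sketch comparing the original condition with the corrected one. The three sample lines are made up for illustration, and the parenthesized print calls are used so the snippet also runs on Python 3.

```python
# Sample lines are invented for this illustration.
lines = ["year1995 report", "year1996 report", "year2001 report"]

# The original condition parses as: 'year1995' or ('year1996' not in l).
# A non-empty string is truthy, so `or` short-circuits and every line passes.
original = [l for l in lines if 'year1995' or 'year1996' not in l]

# The corrected condition tests each substring against the line itself.
fixed = [l for l in lines if not ('year1995' in l or 'year1996' in l)]

print(len(original))  # all lines pass the broken filter
print(len(fixed))     # only lines containing neither substring remain
```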

      



It's pointless to build a list only to throw it away; just use sum:

with open(fname, 'r') as fi:
    print sum(not any(x in line for x in ('year1995', 'year1996')) for line in fi)

Calling lines = fi.read().splitlines() is also not required; just iterate over the file object, which yields one line at a time.
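The streaming approach can be wrapped in a small helper, sketched below. The function name count_lines_without and the temporary sample file are assumptions made for this example; they stand in for the real data.txt. The code is written so that it runs on Python 3 as well.

```python
import os
import tempfile

def count_lines_without(path, needles):
    """Count the lines in the file at `path` that contain none of `needles`."""
    with open(path, 'r') as fi:
        # Iterating over the file object reads one line at a time,
        # so the whole file never has to sit in memory.
        return sum(1 for line in fi if not any(n in line for n in needles))

# Tiny made-up file standing in for data.txt.
sample = "a year1995\nb year1996\nc year2001\nd\n"
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, 'w') as fo:
    fo.write(sample)

result = count_lines_without(tmp_path, ('year1995', 'year1996'))
os.remove(tmp_path)
print(result)
```

Passing the substrings as a tuple keeps the helper usable when more years need to be excluded later.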



You need to change the if condition to:

if 'year1995' not in l and 'year1996' not in l

      

or

if not ('year1995' in l or 'year1996' in l)

      

Note: the not operator has to be followed by the condition it negates!
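As a sanity check on the boolean logic, the sketch below enumerates every combination of "line contains year1995" (a) and "line contains year1996" (b). It is only an illustration with made-up variable names. It confirms that not (a or b) is the same as not a and not b, while not a or not b gives a different answer whenever exactly one of the substrings is present.

```python
from itertools import product

cases = list(product([False, True], repeat=2))

# De Morgan: "neither is present" can be written two equivalent ways.
equivalent = all((not (a or b)) == (not a and not b) for a, b in cases)

# Joining the negations with `or` instead is a different condition.
mismatches = [(a, b) for a, b in cases if (not a or not b) != (not (a or b))]

print(equivalent)
print(mismatches)
```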
