Python script to count the number of lines in all files in a directory

I'm new to Python and I'm trying to write a script that iterates through all the .txt files in a directory, counts the number of lines in each one (excluding lines that are empty or commented out), and writes the final output to a CSV file. The end result should look something like this:

agprices, avi, adp
132, 5, 8 

      

I'm having trouble with the syntax for storing each count as a dictionary value. Here is my code:

#!/usr/bin/env python

import csv
import copy
import os
import sys

#get current working dir, set count, and select file delimiter
d = os.getcwd()
count = 0
ext = '.txt'

#parses through files and saves to a dict
series_dict = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext] 
 #selects all files with .txt extension
for f in txt_files:
    with open(os.path.join(d,f)) as file_obj:
        series_dict[f] = file_obj.read()

            if line.strip():                #Exclude blank lines
                continue
            else if line.startswith("#"):   #Exclude commented lines
                continue
            else
                count +=1
                #Need to save count as val in dict here

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
w = csv.DictWriter(f, series_dict.keys())
w.writeheader()
w.writerow(series_dict)

      

So here's the edit:

#!/usr/bin/env python

import csv
import copy
import os
import sys
import glob

#get current working dir, set count, and select file delimiter
os.chdir('/Users/Briana/Documents/Misc./PythonTest')

#parses through files and saves to a dict
series = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        series[fn] = (1 for line in f if line.strip() and not line.startswith('#')) 

print series

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
    w = csv.DictWriter(f, series.keys())
    sum(names.values())

      

I'm getting an error reported on the second-to-last line and I'm not sure why. I'm also not sure whether the syntax in the last part is correct. Again, I'm just trying to get a dictionary of filenames and line counts, like {a: 132, b: 245, c: 13}.



4 answers


You can try something along these lines:

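import glob
import os

os.chdir(ur_directory)  # ur_directory: path to the folder holding the .txt files
names = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        # count lines that are neither blank nor start with '#'
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))

print(names)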

      

This will print a dictionary like:

{'test_text.txt': 20, 'f1.txt': 3, 'lines.txt': 101, 'foo.txt': 6, 'dat.txt': 6, 'hello.txt': 1, 'f2.txt': 4, 'neglob.txt': 8, 'bar.txt': 6, 'test_reg.txt': 6, 'mission_sp.txt': 71, 'test_nums.txt': 8, 'test.txt': 7, '2591.txt': 8303} 

      



You can then pass this Python dict to csv.DictWriter.
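As a rough sketch of that last step, the dict can be written as one header row of file names and one row of counts. This assumes Python 3, where the CSV file is opened in text mode with newline='' rather than 'wb'. The helper name write_counts and the sample dict are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io

def write_counts(counts, out):
    """Write a header row of file names followed by one row of counts."""
    fieldnames = sorted(counts)          # fixed column order
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(counts)

# Hypothetical sample data; in the real script this is the names dict.
names = {'f1.txt': 3, 'foo.txt': 6, 'hello.txt': 1}

# In-memory buffer for the demonstration. For a real file on Python 3:
#     with open('seriescount.csv', 'w', newline='') as f:
#         write_counts(names, f)
buf = io.StringIO()
write_counts(names, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Sorting the keys is only there to make the column order predictable; passing names.keys() directly also works.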

If you want to get their sum, just do:

sum(names.values())

      



I think you should make two changes to your script:

  • Use glob.glob() to get a list of files matching your suffix
  • Use for line in file_obj to iterate over lines


Another problem:

  • The indentation is incorrect in the last few lines
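The two suggested changes might look roughly like this when put together. It is only a sketch: the function names count_lines and count_directory are invented here, and the demonstration files are created in a temporary directory so the snippet runs on its own.

```python
import glob
import os
import tempfile

def count_lines(path):
    """Count lines that are neither blank nor start with '#'."""
    count = 0
    with open(path) as file_obj:
        for line in file_obj:          # iterate line by line instead of read()
            if not line.strip():       # skip blank lines
                continue
            if line.startswith('#'):   # skip commented lines
                continue
            count += 1
    return count

def count_directory(directory, pattern='*.txt'):
    """Map each matching file name to its line count."""
    counts = {}
    for path in glob.glob(os.path.join(directory, pattern)):
        counts[os.path.basename(path)] = count_lines(path)
    return counts

# Small demonstration in a throwaway directory (file names are made up).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'agprices.txt'), 'w') as fh:
    fh.write('first\n\n# a comment\nsecond\n')
with open(os.path.join(demo_dir, 'notes.log'), 'w') as fh:
    fh.write('ignored\n')          # wrong extension, so not counted
demo_counts = count_directory(demo_dir)
print(demo_counts)
```

Note that the colons after each if and the consistent four-space indentation are what the original script was missing.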


You can count the lines in a file with this one-liner:

line_nums = sum(1 for line in open(f) if line.strip() and line[0] != '#')

      

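which shortens the loop to the following (note that this adds every file's lines to one running total in count rather than keeping a per-file count):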

for f in txt_files:
    count += sum(1 for line in open(os.path.join(d,f)) 
                 if line[0] != '#' and line.strip())
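To see which lines that condition actually keeps, here is a small check on a hand-made list of lines. The helper is_counted and the sample data are hypothetical and exist only for this illustration.

```python
def is_counted(line):
    """Same test as the one-liner: non-blank and not starting with '#'."""
    return bool(line.strip()) and not line.startswith('#')

# Hand-made sample: data, blank, whitespace-only, comment, indented comment.
sample = [
    'data\n',
    '\n',
    '   \n',
    '# comment\n',
    '  # indented comment\n',
    'more data\n',
]

counted = [line for line in sample if is_counted(line)]
line_nums = len(counted)
print(line_nums)
```

The indented comment is still counted, because the test only looks at the first character of the line. If indented comments should be skipped as well, testing line.lstrip().startswith('#') instead is one option.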

      



It sounds like you want to use a dictionary to keep track of counts. You can create one at the top like this: counts = {}

Then (once you fix your tests) you can update it for each line that is neither blank nor a comment:

counts = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]
# selects all files with .txt extension
for f in txt_files:
    counts[f] = 0  # create an entry in the dictionary to track this file's line count
    with open(os.path.join(d, f)) as file_obj:
        for line in file_obj:              # iterate over the lines instead of read()
            if line.startswith("#"):       # Exclude commented lines
                continue
            elif line.strip():             # Count only non-blank lines
                counts[f] += 1

      
