Python - counting words in a text file

I am a Python newbie and am working on a program that counts word occurrences in a simple text file. The file name is passed on the command line, so I have tried to include handling of the command-line arguments. Code below:

import sys

count={}

with open(sys.argv[1],'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1

print(word,count[word])

file.close()

      

count is a dictionary for storing words and the number of times they occur. I want to print every word and the number of times it occurs, from most to least frequent.

I would like to know if I am on the right track and if I am using sys correctly. Thanks!

+3




4 answers


What you did looks good to me. You can also use collections.Counter (assuming you are on Python 2.7 or newer), which gives you a little more than just the count of each word. My solution would look like this; some improvement may be possible.



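import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        # update() with a list counts whole words; passing a bare
        # string would count its individual characters instead
        c.update(line.split())
# most_common() yields (word, count) pairs from most to least frequent
for word, n in c.most_common():
    print(word, n)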

      

+3




Your final print isn't inside a loop, so it only prints the count for the last word you read, which is still bound to the variable word.
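
As a minimal sketch of the fix, assuming count has already been filled the way the question does it, the print can be moved into a loop over the dictionary. The sample data below is made up purely for illustration:

```python
# Hypothetical sample data standing in for the dictionary built in the question.
count = {"the": 3, "cat": 1, "sat": 2}

# Visit every (word, count) pair instead of printing once after the loop ends.
lines = []
for word, n in count.items():
    lines.append("%s %d" % (word, n))

print("\n".join(lines))
```

This prints one line per word, though not yet sorted by frequency.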

Also, with the with context manager, you don't need to close() the file handle.



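Finally, as pointed out in a comment, you may want to strip the trailing newline from each line before split (split() with no arguments already discards it, so this only matters if you pass an explicit separator).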

For a simple program like this, it is probably not worth the trouble, but you could look at defaultdict from collections to avoid the special case for initializing a new key in the dictionary.
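
A rough sketch of what that could look like follows. The helper name count_words and the sample lines are hypothetical, chosen only for illustration; in the real program the lines would come from the open file object:

```python
from collections import defaultdict

def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    count = defaultdict(int)  # a missing key starts at int() == 0
    for line in lines:
        for word in line.split():
            count[word] += 1  # no "if word not in count" branch needed
    return count

# Illustrative input standing in for the lines of a text file.
sample = ["the cat sat\n", "the cat\n"]
counts = count_words(sample)
print(dict(counts))
```

The trade-off is that merely looking up an unseen word on a defaultdict creates an entry for it, which a plain dict would not do.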

0




I just noticed a typo: you open the file as f, but close it as file. As tripleee said, you don't need to close files that you open in a with statement. Also, it is bad practice to use the names of built-ins like file (in Python 2) or list for your own identifiers. Sometimes it works, but sometimes it causes nasty errors. It also confuses people who read your code; a syntax-highlighting editor can help you avoid this little problem.
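
To illustrate the kind of error shadowing can cause, here is a small contrived sketch (the function name shadowing_demo is made up for this example). It rebinds list locally and then tries to use it as the built-in:

```python
def shadowing_demo():
    # Rebinding the name 'list' hides the built-in inside this function.
    list = ["a", "b"]
    try:
        # Intended as a call to the built-in, but 'list' is now a list object.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)
```

Outside the function the built-in is untouched, which is part of why such bugs can be hard to spot.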

To print the data in your count dict in descending order of count, you can do something like this:

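# list() makes this work on Python 3, where items() returns a view
items = list(count.items())
# sort by the count (second element of each pair), largest first
items.sort(key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))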

      

See the Python library reference for more details on the list.sort() method and other handy dict methods.

0




I just did something similar using the re library. This was for finding the average number of words per line in a text file, which requires counting both the words and the lines.

import re

# This program reports the average number of words per line.
def main():
    # get the name of the file
    filename = input('Enter a filename: ')
    try:
        # open and read the file; the with block closes it automatically
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return

    # count lines (a last line without a trailing newline still counts)
    lines = len(contents.splitlines())
    count = len(re.findall(r'\w+', contents))
    # guard against an empty file to avoid dividing by zero
    average = count // lines if lines else 0

    # display the file contents
    print(contents)
    print('there is an average of', average, 'words per line')

# call main
main()

      

0

