Python - counting words in a text file
I am a Python newbie working on a program that counts word occurrences in a simple text file. The name of the text file is passed on the command line, so I use sys to handle the command-line arguments. Code below:
import sys

count = {}
with open(sys.argv[1], 'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1
print(word, count[word])
file.close()
count is a dictionary that stores each word and the number of times it occurs. I want to print every word with its count, ordered from most frequent to least frequent.
I would like to know if I am on the right track and if I am using sys correctly. Thanks!
What you did looks good to me. You could also use collections.Counter (assuming you are on Python 2.7 or newer), which gives you a little more than the count of each word, such as the most common entries. My solution looks like this; some improvement may be possible.
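import sys
from collections import Counter

c = Counter()
# Read the file named by the first command-line argument
with open(sys.argv[1], 'r') as f:
    for line in f:
        # update() expects an iterable of items: a list of words counts
        # words, whereas a single string would be counted per character
        c.update(line.split())

# most_common() yields (word, count) pairs, highest count first
for word, n in c.most_common():
    print('%s %d' % (word, n))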
Your final print isn't inside a loop, so it only prints the count for the last word you read, which is still held in the variable word.
Also, with a context manager (with), you don't need to close() the file handle.
Finally, as pointed out in a comment, you may want to strip the trailing newline from each line before you split it, although split() with no arguments already discards surrounding whitespace.
For a simple program like this it is probably not worth the trouble, but you could look at defaultdict from collections to avoid the special case for initializing a new key in the dictionary.
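As a rough sketch of the defaultdict idea, the counting loop might look like the following. The function name count_words and the sample lines are made up for illustration; they are not from the question.

```python
from collections import defaultdict

def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    count = defaultdict(int)  # a missing key starts at int(), i.e. 0
    for line in lines:
        # split() with no arguments also drops the trailing newline
        for word in line.split():
            count[word] += 1  # no "if word not in count" branch needed
    return count

# Hypothetical sample input standing in for the lines of a file
sample = ["the cat sat\n", "the cat\n"]
counts = count_words(sample)
print(counts["the"], counts["cat"], counts["sat"])
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.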
I just noticed a typo: you open the file as f, but close it as file. As tripleee said, you shouldn't close files that you open in a with statement. Also, it is bad practice to use the names of built-ins like file or list for your own identifiers. Sometimes it works, but sometimes it causes nasty errors, and it confuses people who are reading your code. A syntax-highlighting editor can help you avoid this little problem.
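To illustrate the kind of error that shadowing can cause, here is a small sketch using list (file is a built-in only in Python 2). The function name shadow_demo is hypothetical.

```python
def shadow_demo():
    """Show how reusing a built-in name hides the built-in."""
    list = ["a", "b"]   # the local name 'list' now hides the built-in type
    try:
        list("abc")     # this calls our list object, not the built-in
    except TypeError as exc:
        return str(exc)
    return None

message = shadow_demo()
print(message)
```

The call fails with a TypeError because, inside the function, list refers to an ordinary list object rather than the built-in type.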
To print the data in your count dict in descending order of count, you can do something like this:
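# Sort the (word, count) pairs by count, highest first.
# sorted() works on both Python 2 and 3; in Python 3, dict.items()
# returns a view with no sort() method, and tuple-unpacking lambdas
# such as "lambda (k, v): v" are no longer valid syntax.
items = sorted(count.items(), key=lambda kv: kv[1], reverse=True)
print('\n'.join('%s: %d' % (k, v) for k, v in items))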
See the Python Library Reference for more details on the list.sort() method, the sorted() built-in, and other handy dict methods.
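Putting the points from these answers together, one possible shape for the whole program is sketched below. The names count_words and main, the usage message, and the exit codes are assumptions made for illustration, not part of the original question.

```python
import sys
from collections import Counter

def count_words(path):
    """Return a Counter of whitespace-separated words in the file at path."""
    counts = Counter()
    with open(path, "r") as f:      # the with block closes f automatically
        for line in f:
            counts.update(line.split())
    return counts

def main(argv=None):
    if argv is None:
        argv = sys.argv
    # Expect exactly one argument after the script name
    if len(argv) != 2:
        print("usage: %s FILE" % argv[0])
        return 1
    try:
        counts = count_words(argv[1])
    except IOError as exc:          # IOError is an alias of OSError in Python 3
        print("cannot read %s: %s" % (argv[1], exc))
        return 1
    # most_common() returns (word, count) pairs, highest count first
    for word, n in counts.most_common():
        print("%s: %d" % (word, n))
    return 0
```

A script would typically end by calling sys.exit(main()), so that a missing argument or an unreadable file produces a non-zero exit status.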
I just did this using the re library. My program computes the average number of words per line in a text file, which means it has to count the words first.
import re

# This program gets the average number of words per line
def main():
    # Get the name of the file
    filename = input('Enter a filename: ')
    try:
        # Open the file and read its contents; with closes it for us
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return
    # Count the lines and the words
    lines = len(contents.splitlines())
    count = len(re.findall(r'\w+', contents))
    # Guard against an empty file to avoid dividing by zero
    average = count // lines if lines else 0
    # Display the file contents
    print(contents)
    print('there is an average of', average, 'words per line')

# Call main
main()