Find the minimum value in a CSV column and print every line that contains it, in Python

Thanks a lot for any help. I am trying to write a script that goes through a folder of CSV files, finds the minimum value in the second column, and prints every line that contains it. The CSV files read by the script look like this:

TPN,12010,on this date,25,0.00005047619239909304377497309619
TPN,12011,on this date,23,0.00003797836224092152019127884704
TPN,12012,on this date,78,0.0001130474103447076420049393022
TPN,12020,on this date,27,0.00005671375308512314236202279053
TPN,12021,on this date,60,0.00009856619048244864701475864425

The script looks like this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

identity = []
for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in column[1]:
                identity.append(row)
            else:
                print "No match"
        print identity

The error I am getting:

  File "findfirsttrigram.py", line 12, in <module>
    identity.append("a")
NameError: name 'identity' is not defined

I've also tried doing it like this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in row:
                print row
            else:
                print "No match"

But that didn't work either. It didn't give me an error, but it also didn't print "No match", so I have no idea where to start. Please help!

3 answers


You can do something like:



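import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for each_file in os.listdir(folder):
    # os.listdir() returns bare file names, so join them with the folder path
    with open(os.path.join(folder, each_file)) as f:
        # first pass: smallest integer in the second column
        m = min(int(line[1]) for line in csv.reader(f))
        # rewind so the file can be read a second time
        f.seek(0)
        # second pass: print every row whose second column equals the minimum
        for line in csv.reader(f):
            if int(line[1]) == m:
                print line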
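If each file fits comfortably in memory, a variation on the two-pass approach above is to read the rows into a list once, so the file never has to be rewound. This is a minimal sketch in Python 3 syntax (the code in the question is Python 2); the function name rows_with_min is hypothetical, and it assumes every row has an integer in its second column. An io.StringIO object stands in for one of the CSV files:

```python
import csv
import io


def rows_with_min(f, column=1):
    """Return every row whose value in `column` equals that column's minimum."""
    rows = list(csv.reader(f))  # read the file once and keep the rows
    least = min(int(row[column]) for row in rows)
    return [row for row in rows if int(row[column]) == least]


# in-memory stand-in for one of the CSV files (assumption for the example)
sample = io.StringIO(
    "TPN,12011,on this date,23,0.00003797836224092152019127884704\n"
    "TPN,12010,on this date,25,0.00005047619239909304377497309619\n"
    "TPN,12012,on this date,78,0.0001130474103447076420049393022\n"
)
for row in rows_with_min(sample):
    print(row)
```

With a real file you would pass the open file object, e.g. with open(path, newline='') as f: rows_with_min(f). The trade-off is memory: the whole file is held in a list, whereas seek(0) only ever needs one row at a time.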

      

The reason your minimum value was not found is that you convert the column to int when you search for the minimum value, but each field is still a string when you check it as part of the row you read, so least_value in row is never true. Try changing your code like this:

for row in incsv:
    if int(row[column])==least_value:
        print row
    else:
        print "No match"
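To see the type mismatch in isolation, here is a small check on the first sample line, written with Python 3 print syntax (the question's code is Python 2). csv.reader returns every field as a string, so an in test with the integer 12010 is always False, while converting the field first makes the comparison succeed:

```python
import csv

# csv.reader accepts any iterable of lines and yields each row as a list of strings
row = next(csv.reader(["TPN,12010,on this date,25,0.00005047619239909304377497309619"]))
least_value = 12010

print(least_value in row)          # False: 12010 (int) never equals '12010' (str)
print(int(row[1]) == least_value)  # True: both sides are ints
```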

      



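As for the other error: a with block does not create a new scope in Python, so identity, which is defined at module level before the loop, is visible inside it, and no global statement is needed. The traceback points at identity.append("a") on line 12, which is not in the script you posted, so the NameError most likely comes from a different version of the script in which identity was used before identity = [] had run.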

Ashalind explained why the value test failed. However, the reason your "No match" branch is never reached is that your csv reader cannot iterate over the data twice. Let's take a simple example.

with open(filename) as inf:
    incsv = csv.reader(inf)
    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

Assuming there are 999 records, it will output this:

999
0
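The same behaviour can be reproduced without a file on disk. This sketch uses Python 3 syntax and an in-memory io.StringIO object in place of the CSV file (an assumption made only so the example is self-contained), and adds a third pass after a call to seek(0):

```python
import csv
import io

inf = io.StringIO("a,1\nb,2\nc,3\n")  # stand-in for an open CSV file
incsv = csv.reader(inf)

first = sum(1 for _ in incsv)   # reads every line: 3
second = sum(1 for _ in incsv)  # the file position is at the end: 0

inf.seek(0)                     # rewind the underlying file object
third = sum(1 for _ in incsv)   # the same reader can read the lines again: 3

print(first, second, third)
```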

      

This is because at the end of the first iteration, the file object's position is at the end of the file, so the second loop has nothing left to read. You need to reset it to the beginning of the file with inf.seek(0) before iterating again, and then your second example should work. Pretty sure this will work:

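import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for filename in os.listdir(folder):
    # os.listdir() returns bare file names, so join them with the folder path
    with open(os.path.join(folder, filename), 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        data = (datatype(row[column]) for row in incsv)
        # min() consumes the whole reader, which leaves the file position at the end
        least_value = min(data)
        print least_value
        # This resets the file's current position so it can be read again
        inf.seek(0)
        for row in incsv:
            # Check if the value is the same as the properly cast data
            if least_value == datatype(row[column]):
                print row
            else:
                print "No match"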
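For reference, here is a sketch of the whole task in Python 3 syntax rather than the Python 2 used above. It assumes the data files end in .csv (so stray entries such as .DS_Store are skipped), that every non-blank row has an integer in its second column, and that blank lines should be ignored; the function name min_rows_by_file is hypothetical. glob.glob returns full paths, which sidesteps the bare-file-name issue with os.listdir(), and files are opened with newline='' as the csv module documentation recommends for Python 3:

```python
import csv
import glob
import os


def min_rows_by_file(folder, column=1):
    """Map each .csv file name in `folder` to the rows holding its minimum."""
    results = {}
    # glob returns full paths, so they can be opened directly
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as inf:
            rows = [row for row in csv.reader(inf) if row]  # skip blank lines
        if not rows:
            continue  # an empty file has no minimum
        least = min(int(row[column]) for row in rows)
        results[os.path.basename(path)] = [
            row for row in rows if int(row[column]) == least
        ]
    return results


if __name__ == "__main__":
    # the folder from the question; adjust to your own path
    folder = "/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/"
    for name, rows in min_rows_by_file(folder).items():
        print(name, "minimum rows:")
        for row in rows:
            print(row)
```

Reading each file into a list keeps the logic to a single pass per file; if the files are too large for that, the seek(0) approach above avoids holding all rows at once.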

      
