How to parse a CSV file and search by item in the first column

I have a CSV file with over 4,000 lines in this format:

name, price, cost, quantity

How do I trim my CSV file so that only the rows for 20 specific names are left? I can parse and truncate a CSV file; I just don't understand how to search by the first column.



4 answers


Use pandas!

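import pandas as pd

# Small example frame standing in for the file. In your case, read the real file instead:
# df = pd.read_csv('input_csv.csv', skipinitialspace=True)
# (skipinitialspace=True drops the space after each comma, so the columns are 'price', not ' price')
df = pd.DataFrame({'name': ['abc', 'ght', 'kjh'], 'price': [7, 5, 6], 'cost': [9, 0, 2], 'quantity': [1, 3, 4]})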

>>> df

   name  price  cost  quantity
0   abc      7     9         1
1   ght      5     0         3
2   kjh      6     2         4

>>> names_wanted = ['abc', 'kjh']

>>> df_trim = df[df['name'].isin(names_wanted)]

>>> df_trim

   name  price  cost  quantity
0   abc      7     9         1
2   kjh      6     2         4

Then write the filtered rows to a new CSV file (index=False leaves out the row-number column):

>>> df_trim.to_csv('trimmed_csv.csv', index=False)

Done!



You can loop over csv.reader(). It yields each line of the file as a list of strings, one string per field. Compare the first element of that list, i.e. row[0]; if it is one of the names you want, append the row to your output list.
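A minimal sketch of that loop, assuming the header row should be kept. The function names rows_with_names and write_rows are hypothetical. skipinitialspace=True drops the space after each comma in the format shown in the question, and holding the wanted names in a set keeps each lookup fast.

```python
import csv


def rows_with_names(path, names_wanted):
    """Return the header row plus every row whose first column is in names_wanted."""
    output = []
    with open(path, newline='') as f:
        reader = csv.reader(f, skipinitialspace=True)  # ignore blanks after commas
        header = next(reader, None)  # first line: name, price, cost, quantity
        if header is not None:
            output.append(header)
        for row in reader:  # each row is a list of strings
            if row and row[0] in names_wanted:  # row[0] is the name column
                output.append(row)
    return output


def write_rows(path, rows):
    """Write a list of rows (lists of strings) to a new CSV file."""
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

For example, write_rows('small_file.csv', rows_with_names('file.csv', {'abc', 'kjh'})) would keep the header plus the abc and kjh rows; both file names are placeholders.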





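You can create a plain-text file (say, target_names) with each of the 20 names on its own line. Then, with your CSV file (say, file.csv), run this on the command line (bash):

head -n 1 file.csv > my_new_small_file.csv
while IFS= read -r name; do grep -- "^$name," file.csv; done < target_names >> my_new_small_file.csv

The first line copies the header. The ^ and the trailing comma anchor each name to the first column, so a name that appears elsewhere in a line, or is the start of a longer name, is not matched. If the case differs between the two files, use grep -i.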
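For comparison, a Python sketch of the same approach: it reads the names from the target_names file and matches them exactly against the first column only, with an optional case-insensitive mode similar to grep -i. The function names load_names and filter_csv and the ignore_case parameter are hypothetical; keeping the header row is an assumption.

```python
import csv


def load_names(names_path, ignore_case=False):
    """Read one target name per line from a plain-text file into a set."""
    with open(names_path) as f:
        names = {line.strip() for line in f if line.strip()}  # skip blank lines
    return {n.lower() for n in names} if ignore_case else names


def filter_csv(csv_path, names_path, out_path, ignore_case=False):
    """Write the header plus rows whose first column exactly matches a target name."""
    names = load_names(names_path, ignore_case)
    with open(csv_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # keep the column names
        for row in reader:
            if not row:
                continue  # skip empty lines
            key = row[0].lower() if ignore_case else row[0]
            if key in names:
                writer.writerow(row)  # original spelling of the name is preserved
```

Usage with the file names from this answer: filter_csv('file.csv', 'target_names', 'my_new_small_file.csv', ignore_case=True).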



Not sure I understood you correctly, but does the snippet below do what you want?

import os

def FilterCsv(_sFilename, _aAllowedNameList):
  # Keep only the lines whose first comma-separated field is an allowed name
  l_aNewFileLines = []
  with open(_sFilename, 'r') as l_inputFile:
    for l_sLine in l_inputFile:
      l_aItems = l_sLine.split(',')
      if l_aItems[0].strip() in _aAllowedNameList:
        l_aNewFileLines.append(l_sLine)

  # Write the kept lines to output_<filename>, next to the input file
  l_sDir, l_sBase = os.path.split(_sFilename)
  with open(os.path.join(l_sDir, 'output_' + l_sBase), 'w') as l_outputFile:
    l_outputFile.writelines(l_aNewFileLines)

Hope this helps!
