How to handle csv file with duplicate field names when read with csv.DictReader?

I am working with a poorly formed CSV file; it has duplicate field names.

csv.DictReader

just overwrites the first column with the same name with the contents of the second column with the same name. But I need both the content of the double named columns.

I cannot directly assign the parameter DictReader.fieldnames

. There are about a hundred columns, and each time it will be a different number of columns, for example:

product, price1, price2, price1,...,price100
car, 100, 300, 200,...,350

      

output: {'product':'car', 'price1': 200, 'price2':300}

I need: {'product':'car', 'price1': 100, 'price2':300, 'price3': 200}

What is the way to do this?

+3


source to share


1 answer


Do not use DictReader()

in this case. Stick to the regular reader.

You can always map a dictionary based on the renamed list of field names:

with open(filename, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    fieldnames = remap(next(reader))
    for row in reader:
        row = dict(zip(fieldnames, row))

      



where the function remap()

can either renumber the numbered columns or add additional information if the column names are duplicated.

Renumbering can be as simple as:

from itertools import count

def remap(fieldnames):
    price_count = count(1)
    return ['price{}'.format(next(price_count)) if f.startswith('price') else f
            for f in fieldnames]

      

+7


source







All Articles