How to handle csv file with duplicate field names when read with csv.DictReader?
I am working with a poorly formed CSV file; it has duplicate field names.
csv.DictReader
just overwrites the first column with the same name with the contents of the second column with the same name. But I need both the content of the double named columns.
I cannot directly assign the parameter DictReader.fieldnames
. There are about a hundred columns, and each time it will be a different number of columns, for example:
product, price1, price2, price1,...,price100 car, 100, 300, 200,...,350
output: {'product':'car', 'price1': 200, 'price2':300}
I need: {'product':'car', 'price1': 100, 'price2':300, 'price3': 200}
What is the way to do this?
source to share
Do not use DictReader()
in this case. Stick to the regular reader.
You can always map a dictionary based on the renamed list of field names:
with open(filename, 'rb') as csvfile:
reader = csv.reader(csvfile)
fieldnames = remap(next(reader))
for row in reader:
row = dict(zip(fieldnames, row))
where the function remap()
can either renumber the numbered columns or add additional information if the column names are duplicated.
Renumbering can be as simple as:
from itertools import count
def remap(fieldnames):
price_count = count(1)
return ['price{}'.format(next(price_count)) if f.startswith('price') else f
for f in fieldnames]
source to share