How to concatenate multiple lines in CSV into one line

I was given a large CSV file that I need to split up for use in machines. I managed to cut the file down to the two columns I need, but I have a problem.

I basically have a file structure like this.

 "David", "Red"
 "David", "Ford"
 "David", "Blue"
 "David", "Aspergers"
 "Steve", "Red"
 "Steve", "Vauxhall"

      

And I need the data to look more like this ...

"David", "Red", "Ford", "Blue", "Aspergers"
"Steve", "Red", "Vauxhall"

      

I currently have this to strip the CSV file down to the two columns I need:

import csv

cr = csv.reader(open("traits.csv","rb"), delimiter=',', lineterminator='\n')
cr.next()  # skipping header line, no point in removing it as I need to standardise data manipulation.


# Print out the id of species and trait values
print 'Stripping input'
vals = [(row[1], row[4]) for row in cr]
print str(vals) + '\n'

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(vals)
    print 'Successfully written to file output.csv'


#for row in cr:
#print row

      



2 answers


Use a dictionary to store names as a key and other attributes in a list as a value:

import csv

my_dict = {}
with open("traits.csv", "rb") as f:
    cr = csv.reader(f, delimiter=',', lineterminator='\n')
    for row in cr:
        # strip the leftover quotes and spaces, then group each value under its name
        my_dict.setdefault(row[0].strip('" '), []).append(row[1].strip('" '))

      

result:

print my_dict
{'Steve': ['Red', 'Vauxhall'], 'David': ['Red', 'Ford', 'Blue', 'Aspergers']}

      



To write the result to a new file:

with open("output.csv", "wb") as f:
    writer = csv.writer(f,delimiter=',')
    for i,j in my_dict.iteritems():
        writer.writerow([i]+j)
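
For anyone on Python 3, the read and write steps above could be sketched roughly as follows. This is only a sketch under some assumptions: merge_rows is a hypothetical helper name, the in-memory sample stands in for traits.csv, and skipinitialspace=True is swapped in for the manual strip('" ') to deal with the space after each comma.

```python
import csv
import io


def merge_rows(src, dst):
    """Collapse (name, value) rows read from src into one row per name in dst."""
    grouped = {}
    # skipinitialspace=True makes the reader ignore the space after each comma,
    # so the quoted fields are unquoted without a manual strip().
    for row in csv.reader(src, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped.setdefault(row[0], []).append(row[1])

    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    # Dicts keep insertion order in Python 3.7+, so names come out in first-seen order.
    for name, values in grouped.items():
        writer.writerow([name] + values)


# Hypothetical sample data standing in for the real input file.
sample = io.StringIO('"David", "Red"\n"David", "Ford"\n"Steve", "Red"\n')
out = io.StringIO()
merge_rows(sample, out)
print(out.getvalue())
```

With real files in Python 3, the csv module expects text mode with newline="" (for example open("traits.csv", newline="")) rather than "rb" and "wb", and iteritems() becomes items().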

      

setdefault(key[, default])

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
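
To make that behaviour concrete, here is a minimal sketch of setdefault on its own. The sample pairs and the variable names are made up for illustration.

```python
# Hypothetical sample rows, already split into (name, value) pairs.
pairs = [("David", "Red"), ("David", "Ford"), ("Steve", "Red")]

grouped = {}
for name, trait in pairs:
    # The first time a name is seen, setdefault inserts the empty list and returns it;
    # after that it returns the list already stored, so append always has a target.
    grouped.setdefault(name, []).append(trait)

# Called without a default, a missing key is inserted with the value None.
probe = {}
missing = probe.setdefault("Alice")
```

This is why the one-liner in the answer works without first checking whether the name is already a key.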



Use defaultdict, this is exactly what you need. Here's a sample:

>>> from collections import defaultdict
>>> md = defaultdict(list)
>>> md[1].append('a')
>>> md[1].append('b')
>>> md[2].append('c')
>>> md[1]
['a', 'b']
>>> md[2]
['c']

      





(You can use set instead of a list, in which case you would call .add instead of .append.)

You can use iteritems for easy data access.
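
Applied to the data in the question, the defaultdict approach might look something like this. It is a sketch in Python 3, so items() is used where Python 2 would use iteritems(), and the rows list is a hypothetical stand-in for the rows read from the CSV file.

```python
from collections import defaultdict

# Hypothetical stand-in for the (name, value) rows read from the CSV file.
rows = [("David", "Red"), ("David", "Ford"), ("Steve", "Red"), ("Steve", "Vauxhall")]

traits = defaultdict(list)
for name, value in rows:
    # A missing key gets a fresh empty list automatically, so no setdefault is needed.
    traits[name].append(value)

# One merged row per name, ready to be passed to csv.writer.writerows().
merged = [[name] + values for name, values in traits.items()]
```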
