How to concatenate multiple lines in CSV into one line
I was given a large CSV file that I need to split up for use in machines. I managed to cut the file down to the two columns I need, but I have a problem.
I basically have a file structure like this.
"David", "Red"
"David", "Ford"
"David", "Blue"
"David", "Aspergers"
"Steve", "Red"
"Steve", "Vauxhall"
And I need the data to look more like this ...
"David", "Red", "Ford", "Blue", "Aspergers"
"Steve", "Red", "Vauxhall"
I currently have this to strip down the CSV file:
import csv
cr = csv.reader(open("traits.csv","rb"), delimiter=',', lineterminator='\n')
cr.next() # skip the header line; no point in removing it as I need to standardise data manipulation
# Print out the id of species and trait values
print 'Stripping input'
vals = [(row[1], row[4]) for row in cr]
print str(vals) + '\n'
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(vals)
print 'Successfully written to file output.csv'
#for row in cr:
#    print row
Use a dictionary that stores each name as a key and a list of its other attributes as the value:
import csv
my_dict={}
with open("traits.csv","rb") as f:
    cr = csv.reader(f, delimiter=',', lineterminator='\n')
    for row in cr:
        # strip the leftover quotes and spaces, then group by name
        my_dict.setdefault(row[0].strip('" '),[]).append(row[1].strip('" '))
Result:
print my_dict
{'Steve': ['Red', 'Vauxhall'], 'David': ['Red', 'Ford', 'Blue', 'Aspergers']}
And to write the result to a new file:
with open("output.csv", "wb") as f:
    writer = csv.writer(f,delimiter=',')
    for i,j in my_dict.iteritems():
        writer.writerow([i]+j)
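The snippets above are written for Python 2. As a rough sketch only, the same setdefault approach might look like this on Python 3, where csv files are handled in text mode and iteritems() no longer exists. The function name group_rows, the in-memory sample data, and the use of skipinitialspace (instead of stripping quotes by hand) are assumptions made for illustration, not part of the original answer.

```python
import csv
import io


def group_rows(infile, outfile):
    """Collapse (name, attribute) rows into one output row per name."""
    grouped = {}  # plain dicts keep insertion order on Python 3.7+
    # skipinitialspace makes the reader ignore the space after each comma,
    # so the quoted fields are parsed properly and need no manual strip()
    for row in csv.reader(infile, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped.setdefault(row[0], []).append(row[1])
    writer = csv.writer(outfile)
    for name, attributes in grouped.items():  # items() replaces iteritems()
        writer.writerow([name] + attributes)
    return grouped


# In-memory stand-ins for traits.csv and output.csv (illustrative data only)
sample = io.StringIO(
    '"David", "Red"\n"David", "Ford"\n"David", "Blue"\n'
    '"David", "Aspergers"\n"Steve", "Red"\n"Steve", "Vauxhall"\n'
)
result = io.StringIO()
grouped = group_rows(sample, result)
print(result.getvalue())
```

With real files, the csv module documentation recommends opening them with newline='', for example open("traits.csv", newline="") for reading and open("output.csv", "w", newline="") for writing.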
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
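To make that description concrete, here is a small sketch of how setdefault behaves when the key is missing and when it is already present. The variable names and sample values are made up for illustration.

```python
traits = {}

# Key is missing: setdefault inserts the empty list and returns it
first = traits.setdefault("David", [])
first.append("Red")

# Key is present: the stored list is returned and the new default is ignored
second = traits.setdefault("David", [])
second.append("Ford")

# With no default given, a missing key is inserted with the value None
missing = {}.setdefault("Steve")

print(traits)           # {'David': ['Red', 'Ford']}
print(first is second)  # True
print(missing)          # None
```

Because the returned list is the one stored in the dictionary, chaining .append() onto setdefault() updates the dictionary in a single step.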
Use defaultdict; it is exactly what you need. Here's a sample:
>>> from collections import defaultdict
>>> md = defaultdict(list)
>>> md[1].append('a')
>>> md[1].append('b')
>>> md[2].append('c')
>>> md[1]
['a', 'b']
>>> md[2]
['c']
(You can use set instead of a list, in which case you would call .add instead of .append.)
You can use iteritems to access the data easily.
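Applied to the data in the question, the defaultdict idea might look roughly like the following Python 3 sketch. The sample lines are a hypothetical stand-in for the two-column file, skipinitialspace is an assumption about how the spaces after the commas should be handled, and items() is used because iteritems() exists only on Python 2.

```python
import csv
from collections import defaultdict

# Hypothetical sample lines in the two-column layout from the question
lines = [
    '"David", "Red"',
    '"David", "Ford"',
    '"Steve", "Red"',
    '"Steve", "Vauxhall"',
]

traits = defaultdict(list)  # a missing key starts out as an empty list
# csv.reader accepts any iterable of strings, not only file objects
for name, value in csv.reader(lines, skipinitialspace=True):
    traits[name].append(value)

# Build one merged row per name, ready for csv.writer.writerows()
merged = [[name] + values for name, values in traits.items()]
print(merged)
```

Compared with setdefault, this avoids building a throwaway empty list on every row, since the list factory is only called when a key is actually missing.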