Read CSV file and create dictionary?

Let's say I have the "players.csv" file below with some NFL player data. My goal is to read the file and create a dictionary whose keys are the players' heights and whose values are lists of player profiles (each profile is a tuple).

HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT
6,Aaron,2005,31,QB,225
5,Jordy,2008,30,WR,217
5,Randall,2011,24,WR,192

In a player profile tuple, "name" must be a string, and "age" and "position" must be integers. "Year" and "Position" should be ignored.

player_profile = (name, age, position)

Expected dictionary:

# player heights are keys, player profiles are values.
dict = {
    6: [('Aaron', 31, 225)],
    5: [('Jordy', 30, 217), ('Randall', 24, 192)]
}

Below is what I have so far and I am stuck.

final_dict = {}
filename = "players.csv"

# open the csv file
with open(filename) as f:
    info = f.read()

# split on the newline characters
info2 = info.splitlines()

# exclude the header
info3 = info2[1:]



Use the csv module with defaultdict to handle duplicate keys:

import csv
from collections import defaultdict

d = defaultdict(list)

with open("in.csv") as f:
    next(f) # skip header
    r = csv.reader(f)
    # unpack: use height as the key and append name, age and position
    for h, nm, _, a, p, _ in r:
        d[int(h)].append((nm, int(a), p))

print(d)

Output:

defaultdict(<type 'list'>, {5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]})

If you really want to avoid imports, you can use str.split with dict.setdefault, but I see no reason not to use standard-library modules like csv and collections:

d = {}

with open("in.csv") as f:
    next(f)  
    for line in f:
        h, nm, _, a, p, _ = line.split(",")
        d.setdefault(int(h), []).append((nm, int(a), p))

print(d)

Output:

{5: [('Jordy', 30, 'WR'), ('Randall', 24, 'WR')], 6: [('Aaron', 31, 'QB')]}

Your example input is inconsistent: since POSITION is a string, you must take WEIGHT instead to match the expected output:

import csv
from collections import defaultdict

d = defaultdict(list)

with open("in.csv") as f:
    next(f)  # skip header
    r = csv.reader(f)
    # unpack: use height as the key and append name, age and weight
    for h, nm, _, a, _, w in r:
        d[int(h)].append((nm, int(a), int(w)))

Output:

defaultdict(<type 'list'>, {5: [('Jordy', 30, 217), ('Randall', 24, 192)], 6: [('Aaron', 31, 225)]})

Make the same change to the regular-dict version to get the same result.
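A minimal sketch of that regular-dict variant, assuming the same header and column order as in the question. To keep it self-contained it reads from an in-memory io.StringIO copy of the sample data instead of "in.csv"; the name `sample` is illustrative only.

```python
import io

# Sample data from the question; in practice you would use open("in.csv")
sample = io.StringIO(
    "HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT\n"
    "6,Aaron,2005,31,QB,225\n"
    "5,Jordy,2008,30,WR,217\n"
    "5,Randall,2011,24,WR,192\n"
)

d = {}
with sample as f:
    next(f)  # skip header
    for line in f:
        # strip() removes the trailing newline so int(w) sees only digits
        h, nm, _, a, _, w = line.strip().split(",")
        d.setdefault(int(h), []).append((nm, int(a), int(w)))

print(d)
```

On Python 3.7+ plain dicts keep insertion order, so this prints {6: [('Aaron', 31, 225)], 5: [('Jordy', 30, 217), ('Randall', 24, 192)]}.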


The problem with the csv module is that it does not automatically handle datatype conversion: as you can see from Padraic's answer, the heights and ages come back as strings and have to be converted explicitly with int(). This, in turn, means that you need an additional pass, perhaps with map, in which you translate the strings to the desired types. Also, after you read your file, you will likely want to perform some sort of analysis or other processing on its contents.
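To make that extra conversion pass concrete, here is a stdlib-only sketch (not from the original answer) that reads rows with csv.DictReader and converts them with map. The helper `to_profile` and the in-memory `sample` data are hypothetical, chosen to match the question's columns and expected output.

```python
import csv
import io

# Stand-in for the real file; in practice: open("players.csv", newline="")
sample = io.StringIO(
    "HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT\n"
    "6,Aaron,2005,31,QB,225\n"
    "5,Jordy,2008,30,WR,217\n"
    "5,Randall,2011,24,WR,192\n"
)


def to_profile(row):
    """Hypothetical helper: convert one DictReader row (all strings) to (height, profile)."""
    return int(row["HEIGHT"]), (row["NAME"], int(row["AGE"]), int(row["WEIGHT"]))


# csv.DictReader yields dicts of strings keyed by the header names;
# map() is the extra pass that converts them to the desired types.
converted = list(map(to_profile, csv.DictReader(sample)))

# Group the converted profiles by height
result = {}
for height, profile in converted:
    result.setdefault(height, []).append(profile)

print(result)
```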

For this reason, I would like to suggest working with pandas.DataFrame, which offers dictionary-like behavior, as follows:

import pandas
Q = pandas.read_csv("myfile.csv", index_col="HEIGHT")

Q is now a DataFrame. To get all players with height 5:

Q.loc[5]  # returns two rows, given the data posted in the question


To get the median age of players with a height of 5:

Q.loc[5]["AGE"].median()  # 27.0, given the data posted in the question
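As a sketch of how the same DataFrame could also produce the dictionary the question asks for, you could group on the HEIGHT index and turn each group into plain tuples. This assumes pandas is installed and is a reasonably recent version (.ix was removed in pandas 1.0, hence .loc); the in-memory `sample` data stands in for "myfile.csv" and is illustrative only.

```python
import io

import pandas

# Stand-in for "myfile.csv", using the data posted in the question
sample = io.StringIO(
    "HEIGHT,NAME,DRAFTED,AGE,POSITION,WEIGHT\n"
    "6,Aaron,2005,31,QB,225\n"
    "5,Jordy,2008,30,WR,217\n"
    "5,Randall,2011,24,WR,192\n"
)
Q = pandas.read_csv(sample, index_col="HEIGHT")

print(Q.loc[5])                   # the two rows with HEIGHT == 5
print(Q.loc[5]["AGE"].median())   # 27.0

# Group rows by the HEIGHT index and turn each group into a list of plain tuples
players = {
    int(height): list(group[["NAME", "AGE", "WEIGHT"]].itertuples(index=False, name=None))
    for height, group in Q.groupby(level=0)
}
print(players)
```

groupby sorts its keys by default, so the resulting dict lists height 5 before height 6.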

      

For more information on pandas, see this link.

Hope it helps.


I think this is the most basic solution to this question:

from collections import defaultdict

players = defaultdict(list)
with open("players.csv") as f:
    next(f)  # skip the header row
    for line in f:
        line = line.strip()  # remove the trailing "\n"
        tokens = line.split(",")
        xs = [tokens[1], int(tokens[3]), int(tokens[5])]
        players[int(tokens[0])].append(tuple(xs))

First of all, we define a defaultdict with list as the default value. Then we iterate over the file, skip the header, and strip special characters like "\n" from each line. Then we split the line on ",". We know where each field is: the height is at position zero, so it is our key. The other attributes are at positions 1, 3 and 5, so we put these tokens in a list, converting age and weight to integers to match the expected output, and then convert that list to a tuple. This is the simplest solution. We could also write something like this:

players[int(tokens[0])].append((tokens[1], int(tokens[3]), int(tokens[5])))

It would also work :)

Regards, golobich
