How can I count the comma-separated values in one column of a pandas table?

Question

I have the following code:

businessdata = ['Name of Location','Address','City','Zip Code','Website','Yelp',
'# Reviews', 'Yelp Rating Stars','BarRestStore','Category',
'Price Range','Alcohol','Ambience','Latitude','Longitude']

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata)
print '\n\nBusiness\n'
print business[:6]

It reads my file and creates a pandas DataFrame that I can work with. I need to count the number of categories in each row of the "Category" column and store that number in a new column called "Categories". Here's an example of the target column:

Category                                         
French                                               
Adult Entertainment , Lounges , Music Venues         
American (New) , Steakhouses                        
American (New) , Beer, Wine & Spirits , Gastropubs 
Chicken Wings , Sports Bars , American (New)         
Japanese

Desired output:

Category                                        # Categories  
French                                               1           
Adult Entertainment , Lounges , Music Venues         3         
American (New) , Steakhouses                         2        
American (New) , Beer, Wine & Spirits , Gastropubs   4         
Chicken Wings , Sports Bars , American (New)         3         
Japanese                                             1

EDIT 1:

Raw input: a CSV file. Target column: "Category". I can't post screenshots yet. I don't think the values to be counted are lists.

This is my code:

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata, skip_blank_lines=True)
#business = pd.read_csv('FL_Yelp_Data_v2.csv')

business['Category'].str.split(',').apply(len)
#not sure where to declare the df part in the suggestions that use it.

print business[:6]

But I keep getting the following error:

TypeError: object of type 'float' has no len()

EDIT 2:

I give up. Thanks for your help, but I will need to do something else.

python pandas

Danilo May 12 '15 at 21:46

5 answers

Alexander · Answer 1 · 2015-05-12T22:06:57+0000

Assuming Category is actually a list, you can use apply (per @EdChum's suggestion):

business['# Categories'] = business.Category.apply(len)

If not, you first need to parse it and convert it to a list.

df['Category'] = df.Category.map(lambda x: [i.strip() for i in x.split(",")])
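To make the two steps concrete, here is a minimal sketch that parses the strings into lists and then counts them. It uses a few sample rows copied from the question instead of the real CSV, and it assumes the column has no missing values:

```python
import pandas as pd

# Sample rows copied from the question; the real data comes from the CSV file.
df = pd.DataFrame({
    "Category": [
        "French",
        "Adult Entertainment , Lounges , Music Venues",
        "American (New) , Steakhouses",
        "American (New) , Beer, Wine & Spirits , Gastropubs",
    ]
})

# Step 1: parse each string into a list of stripped category names.
parsed = df["Category"].map(lambda x: [i.strip() for i in x.split(",")])

# Step 2: count the items in each list and store the result in a new column.
df["# Categories"] = parsed.apply(len)

counts = df["# Categories"].tolist()
print(counts)
```

Note that the comma inside "Beer, Wine & Spirits" is treated as a separator too, which matches the desired output in the question.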

Can you show some example output of EXACTLY what this column looks like (including the correct quotes)?

PS @EdChum Thanks for your suggestions. I appreciate them. I believe the list comprehension method might be faster, based on some text data with 30k+ rows that I tested:

%%timeit
df.Category.str.strip().str.split(',').apply(len)
10 loops, best of 3: 44.8 ms per loop

%%timeit
df.Category.map(lambda x: [i.strip() for i in x.split(",")])
10 loops, best of 3: 28.4 ms per loop

Even accounting for the call to len:

%%timeit
df.Category.map(lambda x: len([i.strip() for i in x.split(",")]))
10 loops, best of 3: 30.3 ms per loop

vk1011 · Answer 2 · 2015-05-12T22:01:47+0000

Use pd.read_csv to make reading the file easier:

business = pd.read_csv('FL_Yelp_Data_v2.csv')

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Once the DataFrame has been created, you can write a function that splits the Category column on "," and calculates the length of the resulting list. Use lambda and apply.
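A rough sketch of that idea, assuming the Category field is quoted in the file so that its commas stay inside one column. The in-memory CSV text, the place names, and the helper name count_categories are made up for illustration; the real file has more columns:

```python
import io
import pandas as pd

# Hypothetical two-column stand-in for FL_Yelp_Data_v2.csv.
csv_text = (
    "Name of Location,Category\n"
    "Place A,French\n"
    'Place B,"American (New) , Steakhouses"\n'
    'Place C,"Chicken Wings , Sports Bars , American (New)"\n'
)

# read_csv takes the column names from the header row and keeps
# commas inside a quoted field as part of that field.
business = pd.read_csv(io.StringIO(csv_text))

def count_categories(value):
    """Split on commas and return the number of pieces."""
    return len(value.split(","))

business["# Categories"] = business["Category"].apply(lambda x: count_categories(x))
category_counts = business["# Categories"].tolist()
print(business)
```

With the real file you would pass the file name to pd.read_csv instead of the StringIO object.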

Joe Germuska · Answer 3 · 2015-05-12T22:06:42+0000

This works:

business['# Categories'] = business['Category'].apply(lambda x: len(x.split(',')))

If you need to handle NA, etc., you can pass a more complex function instead of a lambda.
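For example, a named function along these lines would avoid the error on missing values. Returning 0 for a missing cell is my assumption about what you want, and the function name and sample rows are only illustrative:

```python
import numpy as np
import pandas as pd

def count_or_zero(value):
    """Return the number of comma-separated items, or 0 for a missing value."""
    if pd.isna(value):
        return 0
    return len(str(value).split(","))

# Small stand-in for the real data, with one missing Category.
business = pd.DataFrame({
    "Category": ["French", np.nan, "Chicken Wings , Sports Bars , American (New)"]
})

business["# Categories"] = business["Category"].apply(count_or_zero)
na_safe_counts = business["# Categories"].tolist()
print(na_safe_counts)
```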

Wei-ting liao · Answer 4 · 2015-05-12T22:34:17+0000

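You can do it with a loop:

# Iterate over (row label, value) pairs so that .loc writes to the existing row.
for idx, value in business['Category'].items():
    business.loc[idx, '#Categories'] = len(value.split(","))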

Arko · Answer 5 · 2017-06-01T17:43:01+0000

I had the same question. I needed to count the number of comma-separated words in each row. I resolved it like this:

data['Number_of_Categories'] = data['Category'].apply(lambda x: len(str(x).split(',')))

Basically, I convert each value to a string first, since Python treats some of them as floats, and then apply the len function. Hope it helps.
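One thing to be aware of with the str() approach: a missing cell becomes the text "nan" and is counted as one category. The sketch below shows that on made-up sample rows, next to a vectorised alternative that counts commas and sets missing cells to 0 (the choice of 0 is an assumption, not something from the question):

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing Category.
data = pd.DataFrame({"Category": ["French", "American (New) , Steakhouses", np.nan]})

# str() turns NaN into the text "nan", so the missing cell is counted as 1.
via_str = data["Category"].apply(lambda x: len(str(x).split(","))).tolist()

# Alternative: number of commas + 1; missing cells stay NaN until fillna sets them to 0.
via_count = (data["Category"].str.count(",") + 1).fillna(0).astype(int).tolist()

print(via_str)
print(via_count)
```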
