[pandas]: how to find the distinct values in a column

I would like to find out which distinct values a DataFrame column contains. Here is an example that illustrates the problem.

import pandas as pd

dic = {'a': ['pippo', 'giacomo', 'giacomo', 'francesco', 'luigi', 'francesco', 'luigi']}
df = pd.DataFrame(dic)

      

The resulting DataFrame:

 a
 pippo
 giacomo
 giacomo
 francesco
 luigi
 francesco
 luigi

      

What I am looking for is something that gives me this result:

 pippo
 giacomo
 francesco
 luigi

      

That way I can see which distinct values are present in my DataFrame.


1 answer


You can use drop_duplicates:

# Drop repeated rows, keeping the first occurrence of each
df = df.drop_duplicates()
print(df)
           a
0      pippo
1    giacomo
3  francesco
4      luigi

      

If you need to specify a column to check for duplicates:



# Only compare values in column 'a' when looking for duplicates
df = df.drop_duplicates(subset=['a'])
print(df)
           a
0      pippo
1    giacomo
3  francesco
4      luigi
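
Note that drop_duplicates keeps the original index labels (0, 1, 3, 4 above). A minimal self-contained sketch of the same call, with an optional renumbering step; the reset_index(drop=True) call and the name distinct are additions for illustration, not part of the answer above:

```python
import pandas as pd

dic = {'a': ['pippo', 'giacomo', 'giacomo', 'francesco', 'luigi', 'francesco', 'luigi']}
df = pd.DataFrame(dic)

# Keep the first occurrence of each value in column 'a'
distinct = df.drop_duplicates(subset=['a'])
original_labels = distinct.index.tolist()  # labels of the surviving rows

# Optional: renumber the rows from 0 (an addition, not required)
distinct = distinct.reset_index(drop=True)
print(distinct)
```

Assigning to a new name instead of overwriting df leaves the original data available for later steps.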

      

If you want the result as a NumPy array instead, use unique:

# Distinct values, in order of first appearance
arr = df['a'].unique()
print(arr)
['pippo' 'giacomo' 'francesco' 'luigi']

# The same values as a plain Python list
L = df['a'].unique().tolist()
print(L)
['pippo', 'giacomo', 'francesco', 'luigi']
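
If you also want to know how many distinct values there are, or how often each one occurs, something like the following may help. This is only a sketch on the question's sample data; nunique and value_counts are standard pandas Series methods, but they are not mentioned in the answer above, and the names n_distinct and counts are made up for illustration:

```python
import pandas as pd

dic = {'a': ['pippo', 'giacomo', 'giacomo', 'francesco', 'luigi', 'francesco', 'luigi']}
df = pd.DataFrame(dic)

# Number of distinct values in column 'a'
n_distinct = df['a'].nunique()
print(n_distinct)

# How many times each distinct value occurs
counts = df['a'].value_counts()
print(counts)
```

value_counts returns a Series indexed by the distinct values, so counts.index.tolist() is another way to list them.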

      
