[pandas]: how to find the distinct values in a column
I would like to know which distinct values a column of my DataFrame contains. Here is an example to illustrate the problem.
import pandas as pd

dic = { 'a': ['pippo', 'giacomo', 'giacomo', 'francesco', 'luigi', 'francesco', 'luigi'] }
df = pd.DataFrame(dic)
DataFrame:
a
pippo
giacomo
giacomo
francesco
luigi
francesco
luigi
What I am looking for is something that gives me this result:
pippo
giacomo
francesco
luigi
That way I can see which distinct values are present in my DataFrame.
1 answer
You can use drop_duplicates:
df = df.drop_duplicates()
print(df)
a
0 pippo
1 giacomo
3 francesco
4 luigi
If you need to specify a column to check for duplicates:
df = df.drop_duplicates(subset=['a'])
print (df)
a
0 pippo
1 giacomo
3 francesco
4 luigi
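Note that the snippets above assign the result back to df, which discards the original rows. A minimal sketch of keeping both, assuming the same example data; the name distinct and the reset_index step are illustrative additions, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'a': ['pippo', 'giacomo', 'giacomo', 'francesco',
                         'luigi', 'francesco', 'luigi']})

# drop_duplicates returns a new DataFrame; df itself is left unchanged.
# reset_index(drop=True) renumbers the surviving rows from 0.
distinct = df.drop_duplicates(subset=['a']).reset_index(drop=True)

print(distinct)
print(len(df))  # the original still has all of its rows
```

Storing the result under a separate name is useful when the full data is still needed afterwards.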
If you want the result as a NumPy array instead, use unique:
arr = df['a'].unique()
print (arr)
['pippo' 'giacomo' 'francesco' 'luigi']
L = df['a'].unique().tolist()  # convert the array to a plain Python list
print(L)
['pippo', 'giacomo', 'francesco', 'luigi']
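If it also helps to know how often each value occurs, a short sketch along the same lines, assuming the same example data, could use nunique and value_counts. The variable names n_distinct and counts are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': ['pippo', 'giacomo', 'giacomo', 'francesco',
                         'luigi', 'francesco', 'luigi']})

# Number of distinct values in column 'a'
n_distinct = df['a'].nunique()

# Occurrences of each distinct value, as a Series indexed by value
counts = df['a'].value_counts()

print(n_distinct)
print(counts)
```

The index of counts holds the distinct values themselves, so this gives the same information as unique plus a frequency for each entry.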