Python dataframe: find duplicated values with an if statement
I want an if statement that displays all REF_INT values that are duplicated. I tried this:
(df_picru['REF_INT'].value_counts()==1)
It shows True or False for every value, but I want to do something like this:
if (df_picru['REF_INT'].value_counts()==1)
print "df_picru['REF_INT']"
Another solution using groupby.
# Group by REF_INT, count the rows in each group, and label the group 'Duplicated' if the count is greater than 1
df_picru.groupby('REF_INT').apply(lambda x: 'Duplicated' if len(x) > 1 else 'Unique')
Out[21]:
REF_INT
1 Unique
2 Duplicated
3 Unique
8 Duplicated
dtype: object
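The same grouping idea can be sketched with groupby(...).size(), which counts the rows per group without a Python-level lambda. This is only a sketch: it assumes the sample data REF_INT = [1, 2, 3, 8, 8, 2] used in another answer here, and the names sizes and labels are illustrative.

```python
import pandas as pd

# Sample data (an assumption, borrowed from another answer in this thread)
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Number of rows for each distinct REF_INT value
sizes = df_picru.groupby('REF_INT').size()

# Label each distinct value by comparing its row count with 1
labels = sizes.gt(1).map({True: 'Duplicated', False: 'Unique'})
print(labels)
```

With that sample data, 2 and 8 come out as Duplicated and 1 and 3 as Unique, matching the output above.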
value_counts also works if you filter the counts, for example with a callable:
df_picru.REF_INT.value_counts()[lambda x: x>1]
Out[31]:
2 2
8 2
Name: REF_INT, dtype: int64
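To get back to the if statement from the question: a Series cannot be used directly as an if condition (pandas raises a ValueError because its truth value is ambiguous), so one option is to turn the filtered counts into a plain list first. A minimal sketch, assuming the same sample data as in the other answer; the names counts and dupes are illustrative.

```python
import pandas as pd

# Sample data (an assumption, borrowed from another answer in this thread)
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Count occurrences, keep only the values seen more than once
counts = df_picru['REF_INT'].value_counts()
dupes = sorted(counts[counts > 1].index.tolist())

# An empty list is falsy, so it is safe to use in an if statement
if dupes:
    print('Duplicated REF_INT values:', dupes)
else:
    print('No duplicated REF_INT values')
```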
I think you need duplicated for the boolean mask and numpy.where for the new column:
mask = df_picru['REF_INT'].duplicated(keep=False)
Example:
import numpy as np
import pandas as pd
df_picru = pd.DataFrame({'REF_INT':[1,2,3,8,8,2]})
mask = df_picru['REF_INT'].duplicated(keep=False)
print (mask)
0 False
1 True
2 False
3 True
4 True
5 True
Name: REF_INT, dtype: bool
df_picru['new'] = np.where(mask, 'duplicates', 'unique')
print (df_picru)
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates
If you need to check whether at least one value is duplicated, use any to reduce the boolean mask (an array) to a scalar True or False:
if mask.any():
    print ('at least one duplicate')
at least one duplicate
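To print the duplicated REF_INT values themselves, as the question asks, the mask can also select the matching rows. This is a sketch under the same sample-data assumption; duplicated_values is an illustrative name.

```python
import pandas as pd

# Sample data (same assumption as in the example above)
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# True for every row whose REF_INT occurs more than once
mask = df_picru['REF_INT'].duplicated(keep=False)

duplicated_values = []
if mask.any():
    # Select the flagged rows, then reduce to distinct values
    # (unique() keeps the order of first appearance)
    duplicated_values = df_picru.loc[mask, 'REF_INT'].unique().tolist()
    print(duplicated_values)
```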