DataFrame.drop_duplicates and DataFrame.drop do not delete rows
I have read a CSV file into a pandas DataFrame with five columns. Some rows have duplicate values in the second column only, and I want to remove those rows from the DataFrame, but neither drop nor drop_duplicates works.
Here is my implementation:
import pandas as pd
from pandas import Series

# Read CSV
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])
print(Series(df.b))
dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip first indx
        continue
    if val == df.b[indx - 1]:  # this is a duplicate rtc value
        dropRows.append(indx)
print(dropRows)
df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either
print(Series(df.b))
When I print the Series df.b before and after, both have the same length and I can still see all the duplicates. Is there something wrong in my implementation?
1 answer
As mentioned in the comments, drop and drop_duplicates return a new DataFrame and leave the original unchanged unless inplace=True is passed. Either assign the result back or use inplace; any of these will work:
df = df.drop(dropRows)
df = df.drop_duplicates('b')
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)
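To make the difference concrete, here is a minimal sketch on a small made-up DataFrame (the sample values are an assumption, not the asker's data). It shows that drop leaves the original untouched until the result is assigned. It also shows that the loop in the question only catches values equal to the previous row, which is not the same as drop_duplicates('b'), since that removes every repeated value in the column. The shift-based filter is one possible vectorized stand-in for the loop, not something from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10, 10, 20, 10, 10],
})

# drop returns a new DataFrame; df keeps all its rows until reassigned.
dropped = df.drop([1])
print(len(df), len(dropped))

# Keep a row only when its b value differs from the previous row's b.
# This mirrors the loop in the question (consecutive duplicates only).
consecutive = df[df["b"] != df["b"].shift()]
print(consecutive)

# drop_duplicates removes every later repeat of a value in b,
# whether or not the repeats are adjacent.
unique_b = df.drop_duplicates(subset="b")
print(unique_b)
```

Note that the inplace=True form returns None, so writing df = df.drop(dropRows, inplace=True) would replace df with None; use one style or the other, not both.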