DataFrame.drop_duplicates and DataFrame.drop do not delete rows

I have read a CSV file into a pandas DataFrame with five columns. Some rows have duplicate values only in the second column. I want to remove these rows from the DataFrame, but neither drop nor drop_duplicates works.

Here is my implementation:

import pandas as pd

# Read CSV
df = pd.read_csv(data_path, header=0, names=['a', 'b', 'c', 'd', 'e'])

print(df.b)

dropRows = []
# Sanitize the data to get rid of duplicates
for indx, val in enumerate(df.b):  # for all the values
    if indx == 0:  # skip first indx
        continue

    if val == df.b[indx-1]:  # this is a duplicate rtc value
        dropRows.append(indx)

print(dropRows)

df.drop(dropRows)  # this doesn't work
df.drop_duplicates('b')  # this doesn't work either

print(df.b)

      

When I print the Series df.b before and after, they are the same length and I can still see all the duplicates. Is there something wrong in my implementation?


1 answer


As mentioned in the comments, drop and drop_duplicates return a new DataFrame and leave the original unchanged unless inplace=True is passed. Either of these options will work:



# Option 1: reassign the returned DataFrame
df = df.drop(dropRows)
df = df.drop_duplicates('b')

# Option 2 (instead of option 1): modify df in place; these calls return None
df.drop(dropRows, inplace=True)
df.drop_duplicates('b', inplace=True)
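
To see the difference for yourself, here is a minimal sketch on a small made-up DataFrame (the sample values and the names len_unchanged, deduped, df_inplace and result are mine, not from the question). It calls drop_duplicates without reassignment, with reassignment, and with inplace=True, then compares the lengths.

```python
import pandas as pd

# Hypothetical sample data: column 'b' repeats the value 10 in rows 0 and 1.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 10, 20, 30]})

# Without reassignment the returned DataFrame is discarded; df is unchanged.
df.drop_duplicates("b")
len_unchanged = len(df)

# Reassigning (here to a new name) keeps the deduplicated result.
deduped = df.drop_duplicates("b")

# inplace=True modifies the object itself and returns None.
df_inplace = df.copy()
result = df_inplace.drop_duplicates("b", inplace=True)

print(len_unchanged, len(deduped), len(df_inplace), result)
# prints: 4 3 3 None
```

The same pattern applies to drop: either assign its return value or pass inplace=True.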

      
