Confusion about Python .loc

I am working through a Kaggle tutorial on the Titanic dataset on the Datacamp platform.

I understand what .loc does in pandas: it selects values by row label and column label.

My confusion comes from the fact that the Datacamp tutorial wants to find all the "male" entries in the "Sex" column and replace them with the value 0. They use the following piece of code:

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0

      

Can someone explain how this works? I thought .loc takes a row label and a column label, so what is the == for?

Shouldn't it be:

titanic.loc["male", "Sex"] = 0

Thanks!

1 answer


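It sets the values of column Sex to 0 wherever the following condition is True; the other values are left intact:

titanic["Sex"] == "male"

The first argument of loc does not have to be a row label. It can also be a boolean Series (a mask) with one True/False per row, and loc then selects exactly the rows where the mask is True.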

Example:

import pandas as pd

titanic = pd.DataFrame({'Sex':['male','female', 'male']})
print (titanic)
      Sex
0    male
1  female
2    male

print (titanic["Sex"] == "male")
0     True
1    False
2     True
Name: Sex, dtype: bool

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
print (titanic)
      Sex
0       0
1  female
2       0

This is very similar to boolean indexing with loc, which only selects the values of column Sex that match the condition:

print (titanic.loc[titanic["Sex"] == "male", "Sex"])
0    male
2    male
Name: Sex, dtype: object
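To see that the boolean mask is just another way of choosing rows, here is a minimal sketch (the names mask, male_labels, via_mask and via_labels are only illustrative) that does the same assignment twice: once with the mask and once with the explicit row labels where the mask is True. Both give the same DataFrame.

```python
import pandas as pd

titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# Boolean Series with one True/False per row, aligned on titanic.index
mask = titanic["Sex"] == "male"

# Row labels where the mask is True
male_labels = titanic.index[mask.to_numpy()]

# Assign through the boolean mask
via_mask = titanic.copy()
via_mask.loc[mask, "Sex"] = 0

# Assign through the explicit list of row labels
via_labels = titanic.copy()
via_labels.loc[male_labels, "Sex"] = 0

print(male_labels.tolist())          # [0, 2]
print(via_mask.equals(via_labels))   # True
```

So loc[condition, "Sex"] is equivalent to first finding the matching row labels and then selecting by those labels.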

      



But I think map is better here if only the values male and female need to be converted to other values:

titanic = pd.DataFrame({'Sex':['male','female', 'male']})
titanic["Sex"] = titanic["Sex"].map({'male':0, 'female':1})
print (titanic)
   Sex
0    0
1    1
2    0
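One caveat with map, sketched below with a hypothetical extra value 'unknown' that is not in the dictionary: map turns every value missing from the dict into NaN, whereas Series.replace leaves such values unchanged. So map fits when male and female are guaranteed to be the only values in the column.

```python
import pandas as pd

# Hypothetical column containing a value that is not in the mapping
sex = pd.Series(['male', 'female', 'unknown'])
codes = {'male': 0, 'female': 1}

mapped = sex.map(codes)        # 'unknown' has no key -> NaN (dtype becomes float64)
replaced = sex.replace(codes)  # 'unknown' is kept as is (dtype stays object)

print(mapped.tolist())
print(replaced.tolist())       # [0, 1, 'unknown']
```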

      

EDIT:

Primarily, loc is used to set a new value by index label and column name:

titanic = pd.DataFrame({'Sex':['male','female', 'male']}, index=['a','b','c'])
print (titanic)
      Sex
a    male
b  female
c    male

titanic.loc["a", "Sex"] = 0
print (titanic)
      Sex
a       0
b  female
c    male

titanic.loc[["a", "b"], "Sex"] = 0
print (titanic)
    Sex
a     0
b     0
c  male
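This also shows why titanic.loc["male", "Sex"] = 0 from the question does not do what was intended: "male" is treated as a row label, and when no row has that label, assignment through loc appends a new row labeled "male" (pandas calls this setting with enlargement) instead of changing the existing male entries. A minimal sketch with the default integer index:

```python
import pandas as pd

titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']})

# "male" is looked up as a ROW LABEL, not as a value in the Sex column.
# No row is labeled "male", so pandas appends a new row with that label.
titanic.loc["male", "Sex"] = 0

print(titanic)
```

The original three rows still contain "male", "female", "male"; only the new fourth row, labeled "male", holds the 0.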

      
