Confusion about pandas .loc
I am working through the Kaggle Titanic tutorial on the DataCamp platform.
I understand the basic use of .loc in pandas: it selects values by row labels and column labels.
My confusion comes from the DataCamp tutorial, where we want to find all the "male" entries in the "Sex" column and replace them with the value 0. They use the following piece of code:
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
Can someone explain how this works? I thought .loc took a row label and a column label as input, so what is the == for?
Shouldn't it be:
titanic.loc["male", "Sex"] = 0
Thanks!
It sets the column Sex to 0 in the rows where the condition below is True; the other values are left intact:
titanic["Sex"] == "male"
Example:
import pandas as pd
titanic = pd.DataFrame({'Sex':['male','female', 'male']})
print (titanic)
Sex
0 male
1 female
2 male
print (titanic["Sex"] == "male")
0 True
1 False
2 True
Name: Sex, dtype: bool
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
print (titanic)
Sex
0 0
1 female
2 0
This is very similar to boolean indexing with loc: it selects only the values of the column Sex in the rows that satisfy the condition:
print (titanic.loc[titanic["Sex"] == "male", "Sex"])
0 male
2 male
Name: Sex, dtype: object
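To make the role of == concrete, here is a minimal sketch on the same toy frame (the variable names mask, by_mask and by_labels are only illustrative). The comparison produces a boolean Series aligned with the rows, and passing it to loc should select the same rows as listing their labels explicitly:

```python
import pandas as pd

# Toy frame from the example above, with the default integer index 0, 1, 2.
titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})

# The comparison is evaluated first and yields one True/False per row.
mask = titanic["Sex"] == "male"

# loc accepts that boolean Series in the row position ...
by_mask = titanic.loc[mask, "Sex"]

# ... which picks the same rows as naming their labels explicitly.
by_labels = titanic.loc[[0, 2], "Sex"]

print(mask.tolist())               # [True, False, True]
print(by_mask.equals(by_labels))   # True
```

So loc still receives a row selector and a column label; the row selector is simply computed from the data instead of being typed out.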
But I think it is better to use map here if only the values male and female need to be converted to other values:
titanic = pd.DataFrame({'Sex':['male','female', 'male']})
titanic["Sex"] = titanic["Sex"].map({'male':0, 'female':1})
print (titanic)
Sex
0 0
1 1
2 0
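One thing to keep in mind with map, shown here as a small sketch with a made-up third value "unknown": any value that is missing from the dictionary becomes NaN, whereas the loc assignment leaves unmatched rows untouched.

```python
import pandas as pd

# "unknown" is a hypothetical value added only to show the unmapped case.
sex = pd.Series(["male", "female", "unknown"], name="Sex")

# Values without an entry in the dictionary are mapped to NaN.
mapped = sex.map({"male": 0, "female": 1})

print(mapped.isna().tolist())  # [False, False, True]
```

If the column may contain other values, it is worth checking for NaN after the mapping.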
EDIT:
Primarily, loc is used to set a new value by index label and column label:
titanic = pd.DataFrame({'Sex':['male','female', 'male']}, index=['a','b','c'])
print (titanic)
Sex
a male
b female
c male
titanic.loc["a", "Sex"] = 0
print (titanic)
Sex
a 0
b female
c male
titanic.loc[["a", "b"], "Sex"] = 0
print (titanic)
Sex
a 0
b 0
c male
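This is also why titanic.loc["male", "Sex"] from the question does not do what was intended. A minimal sketch, assuming the default integer index (the flag label_found is only illustrative): "male" is a value stored in the column, not a row label, so looking it up as a label fails.

```python
import pandas as pd

# Toy frame with the default integer index, so the row labels are 0, 1, 2.
titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})

# loc treats "male" as a row label; no such label exists in the index.
try:
    titanic.loc["male", "Sex"]
    label_found = True
except KeyError:
    label_found = False

print(titanic.index.tolist())  # [0, 1, 2]
print(label_found)             # False
```

The boolean condition titanic["Sex"] == "male" is what turns the column values into something loc can use to pick rows.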