How do I replace all instances of a specific character in a dataframe?
Consider a DataFrame df:
import pandas as pd

df = pd.DataFrame([['?', 1], [2, '?']])
print(df)
   0  1
0  ?  1
1  2  ?
replace
df.replace('?', 0)
   0  1
0  0  1
1  2  0
mask or where
df.mask(df == '?', 0)
# df.where(df != '?', 0)
   0  1
0  0  1
1  2  0
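To make the relationship between the three calls explicit, here is a minimal sketch on the same frame. The variable names (is_q, via_replace, via_mask, via_where) are only illustrative. mask overwrites cells where the condition is True, while where keeps cells where the condition is True, so the condition has to be inverted.

```python
import pandas as pd

df = pd.DataFrame([['?', 1], [2, '?']])

# Boolean frame: True wherever a cell is exactly the string '?'
is_q = df == '?'

via_replace = df.replace('?', 0)   # match whole cells equal to '?'
via_mask = df.mask(is_q, 0)        # overwrite where the condition is True
via_where = df.where(~is_q, 0)     # keep where the condition is True

# All three should hold the same values, although the column dtypes
# may differ between them depending on the pandas version.
print(via_replace.values.tolist())
print(via_mask.values.tolist())
print(via_where.values.tolist())
```

mask and where are handy when the condition is more than a simple equality, since any boolean frame can be passed in.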
However, imagine your dataframe has ? embedded in longer strings.
df = pd.DataFrame([['a?', 1], [2, '?b']])
print(df)
    0   1
0  a?   1
1   2  ?b
replace with regex=True
# raw string, so \? reaches the regex engine as an escaped literal ?
df.replace(r'\?', '0', regex=True)
    0   1
0  a0   1
1   2  0b
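A short sketch of why regex=True is needed here: without it, replace only matches cells that are exactly equal to '?', so 'a?' and '?b' are left alone. The use of re.escape to build the pattern is my own suggestion rather than part of the answer above; it avoids escaping special characters by hand.

```python
import re
import pandas as pd

df = pd.DataFrame([['a?', 1], [2, '?b']])

# Without regex=True, only cells equal to '?' would be replaced,
# so this frame comes back unchanged.
whole_cell = df.replace('?', '0')

# re.escape turns the literal '?' into a safe regex pattern.
pattern = re.escape('?')
substring = df.replace(pattern, '0', regex=True)

print(whole_cell.equals(df))
print(substring.values.tolist())
```

Note that the numeric cells are left untouched, because regex replacement only applies to string values.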
I think it's better to replace with the string '0', because otherwise you get mixed types (numbers alongside strings), and some pandas functions may not work:
df.replace('?', '0')
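As a rough illustration of the mixed-type concern, the sketch below uses a small hypothetical column of numeric strings (not taken from the question) and compares the element types after each kind of replacement. dtype=object is passed explicitly so the example does not depend on how a given pandas version stores strings by default.

```python
import pandas as pd

# Hypothetical column: numeric strings with a '?' placeholder
s = pd.Series(['?', '3', '5'], dtype=object)

mixed = s.replace('?', 0)      # integer 0 next to strings
uniform = s.replace('?', '0')  # every element stays a string

mixed_types = {type(v).__name__ for v in mixed}
uniform_types = {type(v).__name__ for v in uniform}
print(mixed_types)
print(uniform_types)

# With uniform strings, a clean conversion to numbers is one step away.
as_int = uniform.astype(int)
print(as_int.tolist())
```

Operations that compare elements with each other, such as sorting, are one place where a column that mixes integers and strings can cause trouble.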
Also, if you need to replace several consecutive ? with a single 0, add + to match one or more occurrences:
df = pd.DataFrame([['a???', '?'], ['s?', '???b']])
print(df)
      0     1
0  a???     ?
1    s?  ???b
# raw string, so \? reaches the regex engine as an escaped literal ?
df = df.replace(r'\?+', '0', regex=True)
print(df)
    0   1
0  a0   0
1  s0  0b
# alternatively, a character class matches a literal ? without a backslash
df = df.replace('[?]+', '0', regex=True)
print(df)
    0   1
0  a0   0
1  s0  0b
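The two patterns above can also be checked outside pandas. This is a small sketch using only the standard re module on the same sample strings; it also shows what happens when the + is left out.

```python
import re

samples = ['a???', '?', 's?', '???b']

# Escaped form and character-class form match the same runs of '?'
escaped = [re.sub(r'\?+', '0', s) for s in samples]
char_class = [re.sub(r'[?]+', '0', s) for s in samples]

# Without '+', every single '?' is replaced by its own '0'
without_plus = [re.sub(r'\?', '0', s) for s in samples]

print(escaped)       # ['a0', '0', 's0', '0b']
print(char_class)    # ['a0', '0', 's0', '0b']
print(without_plus)  # ['a000', '0', 's0', '000b']
```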