How do I replace all instances of a specific character in a dataframe?

I have a dataframe with many cells that contain only '?'. The data type of the columns is "object". I want to replace every '?' with 0. How can I do that?

+3




2 answers


Consider the dataframe df

import pandas as pd

df = pd.DataFrame([['?', 1], [2, '?']])

print(df)

   0  1
0  ?  1
1  2  ?

      

replace

df.replace('?', 0)

   0  1
0  0  1
1  2  0

      

mask or where

df.mask(df == '?', 0)
# df.where(df != '?', 0)

   0  1
0  0  1
1  2  0
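As a quick sanity check, the three calls above can be run on the same frame and compared cell by cell. This is only a sketch: the names `by_replace`, `by_mask`, `by_where`, and `same` are made up for illustration, and the comparison looks at values only, since the resulting dtypes may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([['?', 1], [2, '?']])

# Three ways to turn every cell equal to '?' into 0
by_replace = df.replace('?', 0)
by_mask = df.mask(df == '?', 0)      # overwrite cells where the condition is True
by_where = df.where(df != '?', 0)    # keep cells where the condition is True

# Compare cell values only (not dtypes)
same = (by_replace.values.tolist()
        == by_mask.values.tolist()
        == by_where.values.tolist())
print(same)
```

`mask` and `where` are mirror images of each other, so which one reads better mostly depends on whether the condition is easier to state positively or negatively.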

      




However, imagine your dataframe has ? inside longer strings.

df = pd.DataFrame([['a?', 1], [2, '?b']])

print(df)

    0   1
0  a?   1
1   2  ?b

      

replace with regex=True

df.replace(r'\?', '0', regex=True)

    0   1
0  a0   1
1   2  0b
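To see why regex=True matters here, it may help to run both forms side by side. A minimal sketch, with `exact` and `substring` as illustrative names: without regex=True, replace only matches cells whose entire value equals '?', so 'a?' and '?b' are left alone.

```python
import pandas as pd

df = pd.DataFrame([['a?', 1], [2, '?b']])

# Plain replace compares whole cell values, so nothing matches here
exact = df.replace('?', '0')

# With regex=True the pattern is searched inside each string cell;
# the backslash is needed because ? is a regex quantifier
substring = df.replace(r'\?', '0', regex=True)

print(exact)
print(substring)
```

Non-string cells (the integers 1 and 2) are untouched in both cases.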

      

+4




I think it's better to replace with the string '0', because otherwise you get mixed types (numbers alongside strings) and some pandas functions may not work:

df.replace('?', '0')
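If the goal is to end up with real numbers, one possible follow-up is to keep everything as strings during the replacement and convert explicitly afterwards. This is a sketch under an assumption that is not in the question: every remaining cell holds text that parses as an integer. The column names `a` and `b` and the variable names are hypothetical.

```python
import pandas as pd

# Assumed input: all cells are strings, with '?' as the placeholder
df = pd.DataFrame({'a': ['?', '1'], 'b': ['2', '?']})

# Step 1: replace with the string '0' so each column stays all-string
cleaned = df.replace('?', '0')

# Step 2: convert explicitly; this raises if any cell is not an integer string
numeric = cleaned.astype(int)

print(numeric)
print(numeric.sum().sum())
```

Doing the conversion as a separate step makes a stray non-numeric value fail loudly instead of silently leaving a mixed column.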

      

Also, if you need to replace a run of several ? with a single 0, add + to the pattern to match one or more occurrences:



df = pd.DataFrame([['a???', '?'], ['s?', '???b']])
print(df)
      0     1
0  a???     ?
1    s?  ???b

df = df.replace(r'\?+', '0', regex=True)
print(df)
    0   1
0  a0   0
1  s0  0b

      


# Alternative pattern: inside a character class, ? needs no backslash
df = df.replace('[?]+', '0', regex=True)
print(df)
    0   1
0  a0   0
1  s0  0b
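For a character that is not known in advance, the pattern can be built with `re.escape` from the standard library instead of escaping by hand. The helper name `replace_char_runs` below is hypothetical and not part of pandas; it is just a small sketch that wraps the same replace call.

```python
import re

import pandas as pd


def replace_char_runs(frame, char, value):
    """Hypothetical helper: replace each run of `char` inside string cells with `value`."""
    # re.escape makes characters such as ? or . safe to use in a regex
    pattern = re.escape(char) + '+'
    return frame.replace(pattern, value, regex=True)


df = pd.DataFrame([['a???', '?'], ['s?', '???b']])
result = replace_char_runs(df, '?', '0')
print(result)
```

The same helper would work for other regex-special characters, for example `replace_char_runs(df, '.', '0')`.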

      

+2

