Str when replacing values in pandas dataframe

Question


My code scrapes information from a website and puts it into a dataframe. I'm not sure why the order of the statements determines whether this error is raised: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Basically, the scraped data has more than 20 rows and 10 columns.

Some values are in parentheses, e.g. (2,333), and I want to change them to -2333.
Some values are the text n.a., and I want to change them to numpy.nan.
Some values are -, and I want to change them to numpy.nan as well.

This does not work:

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

# Replacing necessary items for final clean up

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

This works:

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

# Replacing necessary items for final clean up

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This doesn't give me any errors and returns me what I want.

Any thoughts on why this is happening?


python pandas

jake wong 09 jul. 17 at 8:24


1 answer

jezrael · Accepted Answer · 2017-07-09T10:09:56+0000

A double replace works for me: first with regex=True to replace substrings, and then a second one for whole values:

import numpy as np
import pandas as pd

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)','n.a.','-',2.34], size=(3,3)), 
                  columns=list('ABC'))
print (df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)

df1 = df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

df1 = df.replace(['-', 'n.a.'], np.nan).replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
print(df1)  
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333
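
If the end goal is numeric columns rather than cleaned strings, the regex replace can be followed by a conversion step. The following is only a sketch under that assumption: the sample frame is hand-built to mirror the one above, and `pd.to_numeric` with `errors='coerce'` is my addition, not part of the original answer. Because coercion turns every non-numeric leftover such as '-' or 'n.a.' into NaN, the second replace is not needed in this variant.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the frame above (all values are strings).
df = pd.DataFrame({'A': ['2.34', 'n.a.', '2.34'],
                   'B': ['-', '-', 'n.a.'],
                   'C': ['(2,333)', '(2,333)', '(2,333)']})

# Step 1: substring replacement, '(' -> '-', ')' and ',' -> ''.
# Step 2: convert each column to numbers; anything that is not
#         a number ('-', 'n.a.') is coerced to NaN.
cleaned = (df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
             .apply(pd.to_numeric, errors='coerce'))

print(cleaned)
print(cleaned.dtypes)
```

Note that this changes the dtypes to numeric ones, which may or may not be what the rest of the pipeline expects.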

EDIT:

Your error means that you are calling str.replace on a non-string column (for example column B, where all values are NaN):

# regex=False treats '(' and ')' as literal characters
df1 = df.apply(lambda x: x.str.replace('(', '-', regex=False)
                          .str.replace(')', '', regex=False)
                          .str.replace(',', '', regex=False)).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

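# Replacing with NaN first turns column B into float64, so .str fails on it
df1 = (df.replace(['-', 'n.a.'], np.nan)
         .apply(lambda x: x.str.replace('(', '-', regex=False)
                           .str.replace(')', '', regex=False)
                           .str.replace(',', '', regex=False)))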
print(df1)

AttributeError: ('Can only use .str accessor with string values that use np.object_ dtype in pandas', 'occurred at index B')

The dtype of column B is float64:

df1 = df.replace(['-','n.a.'], np.nan)
print(df1)
      A   B        C
0  2.34 NaN  (2,333)
1   NaN NaN  (2,333)
2  2.34 NaN  (2,333)

print (df1.dtypes)
A     object
B    float64
C     object
dtype: object
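
One way to make the loop independent of the order of operations is to skip columns that are not text before touching `.str`. This is a minimal sketch, not the asker's code: `strip_accounting_marks` is a hypothetical helper name, and the sample frame is built by hand so that column B is explicitly float64, as in the dtypes shown above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

def strip_accounting_marks(frame):
    """Hypothetical helper: turn '(2,333)' into '-2333' in text columns only."""
    out = frame.copy()
    for col in out.columns:
        # Guard: a float64 column (e.g. all NaN) has no .str accessor, so skip it.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        out[col] = (out[col].str.replace('(', '-', regex=False)
                            .str.replace(')', '', regex=False)
                            .str.replace(',', '', regex=False))
    return out

# Sample resembling the frame after '-' and 'n.a.' were already replaced by NaN.
frame = pd.DataFrame({'A': ['2.34', np.nan, '2.34'],
                      'B': np.array([np.nan, np.nan, np.nan]),  # float64
                      'C': ['(2,333)', '(2,333)', '(2,333)']})

result = strip_accounting_marks(frame)
print(result)
```

With this guard the string cleanup can run before or after the NaN replacement, because the all-NaN float column is simply left alone.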
