AttributeError with .str when replacing values in a pandas DataFrame

My code scrapes information from a website and puts it into a DataFrame. But I'm not sure why the order of the code throws an error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Basically, the scraped data has more than 20 rows and 10 columns.

  • Some values are in parentheses, e.g. (2,333), and I want to change them to -2333.
  • Some values are the string n.a., and I want to change them to numpy.nan.
  • Some values are -, and I want to change them to numpy.nan as well.
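To pin down the three rules above, here is a minimal sketch of a per-value cleaner in plain Python. clean_value is a hypothetical helper (not from the question), and it assumes the missing-value markers are exactly "-", "n.a" or "n.a."; numbers come back as floats, so (2,333) becomes -2333.0.

```python
import math


def clean_value(value):
    """Hypothetical helper: map one scraped cell to a float using the rules above."""
    if not isinstance(value, str):
        return value  # already numeric (or NaN): leave it unchanged
    text = value.strip()
    # Assumed missing-value markers
    if text in ("-", "n.a", "n.a."):
        return math.nan
    # Accounting notation: a value wrapped in parentheses is negative
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()").replace(",", "")
    number = float(text)
    return -number if negative else number


print(clean_value("(2,333)"))  # -2333.0
print(clean_value("n.a."))     # nan
print(clean_value("2.34"))     # 2.34
```

On a whole DataFrame this can be applied cell by cell with df.map(clean_value) (pandas 2.1 and later; df.applymap(clean_value) in older versions). That avoids the .str accessor entirely, at the cost of running Python code for every cell.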

Does not work

import numpy

for final_df, engine_name in zip((df_foo, df_bar, df_far), ['engine_foo', 'engine_bar', 'engine_far']):

    # Replacing necessary items for final clean up
    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Strip parentheses and commas so that "(2,333)" becomes "-2333"
    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '', regex=False)
        final_df[i] = final_df[i].str.replace(',', '', regex=False)
        final_df[i] = final_df[i].str.replace('(', '-', regex=False)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas


Works

import numpy

for final_df, engine_name in zip((df_foo, df_bar, df_far), ['engine_foo', 'engine_bar', 'engine_far']):

    # Replacing necessary items for final clean up
    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '', regex=False)
        final_df[i] = final_df[i].str.replace(',', '', regex=False)
        final_df[i] = final_df[i].str.replace('(', '-', regex=False)

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This doesn't give me any errors and returns me what I want. 


Any thoughts on why this is happening?





Answer


A double replace works for me: first with regex=True to replace the substrings, then a second replace for whole values:

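import numpy as np
import pandas as pd

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)','n.a.','-',2.34], size=(3,3)),
                  columns=list('ABC'))
print(df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)

# regex=True: replace the characters "(", ")" and "," inside each value,
# then swap the whole values "-" and "n.a." for NaN
df1 = df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True).replace(['-', 'n.a.'], np.nan)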
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

# Same result with the two replace calls in the opposite order
df1 = df.replace(['-', 'n.a.'], np.nan).replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333
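Both versions still leave strings such as "-2333" in the frame. If actual numbers are wanted (the question asks for -2333), one possible follow-up is to run pd.to_numeric over every column. This is a sketch on a hand-made frame with the same kinds of values as above (not the random one, and not the scraped data):

```python
import numpy as np
import pandas as pd

# Made-up sample with the same kinds of values as in the answer
df = pd.DataFrame({"A": ["2.34", "n.a.", "2.34"],
                   "B": ["-", "-", "n.a."],
                   "C": ["(2,333)", "(2,333)", "(2,333)"]})

# Same double replace as above (raw strings for the regex patterns)
df1 = (df.replace([r"\(", r"\)", ","], ["-", "", ""], regex=True)
         .replace(["-", "n.a."], np.nan))

# Cells such as "-2333" are still strings; convert each column to a numeric dtype
numeric = df1.apply(pd.to_numeric)
print(numeric)
print(numeric.dtypes)
```

pd.to_numeric raises on anything it cannot parse, which is a useful check that no unexpected text survived the clean-up; pass errors="coerce" instead if such cells should silently become NaN.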


EDIT:

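Your error means that str.replace is being called on a column that does not hold strings (for example, column B, which contains only NaN values once - and n.a. have been replaced). Running the .str replacements first and replacing - and n.a. afterwards works: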

df1 = df.apply(lambda x: x.str.replace('(', '-', regex=False)
                          .str.replace(')', '', regex=False)
                          .str.replace(',', '', regex=False)).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333 
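If you would rather keep the order from the question (replace - and n.a. first, then use .str), one option is to skip columns that are no longer strings. A minimal sketch: clean_frame is a hypothetical helper, and the pd.api.types.is_numeric_dtype guard is my choice, not something from the question:

```python
import numpy as np
import pandas as pd


def clean_frame(frame):
    """Hypothetical helper: question's clean-up order, but .str only on non-numeric columns."""
    frame = frame.replace(["-", "n.a."], np.nan)  # returns a new frame
    for col in frame.columns:
        # A column that became all-NaN can end up as float64, where .str raises AttributeError
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue
        frame[col] = (frame[col].str.replace(")", "", regex=False)
                                .str.replace(",", "", regex=False)
                                .str.replace("(", "-", regex=False))
    return frame


sample = pd.DataFrame({"A": ["2.34", "n.a.", "2.34"],
                       "B": ["-", "-", "n.a."],
                       "C": ["(2,333)", "(2,333)", "(2,333)"]})
print(clean_frame(sample))
```

With the question's frames this would be something like cleaned = [clean_frame(f) for f in (df_foo, df_bar, df_far)]. Returning a new frame also sidesteps the fact that final_df = final_df.T inside the loop only rebinds the loop variable, so the transposed frame is lost on the next iteration. One caveat: .str.replace turns non-string values in an object column (for example a real float mixed with strings) into NaN.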

Replacing - and n.a. first and then calling .str fails:
df1 = (df.replace(['-', 'n.a.'], np.nan)
         .apply(lambda x: x.str.replace('(', '-', regex=False)
                           .str.replace(')', '', regex=False)
                           .str.replace(',', '', regex=False)))
print(df1)

AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index B')

This is because column B now has dtype float64:

df1 = df.replace(['-','n.a.'], np.nan)
print(df1)
      A   B        C
0  2.34 NaN  (2,333)
1   NaN NaN  (2,333)
2  2.34 NaN  (2,333)

print(df1.dtypes)
A     object
B    float64
C     object
dtype: object
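It is the dtype that matters, not the values: a column holding nothing but NaN is stored as float64, and pandas only offers the .str accessor on string-like columns. A minimal check, building such a column directly rather than through replace (whether replace itself downcasts an all-NaN object column to float64 may depend on the pandas version):

```python
import numpy as np
import pandas as pd

# A Series with only NaN values gets the float64 dtype, like column B above
all_nan = pd.Series([np.nan, np.nan, np.nan], name="B")
print(all_nan.dtype)  # float64

error_message = None
try:
    # .str is only available for string-like (object/string) dtypes
    all_nan.str.replace(",", "", regex=False)
except AttributeError as exc:
    error_message = str(exc)

print("AttributeError:", error_message)
```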
