How to replace DataFrame elements from other specified columns

I have a DataFrame like:

import pandas as pd

df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
                   {'v1':'2', 'v2':'c', 'v3':'d'}])

      

which displays as:

  v1 v2 v3
0  a  b  1
1  2  c  d

      

When a cell contains "1", "2", or "3", I would like to replace it with the value from the correspondingly numbered column (v1, v2, or v3) in the same row. For example, in the first row, column v3 has the value "1", so I would like to replace it with that row's value in column v1. Doing this for both rows, I should get:

  v1 v2 v3
0  a  b  a
1  c  c  d

      

I can do it with the following code:

for i in range(3):
    for j in range(3):
        df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (i+1)]= \
            df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (j+1)]

      

Is there a less cumbersome way to do this?

+3




4 answers


df.apply(lambda row: [row['v'+v] if 'v'+v in row else v for v in row], 1)

      

This iterates through each row and replaces any value v with the value in the column named 'v'+v, if that column exists; otherwise the value is left unchanged.

output:

  v1 v2 v3
0  a  b  a
1  c  c  d 
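
On recent pandas versions, a function passed to apply with axis=1 that returns a plain list tends to give back a Series of lists rather than a DataFrame. A minimal sketch, assuming that behavior, wraps the list in a pd.Series indexed by the row labels so the original shape is kept. The helper name substitute is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])

def substitute(row):
    # Hypothetical helper: look up column 'v' + value in the same row,
    # falling back to the original value when no such column exists.
    return pd.Series(
        [row['v' + v] if 'v' + v in row.index else v for v in row],
        index=row.index,
    )

# Returning a Series per row makes apply assemble a DataFrame again.
result = df.apply(substitute, axis=1)
print(result)
```

Like the one-liner, this assumes every cell is a string.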

      

Note that this does not limit substitution to numbers only. For example, if you have a column named 'va', it will replace all cells containing "a" with the value in the column 'va' in that row. To limit which columns can be used as a replacement source, you can define a list of valid column names. For example, let's say you only wanted to make replacements from column v1:

acceptable_columns = ['v1']

df.apply(lambda row: [row['v'+v] if 'v'+v in acceptable_columns else v for v in row], 1)

      

output:

  v1 v2 v3
0  a  b  a
1  2  c  d

      

EDIT

It has been pointed out that the above answer throws an error if there are non-string values in your DataFrame. You can avoid this by explicitly converting each cell value to a string:

df.apply(lambda row: [row['v'+str(v)] if 'v'+str(v) in row else v for v in row], 1)
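
To see why the str() call matters, here is a small sketch on an assumed variant of the example in which the numeric cells are integers instead of strings. The names mixed and substitute_any are invented for this illustration, and the result is wrapped in a pd.Series so that it stays a DataFrame on recent pandas versions:

```python
import pandas as pd

# Assumed variant of the example: the numeric cells are ints, not strings.
mixed = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': 1},
                      {'v1': 2, 'v2': 'c', 'v3': 'd'}])

def substitute_any(row):
    # 'v' + v would raise TypeError for an int cell; str(v) avoids that.
    return pd.Series(
        [row['v' + str(v)] if 'v' + str(v) in row.index else v for v in row],
        index=row.index,
    )

fixed = mixed.apply(substitute_any, axis=1)
print(fixed)
```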

      

ORIGINAL (WRONG) ANSWER BELOW

Note that the answer below only works when the replacement values lie on the diagonal (which is the case in the example, but is not what the question asked; my bad).



You can do it with pandas' replace method and numpy's diag:

First, select the values to replace. These are the digit strings from "1" up to the length of your DataFrame:

to_replace = [str(i) for i in range(1,len(df)+1)]  

      

Then select the values to replace them with. This will be the diagonal of your DataFrame:

import numpy as np
replace_with = np.diag(df)

      

Now you can do the actual replacement:

df.replace(to_replace, replace_with)

      

which gives:

  v1 v2 v3
0  a  b  a
1  c  c  d

      

And of course, if you want it all as one liner:

df.replace([str(i) for i in range(1,len(df)+1)], np.diag(df))

      

Add the inplace=True keyword argument to replace if you want to do the replacement in place.
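
To illustrate the limitation mentioned at the top of this answer, here is a sketch on an assumed counterexample (the frame other is not from the question). The cell "2" sits in row 0, so the row-wise rule would want row 0's own v2 value, "b", while the diagonal approach substitutes the value from row 1:

```python
import numpy as np
import pandas as pd

# Assumed counterexample: the '2' is in row 0, off the diagonal pattern.
other = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '2'},
                      {'v1': 'x', 'v2': 'c', 'v3': 'd'}])

to_replace = [str(i) for i in range(1, len(other) + 1)]   # ['1', '2']
replace_with = list(np.diag(other.to_numpy()))            # diagonal: ['a', 'c']

diagonal_result = other.replace(to_replace, replace_with)

# The row-wise rule asks for row 0's v2 value here.
wanted = other.loc[0, 'v2']
print(diagonal_result.loc[0, 'v3'], wanted)
```

The two printed values differ, which is why the diagonal trick is only safe when each digit happens to appear in the row whose number matches it.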

+1




I did this:

import pandas as pd

df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
                   {'v1':'2', 'v2':'c', 'v3':'d'}])

def replace_col(row, columns, col_num_dict={1: 'v1', 2: 'v2', 3: 'v3'}):
    for col in columns:
        x = getattr(row, col)
        try:
            x = int(x)  # non-numeric cells raise ValueError and are skipped
            if x in col_num_dict:
                setattr(row, col, getattr(row, col_num_dict[x]))
        except ValueError:
            pass
    return row

df = df.apply(replace_col, axis=1, args=(df.columns,))

      



It applies the replace_col function to every row. The row attributes that correspond to the columns are replaced with the correct value from the same row. It looks a little more complicated because of the many getattr / setattr calls, but it does exactly what needs to be done without too much overhead.

+1




I see 2 options.

Loop over the columns and then over the mapping

mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}

df1 = df.copy()
for column_name, column in df1.items():  # iteritems() was removed in pandas 2.0
    for k, v in mapping.items():
        df1.loc[column == k, column_name] = df1.loc[column == k, v]

      

df1

    v1  v2  v3
0   a   b   a
1   c   c   d

      

Loop over the columns, then loop over all the "hits"

df2 = df.copy()
for column_name, column in df2.items():  # iteritems() was removed in pandas 2.0
    hits = column.isin(mapping.keys())
    for idx, item in column[hits].items():
        df2.loc[idx, column_name] = df2.loc[idx, mapping[item]]

      

df2

    v1  v2  v3
0   a   b   a
1   c   c   d

      

If you choose the first option, you can reduce the two nested for loops to a single loop with itertools.product.
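
A minimal sketch of that itertools.product variant, assuming the same df and mapping as above (df3 is just an illustrative name for the working copy):

```python
from itertools import product

import pandas as pd

df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])
mapping = {'1': 'v1', '2': 'v2', '3': 'v3'}

df3 = df.copy()
# product() yields every (column name, (key, source column)) combination,
# so the two nested loops collapse into one.
for column_name, (key, source) in product(df3.columns, mapping.items()):
    mask = df3[column_name] == key
    df3.loc[mask, column_name] = df3.loc[mask, source]

print(df3)
```

This does the same amount of work as the nested loops; it only flattens the code.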

+1




You can change the data before converting it to a DataFrame:

data = [{'v1':'a', 'v2':'b', 'v3':'1'},{'v1':'2', 'v2':'c', 'v3':'d'}]
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}
for idx, line in enumerate(data):
    for item in line:
        try:
            int(line[item])  # raises ValueError for non-numeric values
            data[idx][item] = data[idx][mapping[line[item]]]
        except (ValueError, KeyError):
            pass

print(data)
# [{'v1': 'a', 'v2': 'b', 'v3': 'a'}, {'v1': 'c', 'v2': 'c', 'v3': 'd'}]
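
The conversion step itself is not shown above, so here is a small sketch of the whole idea, assuming the same data and mapping. The helper resolve is a made-up name; unlike the loop above it builds a new dict per record instead of modifying the records in place:

```python
import pandas as pd

data = [{'v1': 'a', 'v2': 'b', 'v3': '1'},
        {'v1': '2', 'v2': 'c', 'v3': 'd'}]
mapping = {'1': 'v1', '2': 'v2', '3': 'v3'}

def resolve(record):
    # Hypothetical helper: swap any value found in mapping for the value
    # stored under the mapped key of the same (original) record.
    return {key: record[mapping[value]] if value in mapping else value
            for key, value in record.items()}

cleaned = [resolve(record) for record in data]
df = pd.DataFrame(cleaned)
print(df)
```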

      

0

