How to replace DataFrame elements from other specified columns

Question


I have a DataFrame like:

import pandas as pd

df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
                   {'v1':'2', 'v2':'c', 'v3':'d'}])

which looks like:

  v1 v2 v3
0  a  b  1
1  2  c  d

When the content of a cell is "1", "2", or "3", I would like to replace it with the value in the corresponding column of the same row. For example, in the first row, column v3 has the value "1", so I would like to replace it with the value of column v1 in that row. Doing this for both rows, I should get:

  v1 v2 v3
0  a  b  a
1  c  c  d

I can do it with the following code:

for i in range(3):
    for j in range(3):
        df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (i+1)]= \
            df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (j+1)]

Is there a less cumbersome way to do this?


python pandas

Ted, Jul 17 '17 at 14:32


4 answers

I did this:

import pandas as pd

df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
                   {'v1':'2', 'v2':'c', 'v3':'d'}])

def replace_col(row, columns, col_num_dict={1: 'v1', 2: 'v2', 3: 'v3'}):
    row = row.copy()  # don't modify the Series object that apply passes in
    for col in columns:
        x = getattr(row, col)
        try:
            x = int(x)  # raises ValueError for non-numeric cells such as 'a'
            if x in col_num_dict:
                # copy the value from the column that the number refers to
                setattr(row, col, getattr(row, col_num_dict[x]))
        except ValueError:
            pass
    return row

df = df.apply(replace_col, axis=1, args=(df.columns,))

This applies the replace_col function to every row. Each attribute of the row object that corresponds to a column is replaced with the correct value from the same row. The many getattr/setattr calls make it look a little complicated, but it does exactly what is needed without much overhead.
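For comparison, a minimal sketch of the same row-wise idea using plain label indexing instead of getattr/setattr. The names col_by_digit and replace_row are hypothetical, and the digit-to-column mapping is assumed to follow the column order (v1, v2, v3):

```python
import pandas as pd

df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])

# Hypothetical mapping built from the column order: '1' -> 'v1', '2' -> 'v2', ...
col_by_digit = {str(i): col for i, col in enumerate(df.columns, start=1)}

def replace_row(row):
    """Return a new row where each digit cell is replaced by the value of
    the column it refers to; every lookup reads the unmodified row."""
    values = [row[col_by_digit[str(v)]] if str(v) in col_by_digit else v
              for v in row]
    return pd.Series(values, index=row.index)

result = df.apply(replace_row, axis=1)
print(result)
```

One design difference: because every lookup reads the unmodified row, a replacement made earlier in a row cannot feed into a later one, whereas replace_col updates the row as it goes.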


Skirrebattie, Jul 17 '17 at 14:48


I see 2 options.

Loop over the columns, then over the mapping

mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}

df1 = df.copy()
# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
for column_name, column in df1.items():
    for k, v in mapping.items():
        df1.loc[column == k, column_name] = df1.loc[column == k, v]

df1

    v1  v2  v3
0   a   b   a
1   c   c   d

Loop over the columns, then over all matching cells

df2 = df.copy()
for column_name, column in df2.items():
    hits = column.isin(list(mapping))  # cells whose value is a key of mapping
    for idx, item in column[hits].items():
        df2.loc[idx, column_name] = df2.loc[idx, mapping[item]]

df2

    v1  v2  v3
0   a   b   a
1   c   c   d

If you choose the first approach, you can reduce the two nested for-loops to a single loop with itertools.product.
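A sketch of the first option with the two loops merged via itertools.product. It assumes the df and mapping from above (rebuilt here so the block runs on its own). Unlike the loop above, it reads masks and values from the untouched df, so the order of the pairs does not matter:

```python
import itertools

import pandas as pd

df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}

df1 = df.copy()
# One loop over every (column, mapping entry) pair instead of two nested loops
for column_name, (k, v) in itertools.product(df.columns, mapping.items()):
    mask = df[column_name] == k  # cells of this column equal to the digit k
    df1.loc[mask, column_name] = df.loc[mask, v]  # take values from column v
print(df1)
```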


Maarten Fabré, Jul 17 '17 at 15:12


You can change the data before converting it to a DataFrame:

data = [{'v1':'a', 'v2':'b', 'v3':'1'}, {'v1':'2', 'v2':'c', 'v3':'d'}]
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}
for idx, line in enumerate(data):
    for item in line:
        try:
            int(line[item])  # raises ValueError for non-numeric cells
            data[idx][item] = data[idx][mapping[line[item]]]
        except (ValueError, KeyError):
            pass  # leave non-numeric or unmapped values unchanged
print(data)

[{'v1': 'a', 'v2': 'b', 'v3': 'a'}, {'v1': 'c', 'v2': 'c', 'v3': 'd'}]
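A variant of the same idea that builds new dictionaries instead of editing data in place, then converts the result to a DataFrame. The name patched is illustrative; testing membership in mapping replaces the int()/try pattern, so only the keys of mapping are substituted:

```python
import pandas as pd

data = [{'v1': 'a', 'v2': 'b', 'v3': '1'}, {'v1': '2', 'v2': 'c', 'v3': 'd'}]
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}

# Build new dicts; a value is swapped only if it is a key of mapping
patched = [{key: line[mapping[val]] if val in mapping else val
            for key, val in line.items()}
           for line in data]

df = pd.DataFrame(patched)
print(df)
```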


galaxyan, Jul 17 '17 at 14:45


bunji (accepted answer), Jul 17 '17 at 15:17

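# result_type='broadcast' (pandas >= 0.23) keeps the original columns; without it, recent pandas returns a Series of lists
df.apply(lambda row: [row['v'+v] if 'v'+v in row else v for v in row], axis=1, result_type='broadcast')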

This iterates through each row and replaces any value v with the value in the column named 'v'+v of that row, if such a column exists; otherwise the value is left unchanged.

output:

  v1 v2 v3
0  a  b  a
1  c  c  d

Note that this does not limit substitution to numbers. For example, if you have a column named 'va', every cell containing "a" will be replaced with the value of column 'va' in that row. To limit which columns can be used as replacement sources, you can define a list of valid column names. For example, say you only want to make replacements from column v1:

acceptable_columns = ['v1']

df.apply(lambda row: [row['v'+v] if 'v'+v in acceptable_columns else v for v in row], axis=1, result_type='broadcast')

output:

  v1 v2 v3
0  a  b  a
1  2  c  d
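To illustrate the caveat, here is a sketch with a hypothetical extra column named 'va'. The helper swap is not part of the answer; it is the same list comprehension with the allowed source columns passed in:

```python
import pandas as pd

# Hypothetical frame with an extra column 'va' to show the caveat
df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1', 'va': 'X'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd', 'va': 'Y'}])

def swap(row, allowed):
    # Replace v with row['v' + v] only when 'v' + v is an allowed source column
    return [row['v' + v] if 'v' + v in allowed else v for v in row]

# Every column is a possible source, so the 'a' in v1 is replaced by va's value
everything = df.apply(lambda row: swap(row, df.columns),
                      axis=1, result_type='broadcast')
# Only v1 is a possible source
only_v1 = df.apply(lambda row: swap(row, ['v1']),
                   axis=1, result_type='broadcast')
print(everything)
print(only_v1)
```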

EDIT

It has been pointed out that the above answer throws an error if your DataFrame contains non-string values. You can avoid this by explicitly converting each cell value to a string:

df.apply(lambda row: [row['v'+str(v)] if 'v'+str(v) in row else v for v in row], axis=1, result_type='broadcast')
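A quick check of the edited version, assuming a hypothetical frame in which the reference in v3 is stored as the integer 1 rather than the string "1". Without str(), 'v' + v would raise a TypeError on that cell:

```python
import pandas as pd

# Hypothetical frame where the reference "1" is stored as the integer 1
df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': 1},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])

# str(v) makes 'v' + str(v) valid for ints as well as strings
result = df.apply(lambda row: [row['v' + str(v)] if 'v' + str(v) in row else v
                               for v in row],
                  axis=1, result_type='broadcast')
print(result)
```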

ORIGINAL (WRONG) ANSWER BELOW

Note that the answer below only works when the replacement values lie on the diagonal (which is the case in the example, but is not what the question asked; my mistake).

You can do it with pandas' replace method and NumPy's diag:

First, build the values to replace; these are the digit strings from "1" up to the length of your DataFrame:

to_replace = [str(i) for i in range(1,len(df)+1)]

Then select the replacement values, which are the diagonal of your DataFrame:

import numpy as np
replace_with = np.diag(df)

Now you can do the actual replacement:

df.replace(to_replace, replace_with)

which gives:

  v1 v2 v3
0  a  b  a
1  c  c  d

And of course, if you want it all as one liner:

df.replace([str(i) for i in range(1,len(df)+1)], np.diag(df))

Add the keyword argument inplace=True to replace if you want to do the replacement in place.
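To see the limitation of the diagonal shortcut, here is a sketch with a hypothetical frame in which row 0 refers to column v2, which lies off the diagonal. The diagonal approach takes the value from row 1, while the row-wise lookup from the accepted answer takes it from row 0:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 refers to column v2, which is off the diagonal
df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '2'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])

to_replace = [str(i) for i in range(1, len(df) + 1)]  # ['1', '2']
replace_with = np.diag(df)  # df.iloc[0, 0] and df.iloc[1, 1], i.e. 'a' and 'c'
diag_result = df.replace(to_replace, replace_with)

# Row-wise lookup for comparison: the value in column 'v' + v of the same row
row_result = df.apply(lambda row: [row['v' + v] if 'v' + v in row else v
                                   for v in row],
                      axis=1, result_type='broadcast')
print(diag_result)  # row 0, v3 is taken from row 1
print(row_result)   # row 0, v3 is taken from row 0
```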
