How to replace DataFrame elements from other specified columns
I have a DataFrame like:
df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
{'v1':'2', 'v2':'c', 'v3':'d'}])
or
v1 v2 v3
0 a b 1
1 2 c d
When a cell contains "1", "2", or "3", I would like to replace it with the value from the corresponding column (v1, v2, or v3) in the same row. For example, in the first row, column v3 has the value "1", so I would like to replace it with the first row's value in column v1. Doing this for both rows, I should get:
v1 v2 v3
0 a b a
1 c c d
I can do it with the following code:
for i in range(3):
    for j in range(3):
        df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (i+1)]= \
            df.loc[df['v%d' % (i+1)]==('%d' % (j+1)),'v%d' % (j+1)]
Is there a less cumbersome way to do this?
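df.apply(lambda row: [row['v'+v] if 'v'+v in row else v for v in row], axis=1, result_type='broadcast')
This iterates through each row and replaces any value v with the value in the column named 'v'+v, if that column exists; otherwise the value is left unchanged. (In recent pandas versions, result_type='broadcast' is needed so that the returned lists are put back into a DataFrame with the original columns rather than returned as a Series of lists.)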
output:
v1 v2 v3
0 a b a
1 c c d
Note that this does not limit substitution to numbers only. For example, if you have a column named 'va', it will replace all cells containing "a" with the value in the column 'va' in that row. To limit the columns that replacements can be taken from, you can define a list of valid column names. For example, let's say you only wanted to make replacements from column v1:
acceptable_columns = ['v1']
df.apply(lambda row: [row['v'+v] if 'v'+v in acceptable_columns else v for v in row], axis=1, result_type='broadcast')
output:
v1 v2 v3
0 a b a
1 2 c d
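The two variants above can be folded into one small helper. This is only a sketch: the name replace_from_columns and its allowed and prefix parameters are made up for illustration and are not part of pandas. It returns a pd.Series per row so that apply rebuilds a DataFrame with the original column names, and it also reproduces the 'va' caveat described above.

```python
import pandas as pd


def replace_from_columns(df, allowed=None, prefix="v"):
    """Hypothetical helper: replace each cell value x with the same row's
    value in column prefix + x, but only if that column is in `allowed`."""
    # By default every column of the frame is a valid replacement source.
    allowed = set(df.columns if allowed is None else allowed)

    def fix_row(row):
        # Returning a Series keyed by the original index keeps column names.
        return pd.Series(
            [row[prefix + v] if prefix + v in allowed else v for v in row],
            index=row.index,
        )

    return df.apply(fix_row, axis=1)


df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])

all_cols = replace_from_columns(df)                  # any column may be a source
only_v1 = replace_from_columns(df, allowed=['v1'])   # only v1 may be a source

# The caveat: a column named 'va' makes every "a" cell a replacement target.
caveat = replace_from_columns(pd.DataFrame([{'va': 'x', 'v1': 'a'}]))
```

Here all_cols matches the first output above, only_v1 leaves the "2" in row 1 untouched, and in caveat the "a" in column v1 is replaced by "x".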
EDIT
It has been pointed out that the above answer throws an error if there are non-string types in your DataFrame. You can avoid this by explicitly converting each cell value to a string:
df.apply(lambda row: [row['v'+str(v)] if 'v'+str(v) in row else v for v in row], axis=1, result_type='broadcast')
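To see why the conversion matters, here is a small sketch with a frame that mixes strings and integers. The frame mixed and the function fix_row are invented for this illustration; the row function returns a pd.Series so the result keeps the original column names.

```python
import pandas as pd

# Same data as in the question, but with real integers instead of "1" and "2".
mixed = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': 1},
                      {'v1': 2, 'v2': 'c', 'v3': 'd'}])

# Without str(), 'v' + 1 fails because str and int cannot be concatenated.
raised = False
try:
    mixed.apply(lambda row: [row['v' + v] if 'v' + v in row else v for v in row],
                axis=1)
except TypeError:
    raised = True


def fix_row(row):
    # str(v) makes the column-name lookup work for ints and strings alike.
    return pd.Series(
        [row['v' + str(v)] if 'v' + str(v) in row.index else v for v in row],
        index=row.index,
    )


fixed = mixed.apply(fix_row, axis=1)
```

With the conversion in place, the integer 1 is looked up as column 'v1' and the integer 2 as column 'v2', so fixed contains the same values as the expected output in the question.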
ORIGINAL (WRONG) ANSWER BELOW
Note that the answer below only applies when the replacement values lie on the diagonal (which is the case in the example, but it is not what the question asked ... my bad).
You can do it with pandas' replace method and numpy's diag:
First, select the values to replace; these are the digit strings from 1 up to the length of your DataFrame:
to_replace = [str(i) for i in range(1,len(df)+1)]
Then select the values to replace them with; this will be the diagonal of your DataFrame:
import numpy as np
replace_with = np.diag(df)
Now you can do the actual replacement:
df.replace(to_replace, replace_with)
which gives:
v1 v2 v3
0 a b a
1 c c d
And of course, if you want it all as a one-liner:
df.replace([str(i) for i in range(1,len(df)+1)], np.diag(df))
Add the inplace=True keyword argument to replace if you want to do the replacement in place.
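The limitation mentioned at the top of this part can be checked with a frame whose replacement values are not on the diagonal. The sketch below uses made-up data (not from the question) and builds the expected result by hand for comparison.

```python
import numpy as np
import pandas as pd

# Made-up data: row 0 points at column v2 and row 1 points at column v1,
# so the wanted replacement values are off the diagonal.
df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '2'},
                   {'v1': 'c', 'v2': 'd', 'v3': '1'}])

to_replace = [str(i) for i in range(1, len(df) + 1)]   # ['1', '2']
replace_with = list(np.diag(df.to_numpy()))             # ['a', 'd']
diagonal_result = df.replace(to_replace, replace_with)

# What the question actually asks for: take the value from the named
# column in the same row.
expected = df.copy()
expected.loc[0, 'v3'] = df.loc[0, 'v2']   # "2" -> row 0 of v2, i.e. 'b'
expected.loc[1, 'v3'] = df.loc[1, 'v1']   # "1" -> row 1 of v1, i.e. 'c'
```

The diagonal approach puts 'd' and 'a' into column v3, while the row-wise rule from the question calls for 'b' and 'c', which is why this answer is marked as wrong for the general case.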
I did this:
import pandas as pd

df = pd.DataFrame([{'v1':'a', 'v2':'b', 'v3':'1'},
                   {'v1':'2', 'v2':'c', 'v3':'d'}])

def replace_col(row, columns, col_num_dict={1: 'v1', 2: 'v2', 3: 'v3'}):
    for col in columns:
        x = getattr(row, col)
        try:
            x = int(x)
            if x in col_num_dict:
                # copy the value from the column this number points to
                setattr(row, col, getattr(row, col_num_dict[x]))
        except ValueError:
            # not a number, leave the cell unchanged
            pass
    return row

df = df.apply(replace_col, axis=1, args=(df.columns,))
It applies the replace_col function to every row. The row object's attributes that correspond to its columns are replaced with the correct value from the same row. It looks a little more complicated because of the many set / get attribute calls, but it does exactly what needs to be done without too much overhead.
I see two options.
Loop over the columns and then over the mapping:
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}
df1 = df.copy()
for column_name, column in df1.items():  # iteritems() was removed in pandas 2.0
    for k, v in mapping.items():
        df1.loc[column == k, column_name] = df1.loc[column == k, v]
df1
v1 v2 v3
0 a b a
1 c c d
Loop over the columns, then loop over all the "hits":
df2 = df.copy()
for column_name, column in df2.items():  # iteritems() was removed in pandas 2.0
    hits = column.isin(mapping.keys())
    for idx, item in column[hits].items():
        df2.loc[idx, column_name] = df2.loc[idx, mapping[item]]
df2
v1 v2 v3
0 a b a
1 c c d
If you choose the first way, you can reduce the two nested for loops to a single for loop with itertools.product.
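A possible version of the first option with itertools.product might look like the sketch below. The variable names result, key and source are chosen for this example, and the mask.any() guard is an addition that simply skips combinations with nothing to replace.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame([{'v1': 'a', 'v2': 'b', 'v3': '1'},
                   {'v1': '2', 'v2': 'c', 'v3': 'd'}])
mapping = {'1': 'v1', '2': 'v2', '3': 'v3'}

result = df.copy()
# product() yields every (column, (key, source column)) pair, so a single
# loop covers what the two nested loops did.
for column_name, (key, source) in product(list(result.columns), mapping.items()):
    mask = result[column_name] == key
    if mask.any():
        # copy values from the source column for the matching rows only
        result.loc[mask, column_name] = result.loc[mask, source]
```

This does the same work as the nested loops; it only flattens them, so it is a readability change rather than a speed-up.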
You can change the data before converting it to a DataFrame:
data = [{'v1':'a', 'v2':'b', 'v3':'1'},{'v1':'2', 'v2':'c', 'v3':'d'}]
mapping = {'1': 'v1', '3': 'v3', '2': 'v2'}
for idx, line in enumerate(data):
    for item in line:
        try:
            int(line[item])  # raises ValueError if the cell is not a number
            data[idx][item] = data[idx][mapping[line[item]]]
        except Exception:
            pass
print(data)
# [{'v1': 'a', 'v2': 'b', 'v3': 'a'}, {'v1': 'c', 'v2': 'c', 'v3': 'd'}]