Editing the original DataFrame after creating a copy, but before editing the copy, modifies the copy
I am trying to understand how copying a pandas DataFrame works. When I assign a copy of an object in Python, I am not used to changes to the original object affecting copies of that object. For example:
x = 3
y = x
x = 4
print(y)
3
Although x was subsequently changed, y remains unchanged. In contrast, when I make changes to the pandas DataFrame df after assigning a copy of it to df1, the copy is also affected by the changes to the original DataFrame.
import pandas as pd
import numpy as np
def minusone(x):
    return int(x) - 1
df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})
df1 = df
print(df1['A'])
0 10
1 20
2 30
3 40
4 50
Name: A, dtype: int64
df['A'] = np.vectorize(minusone)(df['A'])
print(df1['A'])
0 9
1 19
2 29
3 39
4 49
Name: A, dtype: int64
The solution seems to be to make a deep copy with copy.deepcopy(), but since this behavior differs from what I am used to in Python, I was wondering if anyone could explain the reasoning behind this difference, or whether it is a bug.
In the first example, you did not change the value of x. You assigned a new value to x.
In your second example, you changed the value of df by modifying one of its columns.
You can also see the effect with built-in types:
>>> x = []
>>> y = x
>>> x.append(1)
>>> y
[1]
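One way to check this for yourself is the `is` operator, which tests whether two names refer to the same object. The following is only a minimal sketch; the variable names `same_before` and `same_after` are made up for illustration:

```python
x = []
y = x                    # y is a second name for the same list, not a copy
same_before = y is x     # both names refer to one object

x.append(1)              # mutates that one object, so y sees the change

x = [99]                 # rebinds x to a brand-new list; y is untouched
same_after = y is x      # the names now refer to different objects

print(same_before, y, same_after)
```

After the rebinding, `y` still holds the list that was mutated by `append`, while `x` refers to an unrelated list.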
This behavior is not specific to pandas; it is fundamental to Python. There are many, many questions on this site about the same issue, all due to the same misunderstanding. The syntax barename = value does not behave like any other construct in Python.
When you use name[key] = value, name.attr = value, or name.methodcall(), you may be mutating the object that name refers to, you may be copying something, and so on. By using name = value (where name is a single identifier: no dots, no brackets, no parentheses, etc.), you never mutate anything, and you never copy anything.
In the first example, you used the syntax x = .... In your second example, you used the syntax df['A'] = .... These are not the same syntax, so you cannot assume they behave the same way.
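To make the distinction concrete, here is a small sketch using only built-in types and `types.SimpleNamespace` from the standard library. The names (`data`, `alias`, `ns`, `items`) are hypothetical and chosen just for this illustration:

```python
from types import SimpleNamespace

# name[key] = value: mutates the dict that both names refer to
data = {"a": 1}
alias = data
data["a"] = 2

# name.attr = value: mutates the shared namespace object
ns = SimpleNamespace(v=1)
ns_alias = ns
ns.v = 5

# name.methodcall(): list.sort() mutates the shared list in place
items = [3, 1, 2]
items_alias = items
items.sort()

# barename = value: only rebinds the name; the old object is unchanged
data = {"a": 100}

print(alias, ns_alias.v, items_alias, alias is data)
```

In each of the first three cases the alias observes the change, because there is only one object. Only the final bare-name assignment leaves the alias pointing at something different from `data`.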
How you create a copy depends on the type of object you are trying to copy. In your case, use df1 = df.copy().
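As a rough sketch of the difference, assuming pandas is installed (the names `alias` and `snapshot` are made up for this example, and the frame is a shortened version of the one in the question):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]})

alias = df            # plain assignment: a second name for the same DataFrame
snapshot = df.copy()  # an independent copy (DataFrame.copy defaults to deep=True)

df["A"] = df["A"] - 1  # item assignment mutates the DataFrame that df refers to

alias_values = alias["A"].tolist()        # reflects the change
snapshot_values = snapshot["A"].tolist()  # still the original values

print(alias_values, snapshot_values)
```

The alias shows the decremented column because it is the same object as `df`, while the copy keeps the values it had when `copy()` was called.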