Editing the original DataFrame after creating a copy, but before editing the copy, modifies the copy

I am trying to understand how copying a pandas DataFrame works. When I assign a copy of an object in Python, I am not used to changes to the original object affecting copies of that object. For example:

x = 3
y = x
x = 4
print(y)
3

      

Although x was subsequently changed, y remains unchanged. In contrast, when I make changes to a pandas DataFrame df after assigning it to df1, the copy is also affected by the changes to the original DataFrame.

import pandas as pd
import numpy as np

def minusone(x):
    return int(x) - 1

df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})

df1 = df


print(df1['A'])

0    10
1    20
2    30
3    40
4    50
Name: A, dtype: int64

df['A'] = np.vectorize(minusone)(df['A'])

print(df1['A'])

0     9
1    19
2    29
3    39
4    49
Name: A, dtype: int64

      

The solution seems to be to make a deep copy with copy.deepcopy(), but since this behavior differs from what I am used to in Python, I was wondering if anyone could explain the reasoning behind this difference, or whether it is a bug.



1 answer


In your first example, you did not change the value of x. You assigned a new value to x.

In your second example, you changed the value of df by modifying one of its columns.

You can also see the effect with built-in types:

>>> x = []
>>> y = x
>>> x.append(1)
>>> y
[1]

      

This behavior is not specific to pandas; it is fundamental to Python. There are many, many questions on this site about the same issue, all arising from the same misunderstanding. The syntax

barename = value

does not have the same behavior as any other construct in Python.

When you use name[key] = value, name.attr = value, or name.methodcall(), you may be mutating the object that name refers to, you may be copying something, etc. When you use name = value (where name is a single identifier, with no dots, no brackets or parentheses, etc.), you never mutate anything, and you never copy anything.
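As a small illustration of the distinction above, here is a sketch using only a built-in list. The variable names (same_before, y_after_mutation, and so on) are made up for this example. It checks object identity with the is operator before and after each kind of statement:

```python
# Rebinding vs. mutation, using only a built-in list.
x = [1, 2, 3]
y = x                        # y is bound to the same list object as x; nothing is copied
same_before = y is x         # True: two names, one object

x.append(4)                  # name.methodcall(): mutates the shared object
y_after_mutation = list(y)   # snapshot of what y sees now

x[0] = 100                   # name[key] = value: also mutates the shared object
y_after_item = list(y)

x = [0]                      # bare-name assignment: x is rebound, y is untouched
same_after = y is x          # False: x now refers to a different object

print(same_before, y_after_mutation, y_after_item, same_after, y)
# prints: True [1, 2, 3, 4] [100, 2, 3, 4] False [100, 2, 3, 4]
```

Only the last statement, the bare-name assignment, leaves y unaffected, which matches the x = 4 case in the question.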

Your first example used the syntax x = ...; your second example used the syntax df['A'] = ... They are not the same syntax, so you cannot assume that they have the same behavior.

How you create a copy depends on the type of object you are trying to copy. In your case, use df1 = df.copy().
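A minimal sketch of the difference, reusing column A from the question. The names alias and snapshot are hypothetical and chosen only for illustration. It assumes the default behavior of DataFrame.copy(), which makes a deep copy (deep=True):

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40, 50]})

alias = df              # plain assignment: a second name for the same DataFrame
snapshot = df.copy()    # independent copy (deep=True is the default)

df["A"] = df["A"] - 1   # modify the original through the name df

print(alias is df)             # True: alias and df are the same object
print(alias["A"].tolist())     # [9, 19, 29, 39, 49]
print(snapshot["A"].tolist())  # [10, 20, 30, 40, 50]
```

The alias reflects the change because it is the same object as df, while the copy taken before the modification keeps the original values.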
