Pythonic way to randomly assign pandas data records

Question


Suppose we have a data frame

In [1]: df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

In [2]: df
Out[2]:
     A   B   C   D
0   45  88  44  92
1   62  34   2  86
2   85  65  11  31
3   74  43  42  56
4   90  38  34  93
5    0  94  45  10
..  ..  ..  ..  ..

How can I randomly replace x% of all records with a value, for example None?

In [4]: something(df, percent=25)
Out[4]:
       A     B     C     D
0     45    88  None    92
1     62    34     2    86
2   None  None    11    31
3     74    43  None    56
4     90    38    34  None
5   None    94    45    10
..    ..    ..    ..    ..

I found information on sampling individual axes, and I can imagine randomly generating integer positions within the dimensions of my data frame and setting those cells to None, but that does not seem very Pythonic.

Edit: forgot the "path" in the title


python pandas random

Ian Gilman 09 Apr 17 at 4:20 am


1 answer

DSM · Answer 1 · 2017-04-09T04:31:17+0000

You can combine DataFrame.where and np.random.uniform:

In [37]: df
Out[37]: 
   A  B  C  D
0  1  0  2  2
1  2  2  0  3
2  3  0  0  3
3  0  2  3  1

In [38]: df.where(np.random.uniform(size=df.shape) > 0.3, None)
Out[38]: 
      A  B     C     D
0     1  0     2  None
1     2  2     0     3
2     3  0  None  None
3  None  2     3  None

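It's not the most concise, but it gets the job done. Each cell is replaced independently with probability 0.3, so the share of replaced cells is only approximately 30%.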
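If an exact share of cells is needed rather than an approximate one, a variation on the same idea is to pick the cell positions without replacement and build the boolean mask from them. The following is only a sketch: the helper name mask_fraction and its parameters are made up for illustration, it uses DataFrame.mask (the inverse of DataFrame.where), and it fills with NaN, the default fill value, rather than None.

```python
import numpy as np
import pandas as pd


def mask_fraction(df, frac, seed=None):
    """Return a copy of df with exactly round(frac * df.size) cells set to NaN.

    Hypothetical helper for illustration; not part of pandas.
    """
    rng = np.random.default_rng(seed)
    n_cells = df.size
    n_masked = int(round(frac * n_cells))

    # Choose distinct flat cell positions and mark them in a boolean array.
    hit = np.zeros(n_cells, dtype=bool)
    hit[rng.choice(n_cells, size=n_masked, replace=False)] = True

    # DataFrame.mask replaces the cells where the condition is True.
    return df.mask(hit.reshape(df.shape))


df = pd.DataFrame(np.arange(400).reshape(100, 4), columns=list("ABCD"))
masked = mask_fraction(df, frac=0.25, seed=0)
print(masked.isna().sum().sum())  # number of replaced cells
```

Passing a seed makes the selection reproducible; leaving it as None gives a different set of cells on each call.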

Note, though, that you should ask yourself whether you really want to do this if you still have computations to perform. If you put None in a column, pandas has to fall back to the slow object dtype instead of a fast one like int64 or float64.
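One way to see the dtype point is to compare a numeric fill with an object column. The sketch below assumes that NaN is an acceptable stand-in for None in your use case; it relies on DataFrame.where filling with NaN when no replacement value is given, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list("ABCD"))

# Keep a cell when its uniform draw is at least 0.3, as in the answer above.
keep = rng.random(df.shape) >= 0.3

# With no replacement value given, where() fills the dropped cells with NaN,
# so the integer columns are upcast to float64 and stay numeric.
with_nan = df.where(keep)
print(with_nan.dtypes)
print(with_nan.mean())  # NaN cells are skipped by default

# For comparison: the same data stored as generic Python objects.
as_object = with_nan.astype(object)
print(as_object.dtypes)
```

The float64 frame still supports vectorised arithmetic and aggregation, while the object frame stores every cell as a Python object.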
