How to randomly select some non-null elements from numpy.ndarray?

I have implemented a matrix factorization model, say R = U * V, and now I would train and test this model.

For this purpose, given the sparse matrix R (zero for missing value), I want to hide some non-zero elements in training first and use those non-zero elements as a test case later.

How can I randomly select some non-null elements from numpy.ndarray? Also, I need to remember the index and column position of these selected items in order to use those items when testing.

eg:

In [2]: import numpy as np

In [4]: mtr = np.random.rand(10,10)

In [5]: mtr
Out[5]: 
array([[ 0.92685787,  0.95496193,  0.76878455,  0.12304856,  0.13804963,
         0.30867502,  0.60245974,  0.00797898,  0.1060602 ,  0.98277982],
       [ 0.88879888,  0.40209901,  0.35274404,  0.73097713,  0.56238248,
         0.380625  ,  0.16432029,  0.5383006 ,  0.0678564 ,  0.42875591],
       [ 0.42343761,  0.31957986,  0.5991212 ,  0.04898903,  0.2908878 ,
         0.13160296,  0.26938537,  0.91442668,  0.72827097,  0.4511198 ],
       [ 0.63979934,  0.33421621,  0.09218392,  0.71520048,  0.57100522,
         0.37205284,  0.59726293,  0.58224992,  0.58690505,  0.4791199 ],
       [ 0.35219557,  0.34954002,  0.93837312,  0.2745864 ,  0.89569075,
         0.81244084,  0.09661341,  0.80673646,  0.83756759,  0.7948081 ],
       [ 0.09173706,  0.86250006,  0.22121994,  0.21097563,  0.55090202,
         0.80954817,  0.97159981,  0.95888693,  0.43151554,  0.2265607 ],
       [ 0.00723128,  0.95690539,  0.94214806,  0.01721733,  0.12552314,
         0.65977765,  0.20845669,  0.44663729,  0.98392716,  0.36258081],
       [ 0.65994805,  0.47697842,  0.35449045,  0.73937445,  0.68578224,
         0.44278095,  0.86743906,  0.5126411 ,  0.75683392,  0.73354572],
       [ 0.4814301 ,  0.92410622,  0.85267402,  0.44856078,  0.03887269,
         0.48868498,  0.83618382,  0.49404473,  0.37328248,  0.18134919],
       [ 0.63999748,  0.48718656,  0.54826717,  0.1001681 ,  0.1940816 ,
         0.3937014 ,  0.48768013,  0.70610649,  0.03213063,  0.88371607]])

In [6]: mtr = np.where(mtr>0.5, 0, mtr)

In [7]: %clear


In [8]: mtr
Out[8]: 
array([[ 0.        ,  0.        ,  0.        ,  0.12304856,  0.13804963,
         0.30867502,  0.        ,  0.00797898,  0.1060602 ,  0.        ],
       [ 0.        ,  0.40209901,  0.35274404,  0.        ,  0.        ,
         0.380625  ,  0.16432029,  0.        ,  0.0678564 ,  0.42875591],
       [ 0.42343761,  0.31957986,  0.        ,  0.04898903,  0.2908878 ,
         0.13160296,  0.26938537,  0.        ,  0.        ,  0.4511198 ],
       [ 0.        ,  0.33421621,  0.09218392,  0.        ,  0.        ,
         0.37205284,  0.        ,  0.        ,  0.        ,  0.4791199 ],
       [ 0.35219557,  0.34954002,  0.        ,  0.2745864 ,  0.        ,
         0.        ,  0.09661341,  0.        ,  0.        ,  0.        ],
       [ 0.09173706,  0.        ,  0.22121994,  0.21097563,  0.        ,
         0.        ,  0.        ,  0.        ,  0.43151554,  0.2265607 ],
       [ 0.00723128,  0.        ,  0.        ,  0.01721733,  0.12552314,
         0.        ,  0.20845669,  0.44663729,  0.        ,  0.36258081],
       [ 0.        ,  0.47697842,  0.35449045,  0.        ,  0.        ,
         0.44278095,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.4814301 ,  0.        ,  0.        ,  0.44856078,  0.03887269,
         0.48868498,  0.        ,  0.49404473,  0.37328248,  0.18134919],
       [ 0.        ,  0.48718656,  0.        ,  0.1001681 ,  0.1940816 ,
         0.3937014 ,  0.48768013,  0.        ,  0.03213063,  0.        ]])

      

Given such a sparse ndarray, how can I select 20% of the non-null elements and remember their position?

+3


source to share


1 answer


We will use numpy.random.choice

. First, we get arrays of indices (i,j)

where the data is nonzero:

i,j = np.nonzero(x)

      

Then we'll select 20% of them:



ix = np.random.choice(len(i), np.floor(0.2 * len(i)), replace=False)

      

Here ix

is a list of random, unique indices, 20% length i

and j

(length i

and j

is the number of non-zero entries). To rebuild the indexes, we do i[ix]

and j[ix]

, so we can select 20% of non-zero entries x

by writing:

print x[i[ix], j[ix]]

      

+7


source







All Articles