Converting to and from numpy np.random.RandomState and Python random.Random?

I would like to be able to convert back and forth between the standard Python Random and the numpy np.random.RandomState. They both use the Mersenne Twister algorithm, so it should be possible (as long as they don't use different versions of this algorithm).

I started exploring the getstate / setstate and get_state / set_state methods of these objects. But I'm not sure how to transform their details.

import numpy as np
import random

rng1 = np.random.RandomState(seed=0)
rng2 = random.Random(seed=0)

state1 = rng1.get_state()
state2 = rng2.getstate()

      

Checking every state I see:

>>> print(state1) 
('MT19937', array([0, 1, 1812433255, ..., 1796872496], dtype=uint32), 624, 0, 0.0)
>>> print(state2) 
(3, (2147483648, 766982754, ..., 1057334138, 2902720905, 624), None)

      

The first state is a 5 s tuple len(state1[1]) = 624

.

The second state is a 3s tuple len(state2[1]) = 625

. It seems that the last item in state2 is actually 624 in state1, which means the arrays are actually the same size. So far, so good. They seem to be reasonably compatible.

Unfortunately, the internal numbers don't have an obvious correspondence, so the seed 0 leads to different states, which makes sense because rng1.rand() = .548

and rng2.random() = .844

. Thus, the algorithm looks somewhat different.

However, I don't need them to match perfectly. I just need to be able to set the state of one rng from another in a deterministic way, without affecting the state of the first.

Ideally, when I used the state of the first to set the state of the second without calling any random methods, and then used the second to set the state of the first, the first state would remain unchanged, but this is not required.

I currently have a hacked together method that simply replaces a 624 length list that I can extract from both rngs. However, I'm not sure if there are any problems with this approach. Can anyone more knowledgeable on this topic shed some light?

Here is my approach, but I'm not sure what works correctly.

np_rng = np.random.RandomState(seed=0)
py_rng = random.Random(0)

# Convert python to numpy random state (incomplete)
py_state = py_rng.getstate()
np_rng = np.random.RandomState(seed=0)
np_state = np_rng.get_state()
new_np_state = (
    np_state[0],
    np.array(py_state[1][0:-1], dtype=np.uint32),
    np_state[2], np_state[3], np_state[4])
np_rng.set_state(new_np_state)

# Convert numpy to python random state (incomplete)
np_state = np_rng.get_state()
py_rng = random.Random(0)
py_state = py_rng.getstate()
new_py_state = (
    py_state[0], tuple(np_state[1].tolist() + [len(np_state[1])]),
    py_state[1]
)
py_rng.setstate(new_py_state)

      


EDIT:

After doing some research, I checked what happens to the state of over 10 random function calls.

np_rng = np.random.RandomState(seed=0)
py_rng = random.Random(0)

for i in range(10):
    np_rng.rand()
    npstate = np_rng.get_state()
    print([npstate[0], npstate[1][[0, 1, 2, -2, -1]], npstate[2], npstate[3], npstate[4]])

for i in range(10):
    py_rng.random()
    pystate = py_rng.getstate()
    print([pystate[0], pystate[1][0:3] + pystate[1][-2:], pystate[2]])


['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 2, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 4, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 6, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 8, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 10, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 12, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 14, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 16, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 18, 0, 0.0]
['MT19937', array([2443250962, 1093594115, 1878467924, 2648828502, 1678096082], dtype=uint32), 20, 0, 0.0]
[3, (1372342863, 3221959423, 4180954279, 418789356, 2), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 4), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 6), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 8), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 10), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 12), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 14), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 16), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 18), None]
[3, (1372342863, 3221959423, 4180954279, 418789356, 20), None]

      

I would expect the first element in each tuple to be just the version of the algorithm being used.

It's interesting to see that the integers 624 don't seem to change. It's always like that?

However, I'm still not sure what the latter means None in the Python version, and the last number 2 is in the numpy version.

+3


source to share


1 answer


NumPy state form is RandomState

documented :

Returns: out: tuple (str, ndarray of 624 uints, int, int, float)

The returned tuple has the following elements:

  • string 'MT19937.
  • 1-D array of 624 unsigned integer keys.
  • integer.
  • an integer has_gauss.
  • float cached_gaussian.

The last two entries refer to the generator state for standard normal values: NumPy uses a Box-Muller Transform , which generates these deviations in pairs. So the first call to the gaussian generator creates two values, returns the first, and then stores the second for later use. The second call then retrieves this second value. Thus, here we have additional state that needs to be stored and retrieved.

The Python state form is Random

not documented, but it is easy to extract from the source. Starting from CPython 3.6.1 it looks like this:

def getstate(self):
    """Return internal state; can be passed to setstate() later."""
    return self.VERSION, super().getstate(), self.gauss_next

      

Again, Python generates the normal deviations in pairs, and self.gauss_next

- None

if there is no excess normal deviation, and the value of the stored deviation, if available.

To find out what it returns super().getstate()

, you need to dive into source C: it's a 625-length tuple containing 624 words that form the Mersenne Twister state, along with the current position in that word set. So the last entry in this tuple corresponds to the value pos

at index 2 of the NumPy state.

Here's an example of converting from Python state to NumPy state, ignoring the details of Gaussian information:

Python 3.6.1 (default, May 23 2017, 18:09:41) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import random
>>> np_rng = np.random.RandomState(seed=0)
>>> py_rng = random.Random(0)
>>> version, (*mt_state, pos), gauss_next = py_rng.getstate() 
>>> np_rng.set_state(('MT19937', mt_state, pos))

      



After setting the NumPy state RandomState

from Python state, Random

we can see that the floats generated from the two RNGs are the same:

>>> py_rng.random(), np_rng.uniform()
(0.8444218515250481, 0.8444218515250481)
>>> py_rng.random(), np_rng.uniform()
(0.7579544029403025, 0.7579544029403025)
>>> py_rng.random(), np_rng.uniform()
(0.420571580830845, 0.420571580830845)

      

And here's the reverse conversion:

>>> _, words, pos, _, _ = np_rng.get_state()
>>> py_rng.setstate((3, tuple(map(int, words)) + (pos,), None))

      

And as before, we can check if the output of the two generators matches:

>>> py_rng.random(), np_rng.uniform()
(0.5488135039273248, 0.5488135039273248)
>>> py_rng.random(), np_rng.uniform()
(0.7151893663724195, 0.7151893663724195)
>>> py_rng.random(), np_rng.uniform()
(0.6027633760716439, 0.6027633760716439)
>>> all(py_rng.random() == np_rng.uniform() for _ in range(1000000))
True

      

Python and NumPy use different algorithms to generate the normal deviations (although both algorithms used generate these deviations in pairs), so even if we are passing a Gaussian-related state, we cannot expect the generated normal deviations to match. But if all you want to do is somehow store the Python state information in a NumPy state object (and vice versa), so that converting from one state to another and back won't lose information, which is easy enough to do: if has_gauss

is zero in state NumPy, use None

for the last Python state entry, and if has_gauss

nonzero, use the value cached_gaussian

from the NumPy state in the last Python State entry. Here are a couple of functions that implement these transformations:

PY_VERSION = 3
NP_VERSION = 'MT19937'

def npstate_to_pystate(npstate):
    """
    Convert state of a NumPy RandomState object to a state
    that can be used by Python Random.
    """
    version, keys, pos, has_gauss, cached_gaussian = npstate
    pystate = (
        PY_VERSION,
        tuple(map(int, keys)) + (int(pos),),
        cached_gaussian if has_gauss else None,
    )
    return pystate


def pystate_to_npstate(pystate):
    """
    Convert state of a Python Random object to state usable
    by NumPy RandomState.
    """
    version, (*keys, pos), cached_gaussian = pystate
    has_gauss = cached_gaussian is not None
    npstate = (
        NP_VERSION,
        keys,
        pos,
        has_gauss,
        cached_gaussian if has_gauss else 0.0
    )
    return npstate

      

+6


source







All Articles