Random Number Generation
Current best practice for reproducible science in machine learning — including, but not limited to, recommender systems — is to use fixed random seeds so results can be reproduced precisely. This is useful both for reproducing the results themselves and for debugging.
To test for seed sensitivity, the entire experiment can be re-run with a different random seed and the conclusions compared.
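As a minimal sketch of this workflow, the following uses plain NumPy rather than LensKit. The `run_experiment` function is a hypothetical stand-in for a full experiment; it only illustrates that a fixed seed reproduces a result exactly, while re-running under other seeds shows how much the result moves.

```python
import numpy as np


def run_experiment(seed):
    """Toy stand-in for a full experiment: returns one summary metric."""
    rng = np.random.default_rng(seed)
    # Pretend these are per-user scores from a randomized evaluation.
    scores = rng.normal(loc=0.5, scale=0.1, size=1000)
    return float(scores.mean())


# Same seed: the result is reproduced exactly.
first = run_experiment(42)
again = run_experiment(42)

# Different seeds: the metric varies slightly; the conclusion should not.
results = {seed: run_experiment(seed) for seed in (42, 43, 44)}
spread = max(results.values()) - min(results.values())
print(f"metric by seed: {results}, spread: {spread:.4f}")
```

If the spread across seeds is large relative to the effect being reported, the conclusion is likely seed-sensitive.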
LensKit is built to support this experimental design, making consistent use of
configurable random number generators throughout its algorithm implementations.
When run against NumPy 1.17 or later, it uses the new numpy.random.Generator
and numpy.random.SeedSequence
facilities to provide consistent random
number generation and initialization. LensKit is compatible with older versions
of NumPy, but the RNG reproducibility logic will not fully function, and some
functions will not work.
Developers using LensKit will be primarily interested in the init_rng()
function, which initializes LensKit’s random seed. LensKit components that use
randomization also take an rng option, usually in their constructor, to set
the seed on a per-operation basis. If the script is straightforward and performs
LensKit operations in a deterministic order (e.g. does not train multiple models
in parallel), initializing the global RNG is sufficient.
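The reason execution order matters can be sketched with plain NumPy. The example below is an illustration, not LensKit code: the component names `model_a` and `model_b` and both helper functions are hypothetical. When two components draw from one shared generator, the values each receives depend on which runs first; with per-component seeds they do not.

```python
import numpy as np


def draw_shared(order, seed=42):
    """Components share one generator; results depend on call order."""
    shared = np.random.default_rng(seed)
    out = {}
    for name in order:
        out[name] = shared.integers(0, 1_000_000, size=3).tolist()
    return out


def draw_separate(order, seeds):
    """Each component gets its own generator; call order is irrelevant."""
    out = {}
    for name in order:
        rng = np.random.default_rng(seeds[name])
        out[name] = rng.integers(0, 1_000_000, size=3).tolist()
    return out


seeds = {"model_a": 1, "model_b": 2}
shared_ab = draw_shared(["model_a", "model_b"])
shared_ba = draw_shared(["model_b", "model_a"])
separate_ab = draw_separate(["model_a", "model_b"], seeds)
separate_ba = draw_separate(["model_b", "model_a"], seeds)
```

With the shared generator, `model_a` sees different values in the two runs; with separate seeds, both runs agree.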
Developers writing new LensKit algorithms that use randomization will also need
to pay attention to the rng() function, along with derivable_rng() and
derive_seed() if predictions or recommendations, not just model training,
require random values. Their constructors should take a parameter rng_spec
to specify the RNG initialization.
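A minimal sketch of that constructor convention follows. The class `RandomRecommender` is hypothetical, and `numpy.random.default_rng` is used here only as a stand-in for LensKit's own RNG helpers so that the example runs with NumPy alone.

```python
import numpy as np


class RandomRecommender:
    """Hypothetical recommender that picks random items for each request."""

    def __init__(self, items, rng_spec=None):
        self.items = np.asarray(items)
        # Stand-in for LensKit's helpers: turn the spec into a Generator.
        self.rng = np.random.default_rng(rng_spec)

    def recommend(self, n):
        # Sample n distinct items using the component's own generator.
        return self.rng.choice(self.items, size=n, replace=False).tolist()


a = RandomRecommender(range(100), rng_spec=7).recommend(5)
b = RandomRecommender(range(100), rng_spec=7).recommend(5)
```

Two instances built with the same `rng_spec` produce the same recommendations, which is the property the constructor parameter is meant to provide.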
Seeds
LensKit random number generation starts from a global root seed, accessible with
get_root_seed(). This seed can be initialized with init_rng().
- lenskit.util.random.init_rng(seed, *keys, propagate=True)
Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.
- Parameters
seed (int or numpy.random.SeedSequence) – The random seed to initialize with.
keys – Additional keys, to use as a spawn_key on NumPy 1.17 or later. Passed to derive_seed().
propagate (bool) – If True, initialize other RNG infrastructure. This currently initializes: np.random.seed(). If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.
- Returns
The random seed.
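The propagation step can be sketched as follows. This is an illustration under stated assumptions, not LensKit's implementation: `init_rng_sketch` is a hypothetical name, and using `SeedSequence.generate_state` to obtain an integer for `np.random.seed()` is one plausible way to do it.

```python
import numpy as np


def init_rng_sketch(seed, propagate=True):
    """Hypothetical illustration of root-seed initialization with propagation."""
    if isinstance(seed, np.random.SeedSequence):
        root = seed
    else:
        root = np.random.SeedSequence(seed)
    if propagate:
        # Derive a 32-bit integer from the root seed for NumPy's global RNG.
        np.random.seed(int(root.generate_state(1)[0]))
    return root


root = init_rng_sketch(42)
x1 = np.random.random()
init_rng_sketch(42)
x2 = np.random.random()
```

After re-initializing with the same seed, code that still relies on the global NumPy RNG sees the same stream again.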
- lenskit.util.random.derive_seed(*keys, base=None)
Derive a seed from the root seed, optionally with additional seed keys.
- Parameters
keys (list of int or str) – Additional components to add to the spawn key for reproducible derivation. If unspecified, the seed’s internal counter is incremented.
base (numpy.random.SeedSequence) – The base seed to use. If None, uses the root seed.
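Key-based derivation of this kind can be sketched on top of NumPy's SeedSequence. The function `derive_seed_sketch` is hypothetical, and mapping string keys to integers with CRC32 is an assumption made for this example, not a statement about how LensKit does it.

```python
import zlib

import numpy as np


def derive_seed_sketch(base, *keys):
    """Hypothetical key-based seed derivation on top of SeedSequence."""
    if not keys:
        # No keys: spawn the next child, advancing the base's internal counter.
        return base.spawn(1)[0]
    # Map string keys to integers; CRC32 is an assumption of this sketch.
    int_keys = tuple(
        zlib.crc32(k.encode("utf8")) if isinstance(k, str) else int(k)
        for k in keys
    )
    return np.random.SeedSequence(
        base.entropy, spawn_key=base.spawn_key + int_keys
    )


root = np.random.SeedSequence(42)
s1 = derive_seed_sketch(root, "split", 3)
s2 = derive_seed_sketch(root, "split", 3)  # same keys, same seed
s3 = derive_seed_sketch(root, "split", 4)  # different key, different seed
c1 = derive_seed_sketch(root)  # counter-based children differ on each call
c2 = derive_seed_sketch(root)
```

Keyed derivation is reproducible regardless of call order, while the counter-based form depends on how many seeds were derived before it.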
- lenskit.util.random.get_root_seed()
Get the root seed.
- Returns
The LensKit root seed.
- Return type
Random Number Generators
These functions create actual RNGs from the LensKit global seed or a user-provided
seed. They can produce both new-style numpy.random.Generator
RNGs and
legacy numpy.random.mtrand.RandomState
; the latter is needed because
some libraries, such as Pandas and scikit-learn, do not yet know what to do with
a new-style RNG.
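The two RNG styles can be compared directly in NumPy. This sketch uses only NumPy's public API and shows one way to seed both a new-style Generator and a legacy RandomState from the same SeedSequence; whether LensKit constructs its legacy RNGs this way is not stated here.

```python
import numpy as np

seed = np.random.SeedSequence(42)

# New-style generator (PCG64 bit generator by default).
new_rng = np.random.default_rng(seed)

# Legacy RandomState backed by the Mersenne Twister, seeded from the
# same SeedSequence.
legacy_rng = np.random.RandomState(np.random.MT19937(seed))

# The sampling methods are similar, but some names differ between the two.
new_draw = new_rng.integers(0, 10, size=5)
legacy_draw = legacy_rng.randint(0, 10, size=5)
```

A legacy object like `legacy_rng` is the kind of value that can be passed to APIs accepting a `random_state` argument, such as `pandas.DataFrame.sample`.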
- lenskit.util.random.rng(seed=None, *, legacy=False)
Get a random number generator. This is similar to sklearn.utils.check_random_state(), but it usually returns a numpy.random.Generator instead.
- Parameters
seed –
The seed for this RNG. Can be any of the following types:
int
None
numpy.random.mtrand.RandomState
legacy (bool) – If True, return numpy.random.mtrand.RandomState instead of a new-style numpy.random.Generator.
- Returns
A random number generator.
- Return type
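The kind of seed normalization described for this function can be sketched as follows. `rng_sketch` is a hypothetical name, it handles only the seed types listed above, and the way an existing RandomState is converted to a new-style generator is an assumption of this example.

```python
import numpy as np


def rng_sketch(seed=None, legacy=False):
    """Hypothetical seed normalizer: accept several seed types, return one RNG."""
    if isinstance(seed, np.random.RandomState):
        if legacy:
            return seed  # already the requested type
        # Assumption: draw fresh seed material from the existing RandomState.
        seed = int(seed.randint(0, 2**31 - 1))
    if legacy:
        bit_gen = np.random.MT19937(np.random.SeedSequence(seed))
        return np.random.RandomState(bit_gen)
    # None yields fresh OS entropy; an int yields a reproducible stream.
    return np.random.default_rng(seed)


new_style = rng_sketch(5)
old_style = rng_sketch(5, legacy=True)
existing = np.random.RandomState(5)
passed_through = rng_sketch(existing, legacy=True)
```

Callers can then accept a single `seed` argument and let one helper decide which concrete RNG type to build.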
- lenskit.util.random.derivable_rng(spec, *, legacy=False)
Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.
- Parameters
spec – Any value supported by the seed parameter of rng(), in addition to the following values:
the string 'user'
a tuple of the form (seed, 'user')
Either of these forms will cause the returned function to re-derive new RNGs.
- Returns
A function taking one (or more) key values, like derive_seed(), and returning a random number generator (the type of which is determined by the legacy parameter).
- Return type
function
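The behavior described for derivable RNGs can be sketched with plain NumPy. `derivable_rng_sketch` is a hypothetical name, keys are assumed to be integer user IDs, and the base seed of 0 used for the bare 'user' string is an arbitrary stand-in for the root seed.

```python
import numpy as np


def derivable_rng_sketch(spec):
    """Hypothetical derivable RNG: returns a function mapping keys to generators."""
    is_user_tuple = isinstance(spec, tuple) and spec[1] == "user"
    if spec == "user" or is_user_tuple:
        base_seed = spec[0] if is_user_tuple else 0  # 0 stands in for the root seed

        def derive(*keys):
            # A fresh generator per key, reproducible regardless of call order.
            ss = np.random.SeedSequence(
                base_seed, spawn_key=tuple(int(k) for k in keys)
            )
            return np.random.default_rng(ss)

        return derive

    shared = np.random.default_rng(spec)
    # Fixed spec: every key gets the same shared generator.
    return lambda *keys: shared


per_user = derivable_rng_sketch((42, "user"))
fixed = derivable_rng_sketch(42)
u7_first = per_user(7).random()
u8 = per_user(8).random()
u7_again = per_user(7).random()
```

With the 'user' forms, the value drawn for user 7 is the same no matter which other users were processed in between; with a fixed spec, all keys share one stream.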