Random Number Generation
Current best practice for reproducible science in machine learning — including, but not limited to, recommender systems — is to use fixed random seeds so results can be reproduced precisely. This is useful both for reproducing the results themselves and for debugging.
To test for seed sensitivity, the entire experiment can be re-run with a different random seed and the conclusions compared.
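A seed-sensitivity check along these lines can be sketched with NumPy alone. The `run_experiment` function is a hypothetical stand-in for a full experiment and is not part of LensKit; it only illustrates that a fixed seed reproduces a result exactly, while a different seed should leave the conclusion intact.

```python
import numpy as np


def run_experiment(seed):
    """Hypothetical experiment: estimate a mean from noisy samples."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=1.0, scale=0.5, size=1000)
    return samples.mean()


# Re-running with the same seed reproduces the result exactly.
first = run_experiment(42)
repeat = run_experiment(42)

# A different seed gives a different number; the conclusion
# (the mean is close to 1.0) should not depend on the seed.
other = run_experiment(7)

print(first == repeat)
print(abs(first - 1.0) < 0.1, abs(other - 1.0) < 0.1)
```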
LensKit is built to support this experimental design, making consistent use of
configurable random number generators throughout its algorithm implementations.
When run against NumPy 1.17 or later, it uses the new
numpy.random.SeedSequence facilities to provide consistent random
number generation and initialization. LensKit is compatible with older versions
of NumPy, but the RNG reproducibility logic will not fully function, and some
functions will not work.
For fully reproducible research, including random seeds and the use thereof, make sure that you are running on the same platform with the same versions of all packages (particularly LensKit, NumPy, SciPy, Pandas, and related packages), and are using at least NumPy 1.17. LensKit manages state for older versions of NumPy on a best-effort basis.
Developers using LensKit will be primarily interested in the
init_rng() function, so they can initialize LensKit’s random seed. LensKit components using
randomization also take an
rng option, usually in their constructor, to set
the seed on a per-operation basis; if the script is straightforward and performs
LensKit operations in a deterministic order (e.g. does not train multiple models
in parallel), initializing the global RNG is sufficient.
Developers writing new LensKit algorithms that use randomization will also need
to pay attention to the
rng() function, along with
derive_seed() if predictions or recommendations, not just model
training, require random values. Their constructors should take a parameter
rng_spec to specify the RNG initialization.
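As a rough illustration of this constructor convention, the sketch below defines a hypothetical `RandomScorer` component (not part of LensKit). It uses `numpy.random.default_rng` as a stand-in for LensKit's `rng()` function, which is an assumption made so the example runs with NumPy alone.

```python
import numpy as np


class RandomScorer:
    """Hypothetical component that assigns random scores to items."""

    def __init__(self, n_items, rng_spec=None):
        self.n_items = n_items
        # Store the spec; the RNG itself is created when it is needed.
        self.rng_spec = rng_spec

    def fit(self):
        # Stand-in for LensKit's rng(): default_rng accepts None, an int,
        # a SeedSequence, or an existing Generator.
        rng = np.random.default_rng(self.rng_spec)
        self.scores_ = rng.random(self.n_items)
        return self


a = RandomScorer(5, rng_spec=123).fit()
b = RandomScorer(5, rng_spec=123).fit()
print(np.array_equal(a.scores_, b.scores_))
```

Deferring RNG creation to `fit()` means that two instances built with the same `rng_spec` train identically, regardless of what other random operations ran in between.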
init_rng(seed, *keys, propagate=True)
Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.
keys – Additional keys, to use as a
spawn_key on NumPy 1.17 or later. Passed to derive_seed().
propagate (bool) – If
True, initialize other RNG infrastructure. This currently initializes:
If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.
The random seed.
derive_seed(*keys, base=None, none_on_old_numpy=False)
Derive a seed from the root seed, optionally with additional seed keys.
keys (list of int or str) – Additional components to add to the spawn key for reproducible derivation. If unspecified, the seed’s internal counter is incremented (by calling numpy.random.SeedSequence.spawn()).
base (numpy.random.SeedSequence) – The base seed to use. If
None, uses the root seed.
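The two derivation modes described here can be illustrated directly with `numpy.random.SeedSequence`. This is a NumPy-only sketch of the underlying mechanism, not LensKit's implementation; the `derive` helper is hypothetical and handles integer keys only (string keys would first need to be converted to integers).

```python
import numpy as np

root = np.random.SeedSequence(42)


def derive(base, *keys):
    """Hypothetical helper: child seed from base entropy plus integer keys."""
    return np.random.SeedSequence(
        base.entropy, spawn_key=base.spawn_key + tuple(keys)
    )


# Keyed derivation: the same keys always give the same child seed.
a = derive(root, 1, 2)
b = derive(root, 1, 2)
c = derive(root, 1, 3)

# Unkeyed derivation: spawn() advances the root's internal counter,
# so each call yields a new child.
child1 = root.spawn(1)[0]
child2 = root.spawn(1)[0]

print(child1.spawn_key, child2.spawn_key, root.n_children_spawned)
```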
Random Number Generators
These functions create actual RNGs from the LensKit global seed or a user-provided
seed. They can produce both new-style
numpy.random.Generator RNGs and
numpy.random.mtrand.RandomState; the latter is needed because
some libraries, such as Pandas and scikit-learn, do not yet know what to do with
a new-style RNG.
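For readers unfamiliar with the two RNG styles, the following NumPy-only sketch builds one of each from the same seed. It does not use LensKit's functions; it only shows the object types involved and that both can be seeded reproducibly from a `SeedSequence`.

```python
import numpy as np

# New-style generator seeded from a SeedSequence.
gen = np.random.default_rng(np.random.SeedSequence(42))
gen_draw = gen.integers(0, 10, size=3)

# Legacy RandomState seeded from an equivalent SeedSequence through
# the MT19937 bit generator.
legacy = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(42)))
legacy_draw = legacy.randint(0, 10, size=3)

print(type(gen).__name__, type(legacy).__name__)
```

The two objects expose different method names for the same task (`integers` versus `randint`), which is one reason libraries written against the legacy interface cannot simply accept a `Generator`.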
rng(spec=None, *, legacy=False)
Get a random number generator. This is similar to
sklearn.utils.check_random_state(), but it usually returns a
numpy.random.Generator instead.
A random number generator.
- Return type
derivable_rng(spec, *, legacy=False)
Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.
Any value supported by the spec parameter of
rng(), in addition to the following values:
a tuple of the form (seed,
Either of these forms will cause the returned function to re-derive new RNGs.
A function taking one (or more) key values, like
derive_seed(), and returning a random number generator (the type of which is determined by the legacy parameter).
- Return type
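The idea of a derivable RNG, a function that maps keys such as user IDs to reproducible sub-RNGs, can be sketched as follows. The `make_derivable_rng` helper is hypothetical and uses only NumPy; it assumes integer keys and always returns new-style generators, so it approximates the behavior described above rather than reproducing LensKit's code.

```python
import numpy as np


def make_derivable_rng(seed):
    """Hypothetical helper: return a function mapping integer keys to RNGs."""
    base = np.random.SeedSequence(seed)

    def for_key(*keys):
        # Re-derive a child seed from the base entropy and the given keys.
        child = np.random.SeedSequence(
            base.entropy, spawn_key=base.spawn_key + tuple(keys)
        )
        return np.random.default_rng(child)

    return for_key


rng_for = make_derivable_rng(42)

# The same user ID yields the same random stream, independent of call order.
user_10 = rng_for(10).random(3)
user_11 = rng_for(11).random(3)
user_10_again = rng_for(10).random(3)

print(np.array_equal(user_10, user_10_again))
```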