Utility Functions

These utility functions are useful for data processing.


Miscellaneous utility functions.


Set up the logging infrastructure to show log output on sys.stderr, where it will appear in the IPython message log.


Set up the logging infrastructure to show log output in the Jupyter notebook.
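As a rough illustration of what routing log output to sys.stderr involves, the sketch below uses only the standard logging module. The function name send_logs_to_stderr and the message format are hypothetical choices for this example, not LensKit's implementation.

```python
import logging
import sys


def send_logs_to_stderr(level=logging.INFO):
    """Hypothetical helper: attach a stderr handler to the root logger."""
    root = logging.getLogger()
    # Write records to sys.stderr, where IPython/Jupyter show them
    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("[%(levelname)7s] %(name)s %(message)s"))
    root.addHandler(handler)
    # The root logger's level must also allow the records through
    root.setLevel(level)
    return handler
```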

class lenskit.util.Stopwatch(start=True)

Bases: object

Timer class for recording elapsed wall time in operations.
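A minimal sketch of a wall-clock stopwatch, assuming time.perf_counter() as the clock. SimpleStopwatch and its start/stop/elapsed methods are hypothetical names for illustration; they are not the documented Stopwatch API.

```python
import time


class SimpleStopwatch:
    """Hypothetical wall-clock timer (not LensKit's Stopwatch)."""

    def __init__(self, start=True):
        self.start_time = None
        self.stop_time = None
        if start:
            self.start()

    def start(self):
        # perf_counter() is a high-resolution clock that includes time spent sleeping
        self.start_time = time.perf_counter()
        self.stop_time = None

    def stop(self):
        self.stop_time = time.perf_counter()

    def elapsed(self):
        """Seconds since start(); frozen once stop() has been called."""
        if self.start_time is None:
            raise RuntimeError("stopwatch was never started")
        end = self.stop_time if self.stop_time is not None else time.perf_counter()
        return end - self.start_time
```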


Read a Pandas data frame, auto-detecting the file format based on filename suffix. The following file types are supported:


  • CSV: the file has suffix .csv and is read with pandas.read_csv().

  • Parquet: the file has suffix .parquet, .parq, or .pq and is read with pandas.read_parquet().
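The suffix-based dispatch described above can be sketched as a lookup table from suffix to pandas reader. read_by_suffix is a hypothetical name, and reading Parquet assumes a Parquet engine such as pyarrow is installed.

```python
from pathlib import Path

import pandas as pd

# Map each supported filename suffix to the pandas reader for that format
_READERS = {
    ".csv": pd.read_csv,
    ".parquet": pd.read_parquet,
    ".parq": pd.read_parquet,
    ".pq": pd.read_parquet,
}


def read_by_suffix(path):
    """Hypothetical helper: pick a pandas reader from the file's suffix."""
    path = Path(path)
    reader = _READERS.get(path.suffix)
    if reader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return reader(path)
```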

lenskit.util.rng(spec=None, *, legacy=False)

Get a random number generator. This is similar to sklearn.utils.check_random_state(), but it usually returns a numpy.random.Generator instead.


Returns: A random number generator.
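One way to normalize a seed specification into a generator with plain NumPy, assuming the accepted inputs are None, an int, a SeedSequence, or an existing generator. make_rng is a hypothetical helper, not LensKit's rng(); in particular, this sketch returns existing generators unchanged regardless of legacy.

```python
import numpy as np


def make_rng(spec=None, *, legacy=False):
    """Hypothetical helper: turn a seed spec into a NumPy random generator."""
    # Existing generators are passed through unchanged
    if isinstance(spec, (np.random.Generator, np.random.RandomState)):
        return spec
    # Normalize None / int / SeedSequence into a SeedSequence
    if isinstance(spec, np.random.SeedSequence):
        seq = spec
    else:
        seq = np.random.SeedSequence(spec)  # None draws fresh OS entropy
    if legacy:
        # Legacy RandomState driven by an MT19937 bit generator
        return np.random.RandomState(np.random.MT19937(seq))
    return np.random.default_rng(seq)
```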


lenskit.util.init_rng(seed, *keys, propagate=True)

Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.

Parameters:

  • seed (int or numpy.random.SeedSequence) – The random seed to initialize with.

  • keys – Additional keys, used as a spawn_key (requires NumPy 1.17 or later). Passed to derive_seed().

  • propagate (bool) –

    If True, initialize other RNG infrastructure. This currently initializes:

    If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.


Returns: The random seed.
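A sketch of seeding from a root SeedSequence and propagating to Python's and NumPy's global generators. init_root_seed, the module-level _root_seed, and the choice to seed exactly random and numpy.random are assumptions for illustration; in this sketch the extra keys must be non-negative integers.

```python
import random

import numpy as np

_root_seed = None  # hypothetical module-level root seed


def init_root_seed(seed, *keys, propagate=True):
    """Hypothetical: record a root SeedSequence and optionally seed global RNGs."""
    global _root_seed
    if isinstance(seed, np.random.SeedSequence):
        seq = seed
    else:
        seq = np.random.SeedSequence(seed)
    if keys:
        # Extra keys (non-negative ints) extend the spawn key to derive a child seed
        seq = np.random.SeedSequence(seq.entropy, spawn_key=seq.spawn_key + tuple(keys))
    _root_seed = seq
    if propagate:
        # Derive two 32-bit values from the root seed for the global generators
        state = seq.generate_state(2)
        np.random.seed(int(state[0]))
        random.seed(int(state[1]))
    return seq
```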

lenskit.util.derivable_rng(spec, *, legacy=False)

Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.



Parameters: spec – Any value supported by the spec parameter of rng(), in addition to the following values:

  • the string 'user'

  • a tuple of the form (seed, 'user')

Either of these forms will cause the returned function to re-derive new RNGs.


Returns: A function taking one or more key values, like derive_seed(), and returning a random number generator whose type is determined by the legacy parameter.
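A sketch of a derivable RNG: each key is mapped to a stable integer and appended to the root seed's spawn_key, so the same key always yields the same stream. make_derivable_rng and _key_to_int are hypothetical; hashing with SHA-256 is an assumption chosen because Python's built-in hash() of strings changes between processes.

```python
import hashlib

import numpy as np


def _key_to_int(key):
    """Map a key to a stable non-negative integer (hypothetical helper)."""
    if isinstance(key, (int, np.integer)):
        return int(key) % 2**32
    # Python's hash() of str is randomized per process, so use a fixed digest
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def make_derivable_rng(seed=None):
    """Return a function that derives a reproducible Generator per key."""
    if isinstance(seed, np.random.SeedSequence):
        root = seed
    else:
        root = np.random.SeedSequence(seed)

    def derive(*keys):
        # Child seed = root entropy + root spawn key + one entry per key
        key_ints = tuple(_key_to_int(k) for k in keys)
        child = np.random.SeedSequence(root.entropy, spawn_key=root.spawn_key + key_ints)
        return np.random.default_rng(child)

    return derive
```

For example, make_derivable_rng(42)("alice") gives the same stream every time it is called, in any process, while a different user key gives an independent stream.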


lenskit.util.proc_count(core_div=2, max_default=None, level=0)

Get the number of desired jobs for multiprocessing operations. This does not affect Numba or MKL multithreading.

This count can come from a number of sources:

  • The LK_NUM_PROCS environment variable

  • The number of CPUs, divided by core_div (default 2)

Parameters:

  • core_div (int or None) – The divisor to scale down the number of cores; None to turn off core-based fallback.

  • max_default – The maximum number of processes to use if the environment variable is not configured.

  • level – The process nesting level. 0 is the outermost level of parallelism; higher levels control nesting. Levels deeper than 1 are rare, and callers are not expected to know the exact nesting depth, only that they are configuring a child. If the process count is unconfigured, level 1 uses core_div and deeper levels use 1.


Returns: The number of jobs desired.
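The fallback rules above can be sketched as follows. desired_procs is hypothetical, and two behaviors are assumptions not stated in this documentation: LK_NUM_PROCS is consulted only at level 0 and holds a single integer, and core_div=None falls back to a single process.

```python
import os


def desired_procs(core_div=2, max_default=None, level=0, env_var="LK_NUM_PROCS"):
    """Hypothetical sketch of the process-count fallbacks described above."""
    configured = os.environ.get(env_var)
    if configured is not None and level == 0:
        # Assumption: the environment variable holds a single integer
        return int(configured)
    if level >= 2 or core_div is None:
        # Deep nesting, or core-based fallback disabled (assumed: one process)
        return 1
    if level == 1:
        # Unconfigured child level uses core_div directly
        return core_div
    # Outermost level: scale the CPU count down by core_div, keeping at least 1
    count = max(1, (os.cpu_count() or 1) // core_div)
    if max_default is not None:
        count = min(count, max_default)
    return count
```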



Clone an algorithm, but not its fitted data. This is like sklearn.base.clone(), but may not work on arbitrary scikit-learn estimators. LensKit algorithms are, however, compatible with scikit-learn's clone, so use that if you need more general capabilities.

This function is partly derived from the scikit-learn implementation.

>>> from lenskit.util import clone
>>> from lenskit.algorithms.bias import Bias
>>> orig = Bias()
>>> copy = clone(orig)
>>> copy is orig
False
>>> copy.damping == orig.damping
True
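To show what cloning "without fitted data" means, here is a sketch in the scikit-learn style: read the constructor parameters via get_params() and build a fresh instance from them, so attributes set by fit() are not carried over. ToyBias and clone_unfitted are hypothetical and do not depend on LensKit or scikit-learn.

```python
import copy


class ToyBias:
    """Hypothetical estimator with one hyperparameter and one fitted attribute."""

    def __init__(self, damping=0.0):
        self.damping = damping

    def get_params(self, deep=False):
        # Constructor parameters only; fitted state is deliberately excluded
        return {"damping": self.damping}

    def fit(self, values):
        # Fitted state, by convention stored in a trailing-underscore attribute
        self.mean_ = sum(values) / len(values)
        return self


def clone_unfitted(algo):
    """Build a new, unfitted instance with the same constructor parameters."""
    params = {}
    for name, value in algo.get_params(deep=False).items():
        if hasattr(value, "get_params"):
            # Nested estimators are cloned recursively
            params[name] = clone_unfitted(value)
        else:
            # Plain values are deep-copied so the clone shares no mutable state
            params[name] = copy.deepcopy(value)
    return type(algo)(**params)
```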