These utility functions are useful for data processing.
Miscellaneous utility functions.
Set up the logging infrastructure to show log output on
sys.stderr, where it will appear in the IPython message log.
Set up the logging infrastructure to show log output in the Jupyter notebook.
Timer class for recording elapsed wall time in operations.
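As a rough illustration of the idea, a wall-time timer can be sketched with the standard library alone. The class name `WallTimer` and its methods are hypothetical and are not LensKit's own API.

```python
import time


class WallTimer:
    """Hypothetical wall-clock timer; not the LensKit class itself."""

    def __init__(self, start=True):
        self.start_time = None
        self.stop_time = None
        if start:
            self.start()

    def start(self):
        # perf_counter is a monotonic clock suited to measuring intervals
        self.start_time = time.perf_counter()
        self.stop_time = None

    def stop(self):
        self.stop_time = time.perf_counter()

    def elapsed(self):
        """Seconds since start; keeps counting until stop() is called."""
        if self.start_time is None:
            raise RuntimeError("timer was never started")
        end = self.stop_time if self.stop_time is not None else time.perf_counter()
        return end - self.start_time


timer = WallTimer()
sum(range(100_000))  # stand-in for the operation being timed
timer.stop()
print(f"took {timer.elapsed():.6f}s")
```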
Read a Pandas data frame, auto-detecting the file format based on filename suffix. The following file types are supported:
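Suffix-based dispatch of this kind might look like the sketch below. The suffix table is an assumption made for illustration (the supported types are not listed here), and `detect_reader` and `read_frame` are hypothetical names rather than LensKit functions.

```python
from pathlib import Path

# Assumed suffix-to-reader table, for illustration only
READERS = {
    ".csv": ("read_csv", {}),
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".parquet": ("read_parquet", {}),
}
COMPRESSION = {".gz", ".bz2", ".xz"}


def detect_reader(path):
    """Return (pandas reader name, default kwargs) for a filename."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Ignore a trailing compression suffix such as .gz
    while suffixes and suffixes[-1] in COMPRESSION:
        suffixes.pop()
    if not suffixes or suffixes[-1] not in READERS:
        raise ValueError(f"unsupported file type: {path}")
    return READERS[suffixes[-1]]


def read_frame(path, **kwargs):
    """Read a data frame, choosing the reader from the filename suffix."""
    import pandas as pd  # imported lazily so detection works without pandas

    name, defaults = detect_reader(path)
    # Caller-supplied keyword arguments override the per-format defaults
    return getattr(pd, name)(path, **{**defaults, **kwargs})
```

Keeping detection separate from reading makes the suffix logic easy to check without touching any files.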
rng(spec=None, *, legacy=False)¶
Get a random number generator. This is similar to
sklearn.utils.check_random_state(), but it usually returns a
A random number generator.
- Return type
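The general pattern of coercing a seed specification into a generator can be sketched with NumPy as follows. `as_rng` is a hypothetical stand-in, not LensKit's implementation, and passing existing generators through unchanged is an assumption of this sketch.

```python
import numpy as np


def as_rng(spec=None, *, legacy=False):
    """Coerce a seed specification into a NumPy random generator."""
    # Assumption: an existing generator is returned as-is
    if isinstance(spec, (np.random.Generator, np.random.RandomState)):
        return spec
    # None draws fresh entropy; an int or SeedSequence gives a reproducible stream
    if isinstance(spec, np.random.SeedSequence):
        seq = spec
    else:
        seq = np.random.SeedSequence(spec)
    if legacy:
        # Legacy interface, seeded from the same seed sequence
        return np.random.RandomState(np.random.MT19937(seq))
    return np.random.default_rng(seq)


first = as_rng(42).integers(0, 100, size=5)
second = as_rng(42).integers(0, 100, size=5)
print((first == second).all())
```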
init_rng(seed, *keys, propagate=True)¶
Initialize the random infrastructure with a seed. This function should generally be called very early in the setup.
keys – Additional keys, to use as a
spawn_key on NumPy 1.17. Passed to
propagate (bool) – If True, initialize other RNG infrastructure. This currently initializes:
If propagate=False, LensKit is still fully seeded: no component included with LensKit uses any of the global RNGs; they all use RNGs seeded with the specified seed.
The random seed.
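To illustrate how a seed and additional keys can combine into one root seed, here is a minimal sketch built on `numpy.random.SeedSequence`. The names `init_root` and `_key_to_int` are hypothetical, and hashing string keys with CRC32 is an assumption of this sketch, not a description of LensKit's behavior.

```python
import zlib

import numpy as np

_root = None  # hypothetical module-level root seed


def _key_to_int(key):
    # Assumption: string keys are reduced to integers with CRC32
    if isinstance(key, str):
        return zlib.crc32(key.encode("utf8"))
    return int(key)


def init_root(seed, *keys):
    """Build a root SeedSequence from a seed plus extra keys."""
    global _root
    spawn_key = tuple(_key_to_int(k) for k in keys)
    _root = np.random.SeedSequence(seed, spawn_key=spawn_key)
    return _root


root = init_root(42, "fold", 3)
print(root.spawn_key)
```

Calling such a function once, early in a script, means every later generator can be traced back to a single recorded seed.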
derivable_rng(spec, *, legacy=False)¶
Get a derivable RNG, for use cases where the code needs to be able to reproducibly derive sub-RNGs for different keys, such as user IDs.
Any value supported by the seed parameter of
rng(), in addition to the following values:
a tuple of the form (seed,
Either of these forms will cause the returned function to re-derive new RNGs.
A function taking one (or more) key values, like
derive_seed(), and returning a random number generator (the type of which is determined by the
- Return type
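The idea of reproducibly deriving sub-RNGs per key can be sketched as a closure over a base seed. `make_derivable` is a hypothetical name, and the sketch assumes integer keys such as user IDs.

```python
import numpy as np


def make_derivable(seed):
    """Return a function mapping integer keys to reproducible generators."""

    def derive(*keys):
        # Each distinct key tuple selects its own independent stream
        seq = np.random.SeedSequence(seed, spawn_key=tuple(int(k) for k in keys))
        return np.random.default_rng(seq)

    return derive


rng_for = make_derivable(42)
# The same user ID yields the same stream, regardless of call order
a = rng_for(1001).random(3)
_ = rng_for(2002).random(3)
b = rng_for(1001).random(3)
print((a == b).all())
```

Because each generator depends only on the seed and the key, results for one user do not change when other users are processed in a different order or in parallel.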
proc_count(core_div=2, max_default=None, level=0)¶
Get the number of desired jobs for multiprocessing operations. This does not affect Numba or MKL multithreading.
This count can come from a number of sources:
The number of CPUs, divided by
max_default – The maximum number of processes to use if the environment variable is not configured.
level – The process nesting level. 0 is the outermost level of parallelism; subsequent levels control nesting. Levels deeper than 1 are rare, and callers are not expected to know the exact nesting depth, only that they are configuring a child. If the process count is unconfigured, then level 1 will use core_div, and deeper levels will use 1.
The number of jobs desired.
- Return type
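The selection logic described above can be sketched roughly as follows. The environment variable name `EXAMPLE_NUM_PROCS` is a placeholder, `job_count` is a hypothetical function, and consulting the variable only at level 0 is an assumption: the text above specifies nested levels only for the unconfigured case.

```python
import os

ENV_VAR = "EXAMPLE_NUM_PROCS"  # placeholder name, not the real variable


def job_count(core_div=2, max_default=None, level=0, *, cpus=None, environ=None):
    """Pick a process count from configuration or from the CPU count."""
    environ = os.environ if environ is None else environ
    cpus = (os.cpu_count() or 1) if cpus is None else cpus

    if level == 0:
        configured = environ.get(ENV_VAR)
        if configured:
            return int(configured)
        # Unconfigured: divide the CPUs, but never drop below one process
        n = max(cpus // core_div, 1) if core_div else cpus
        if max_default is not None:
            n = min(n, max_default)
        return n
    if level == 1:
        # A first-level child uses core_div when nothing is configured
        return core_div if core_div else 1
    return 1  # deeper nesting runs sequentially


print(job_count(cpus=8, environ={}))
```

The `cpus` and `environ` parameters exist only so the sketch can be exercised deterministically.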
Clone an algorithm, but not its fitted data. This is like
sklearn.base.clone(), but may not work on arbitrary SciKit estimators. LensKit algorithms are compatible with SciKit clone, however, so feel free to use that if you need more general capabilities.
This function is somewhat derived from the SciKit one.
>>> from lenskit.util import clone
>>> from lenskit.algorithms.bias import Bias
>>> orig = Bias()
>>> copy = clone(orig)
>>> copy is orig
False
>>> copy.damping == orig.damping
True
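Cloning parameters without fitted state can be illustrated with a small self-contained sketch. It assumes objects expose a scikit-learn style `get_params()` method; `Damped` and `clone_unfitted` are hypothetical names used only for this example and are not the LensKit implementation.

```python
import copy


class Damped:
    """Toy estimator-like class used only for this illustration."""

    def __init__(self, damping=0.0):
        self.damping = damping

    def get_params(self, deep=True):
        return {"damping": self.damping}

    def fit(self, values):
        # Trailing underscore marks learned state, as in scikit-learn
        self.mean_ = sum(values) / len(values)
        return self


def clone_unfitted(obj):
    """Copy constructor parameters into a fresh instance, dropping fitted state."""
    if isinstance(obj, (list, tuple)):
        return type(obj)(clone_unfitted(o) for o in obj)
    if hasattr(obj, "get_params"):
        params = {k: clone_unfitted(v) for k, v in obj.get_params(deep=False).items()}
        return type(obj)(**params)
    # Plain parameter values are copied so the clone shares nothing mutable
    return copy.deepcopy(obj)


orig = Damped(5.0).fit([1, 2, 3])
dup = clone_unfitted(orig)
print(dup.damping, hasattr(dup, "mean_"))
```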