Algorithm Implementation Tips

Implementing algorithms is fun, but there are a few things worth keeping in mind.

In general, development prioritizes the following:

Correct
Clear
Fast

In that order. We also want LensKit to remain easy to use, even when the code implementing an algorithm has to be fairly complex to achieve good performance.

Performance

We use Numba to optimize critical code paths and provide parallelism in a number of cases, such as ALS training. See the ALS source code for examples.
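As a rough illustration of the kind of kernel this involves (not code taken from the ALS implementation), the sketch below computes per-row means over raw compressed-sparse-row arrays using Numba's @njit(parallel=True) and prange. Each loop iteration writes only its own output slot, which is what makes the parallel loop safe. The function name row_means is hypothetical, and the try/except fallback is an assumption added only so the sketch still runs, serially, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (serially, uncompiled) without Numba.
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as bare @njit
        return lambda fn: fn  # used as @njit(...)


@njit(parallel=True)
def row_means(rowptrs, values):
    """Mean of the stored values in each CSR row (0.0 for empty rows)."""
    nrows = len(rowptrs) - 1
    out = np.zeros(nrows)
    # Rows are independent: iteration i reads row i and writes only out[i].
    for i in prange(nrows):
        sp, ep = rowptrs[i], rowptrs[i + 1]
        if ep > sp:
            out[i] = values[sp:ep].mean()
    return out
```

Called with row pointers [0, 2, 2, 5] and values [4.0, 2.0, 5.0, 3.0, 1.0], it returns means of 3.0, 0.0, and 3.0; the middle row is empty.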

We also use the CSR package, which provides sparse matrices that are usable from Numba-accelerated code along with unified access to important sparse matrix operations, using MKL acceleration when it is available. Previous versions of LensKit included the MKL code directly, but that logic has since moved into CSR.
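For readers unfamiliar with the format: a compressed sparse row (CSR) matrix is stored as three flat arrays (row pointers, column indices, and values), and flat NumPy arrays like these are exactly what Numba-compiled code can work with directly. The NumPy-only sketch below builds that layout from (row, column, value) triples purely to show the structure; coo_to_csr is a hypothetical helper, not part of the CSR package's API, which real LensKit code should use instead.

```python
import numpy as np


def coo_to_csr(rows, cols, vals, nrows):
    """Build CSR arrays (rowptrs, colinds, values) from COO triples."""
    rows = np.asarray(rows)
    # Group entries by row; a stable sort keeps input order within each row.
    order = np.argsort(rows, kind="stable")
    # rowptrs[i]:rowptrs[i + 1] is the slice of colinds/values for row i.
    counts = np.bincount(rows, minlength=nrows)
    rowptrs = np.zeros(nrows + 1, dtype=np.int64)
    np.cumsum(counts, out=rowptrs[1:])
    return rowptrs, np.asarray(cols)[order], np.asarray(vals)[order]
```

Entries within a row keep their input order here; a production implementation would typically also sort column indices within each row and reject out-of-range indices.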

If you are working on an algorithm implementation that needs access to additional MKL operations, please add the relevant operations to CSR to keep LensKit pure Python + Numba. We do not have plans to re-add the MKL wrapper logic to the LensKit core.

Random Number Generation

LensKit uses seedbank to manage RNG seeds and construct random number generators.
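The core idea behind managing seeds centrally is to derive many independent, reproducible random streams from a single root seed. The sketch below shows that idea with NumPy's own numpy.random.SeedSequence rather than seedbank, so it assumes nothing about seedbank's API; the root seed value is an arbitrary placeholder.

```python
import numpy as np

ROOT_SEED = 20240101  # arbitrary placeholder for an experiment-wide seed

# One root SeedSequence; spawn() derives independent child sequences,
# e.g. one per component or per worker process.
children = np.random.SeedSequence(ROOT_SEED).spawn(3)
rngs = [np.random.default_rng(c) for c in children]

# Rebuilding from the same root seed reproduces exactly the same streams.
replay = [np.random.default_rng(c) for c in np.random.SeedSequence(ROOT_SEED).spawn(3)]

first = [r.random() for r in rngs]
again = [r.random() for r in replay]
print(first == again)  # True
```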

In general, algorithms that use randomization should take an rng_spec parameter that accepts a seed or an RNG, and pass it to seedbank.numpy_rng() to obtain a random number generator. Algorithms that use randomness at prediction or recommendation time, not just during training, should also accept the value 'user' for rng_spec. When it is passed, they should derive a new seed for each user with seedbank.derive_seed(), so that results remain reproducible under parallel execution in common experimental designs. lenskit.util.derivable_rng() automates this logic.
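A minimal sketch of the rng_spec pattern, assuming a hypothetical RandomScorer algorithm that draws random scores at recommendation time. To stay self-contained it uses NumPy directly instead of seedbank.numpy_rng(), seedbank.derive_seed(), or lenskit.util.derivable_rng(), and it assumes non-negative integer user IDs and a hypothetical experiment-wide ROOT_SEED. The property that matters is that, with 'user', each user's generator depends only on the root seed and the user ID, not on which worker handles the user or in what order.

```python
import numpy as np

ROOT_SEED = 42  # hypothetical experiment-wide seed (seedbank would manage this)


def make_rng(spec):
    """Turn an rng_spec (None, an int seed, or a Generator) into a Generator."""
    if isinstance(spec, np.random.Generator):
        return spec  # caller supplied a generator; use it as-is
    return np.random.default_rng(spec)  # None -> fresh entropy, int -> seeded


class RandomScorer:
    """Hypothetical algorithm that scores items randomly at recommendation time."""

    def __init__(self, rng_spec=None):
        self.per_user = isinstance(rng_spec, str) and rng_spec == "user"
        self.rng = None if self.per_user else make_rng(rng_spec)

    def _rng_for(self, user):
        if self.per_user:
            # Derive a seed from (root seed, user ID): same user -> same stream,
            # regardless of worker assignment or processing order.
            return np.random.default_rng(np.random.SeedSequence([ROOT_SEED, user]))
        return self.rng

    def score(self, user, items):
        """Return one random score per item for the given user."""
        return self._rng_for(user).random(len(items))
```

With a fixed integer seed, by contrast, a single generator is shared across all users, so the scores a user receives depend on the order in which users are processed.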
