Model Sharing

The lenskit.sharing module provides utilities for managing models and sharing them between processes, particularly for the multiprocessing in lenskit.batch.

Sharing Mode

The only concept algorithm developers usually need to handle directly is ‘sharing mode’, which comes into play when implementing custom pickling logic. To save space, it is reasonable to exclude intermediate data structures, such as caches or inverse indexes, from the pickled representation of an algorithm, and reconstruct them when the model is loaded.

However, LensKit’s multi-process sharing also uses pickling to capture the object state while using shared memory for numpy.ndarray objects. In these cases, the structures should be pickled, so they can be shared between model instances.

To support this, we have the concept of sharing mode. Code that excludes objects when pickling should call in_share_context() to determine if that exclusion should actually happen.
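As a sketch of the pattern (the HypotheticalAlgo class and its _inverse attribute are illustrative, not part of LensKit), an algorithm with a rebuildable derived structure might implement its pickling hooks like this:

import numpy as np

from lenskit.sharing import in_share_context

class HypotheticalAlgo:
    """Illustrative model with a derived structure that can be rebuilt."""

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix)        # core model data
        self._inverse = self._build_inverse()   # derived, rebuildable index

    def _build_inverse(self):
        return self.matrix.T.copy()

    def __getstate__(self):
        state = dict(self.__dict__)
        if not in_share_context():
            # normal persistence: drop the derived structure to save space
            del state['_inverse']
        # in sharing mode, keep it so shared copies do not rebuild it
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        if '_inverse' not in state:
            self._inverse = self._build_inverse()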

lenskit.sharing.in_share_context()

Query whether sharing mode is active. If True, we are currently in a sharing_mode() context, which means model pickling will be used for cross-process sharing.

lenskit.sharing.sharing_mode()

Context manager to tell models that pickling will be used for cross-process sharing, not model persistence.
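For example, batch-style code that captures model state for worker processes might use it like this (a sketch; algo stands for any picklable model object):

import pickle

from lenskit.sharing import sharing_mode

def pickle_for_sharing(algo):
    # inside this block, in_share_context() returns True, so models keep
    # structures they would otherwise drop from their pickled state
    with sharing_mode():
        return pickle.dumps(algo)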

Model Store API

Model stores handle persisting models into shared memory, cleaning up shared memory, and making objects available to other classes.

LensKit users and algorithm implementers will generally not need to use this code themselves, unless they are implementing their own batch processing logic.

lenskit.sharing.get_store(reuse=True, *, in_process=False)

Get a model store, using the best available on the current platform. The resulting store should be used as a context manager, as in:

>>> with get_store() as store:
...     pass

This function uses the following priority list for locating a suitable store:

  1. The currently-active store, if reuse=True

  2. A no-op store, if in_process=True

  3. SHMModelStore, if running on Python 3.8 or later

  4. FileModelStore

Parameters
  • reuse (bool) – If a store is currently active (entered in a with block), use that store instead of creating a new one.

  • in_process (bool) – If True, then create a no-op store for use without multiprocessing.

Returns

the model store.

Return type

BaseModelStore
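A minimal single-process round trip, assuming trained_algo is any picklable model object, might look like:

from lenskit.sharing import get_store

def round_trip(trained_algo):
    with get_store() as store:
        key = store.put_model(trained_algo)   # persist into shared storage
        client = store.client()               # cheap, picklable handle
        with client.get_model(key) as model:
            result = str(model)               # stand-in for real work
            del model                         # release before the context exits
        return result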

class lenskit.sharing.BaseModelStore

Bases: lenskit.sharing.BaseModelClient

Base class for storing models for access across processes.

Stores are also context managers that initialize themselves and clean themselves up. As context managers, they are also re-entrant, and they register themselves so that get_store() can re-use existing managers.

abstract client()

Get a client for the model store. Clients are cheap to pass to child processes for multiprocessing.

Returns

the model client.

Return type

BaseModelClient

init()

Initialize the store.

abstract put_model(model)

Store a model in the model store.

Parameters

model (object) – the model to store.

Returns

a key to retrieve the model with BaseModelClient.get_model()

put_serialized(path, binpickle=False)

Deserialize a model and load it into the store.

The base class method unpickles path and calls put_model().

Parameters
  • path – the file containing the serialized model.

  • binpickle (bool) – if True, the file is a BinPickle file; otherwise it is read as a standard pickle file.
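
For instance, assuming put_serialized() passes back the key from put_model() (the file name here is illustrative):

from pathlib import Path

from lenskit.sharing import get_store

with get_store() as store:
    # load an already-serialized model directly into the store
    key = store.put_serialized(Path('trained-model.pkl'))
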
shutdown()

Shut down the store.

class lenskit.sharing.BaseModelClient

Bases: object

Model store client to get models given keys. Clients must be able to be cheaply pickled and de-pickled to enable worker processes to access them.

abstract get_model(key)

Get a model from the model store.

Parameters

key – the model key to retrieve.

Returns

The model, previously stored with BaseModelStore.put_model(), wrapped in a SharedObject to manage underlying resources.

Return type

SharedObject
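
To illustrate the intended pattern (a sketch; the worker function and its scoring logic are hypothetical), a client can be shipped to worker processes by ordinary pickling and used to look up the model:

import multiprocessing as mp

from lenskit.sharing import get_store

def _worker(args):
    # the client arrives here by ordinary pickling
    client, key = args
    with client.get_model(key) as model:
        result = str(model)   # stand-in for real per-worker work
        del model             # release before the SharedObject closes
    return result

def run_workers(trained_algo):
    with get_store() as store:
        key = store.put_model(trained_algo)
        client = store.client()
        with mp.Pool(2) as pool:
            return pool.map(_worker, [(client, key)] * 2)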

class lenskit.sharing.SharedObject(obj)

Bases: object

Wrapper for a shared object that can release it when the object is no longer needed.

Objects of this type are context managers that return the shared object (not themselves) when entered.

Any other references to the object, or its contents, must be released before calling release() or exiting the context manager. Among other things, that means that you will need to delete the variable referring to it:

with client.get_model(k) as model:
    # model here is the actual model object wrapped by the SharedObject
    # returned by get_model
    pass       # actually do the things you want to do
    del model  # release model, so the shared object can be closed

Be careful of stray references to the model object! Some things we have seen causing stray references include:

  • passing the algorithm object directly to a logging call (call str() on it explicitly instead), at least in the test harness

The default implementation uses sys.getrefcount() to provide debugging support to help catch stray references.

object

the underlying shared object.

release()

Release the shared object. Automatically called by __exit__(), so in normal use of a shared object with a with statement, this method is not needed.

The base class implementation simply deletes the object reference. Subclasses should override this method to handle their own release logic.
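As an illustration of the override pattern (a hypothetical subclass, not LensKit's actual shared-memory wrapper):

from lenskit.sharing import SharedObject

class SegmentSharedObject(SharedObject):
    """Hypothetical wrapper that also closes a backing resource."""

    def __init__(self, obj, segment):
        super().__init__(obj)
        self._segment = segment

    def release(self):
        super().release()       # drop the object reference first
        self._segment.close()   # then release the backing segment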

Model Store Implementations

We provide several model store implementations.

Memory Mapping

The memory-mapped-file store works on any supported platform and Python version. It uses BinPickle’s memory-mapped pickle support to store models on disk and uses that on-disk storage to back memory-mapped views of major data structures.

class lenskit.sharing.file.FileModelStore(*, path=None, reserialize=True)

Bases: lenskit.sharing.BaseModelStore, lenskit.sharing.file.FileClient

Model store using BinPickle’s memory-mapping pickle support.

Parameters
  • path – the path to use; otherwise uses a new temp directory under util.scratch_dir().

  • reserialize – if True (the default), models passed to put_serialized() are re-serialized in the BinPickle storage, even if they are binpickle files.

class lenskit.sharing.file.FileClient

Bases: lenskit.sharing.BaseModelClient

Client using BinPickle’s memory-mapping pickle support.
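
For example, to keep the store’s files in a specific directory rather than a fresh scratch directory (a sketch; the directory is illustrative):

from pathlib import Path

from lenskit.sharing.file import FileModelStore

def share_via_files(trained_algo):
    # keep the memory-mapped model files in an explicit directory
    with FileModelStore(path=Path('/tmp/lk-models')) as store:
        key = store.put_model(trained_algo)
        ...  # use store.client() and the key as with any other store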

Shared Memory

This store uses Python 3.8’s multiprocessing.shared_memory module, along with the out-of-band buffer support in Pickle Protocol 5, to pass model data through shared memory.

class lenskit.sharing.sharedmem.SHMModelStore

Bases: lenskit.sharing.BaseModelStore, lenskit.sharing.sharedmem.SHMClient

Model store using shared memory and Pickle Protocol 5.

This model store only works in Python 3.8 and later, as it requires both the new multiprocessing.shared_memory module and Pickle Protocol 5. It also depends on a Numpy version new enough to support Protocol 5 pickles.

class lenskit.sharing.sharedmem.SHMClient

Bases: lenskit.sharing.BaseModelClient