Batch-Running Recommenders

The functions in lenskit.batch enable you to generate many recommendations or predictions at the same time, which is useful for evaluations and experiments.

Recommendation

lenskit.batch.recommend(algo, users, n, candidates=None, *, nprocs=None, **kwargs)

Batch-recommend for multiple users. The provided algorithm should be an algorithms.Recommender.

Parameters:
  • algo – the algorithm
  • users (array-like) – the users to recommend for
  • n (int) – the number of recommendations to generate (None for unlimited)
  • candidates – the users’ candidate sets. This can be a function, in which case it will be passed each user ID; it can also be a dictionary, in which case user IDs will be looked up in it. Pass None to use the recommender’s built-in candidate selector (usually recommended).
  • nprocs (int) – The number of processes to use for parallel recommendations.
Returns:

A frame with at least the columns user, rank, and item; possibly also score, and any other columns returned by the recommender.
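As an illustration, the candidates argument can be supplied either as a dictionary or as a function of the user ID. This is a sketch; the user and item IDs below are made up:

```python
# Candidate sets keyed by user ID (IDs are hypothetical)
candidate_dict = {1: [10, 11, 12], 2: [10, 13]}

# Equivalent function form: recommend() would call this once per user ID
def candidate_fn(user):
    return candidate_dict.get(user, [])
```

Either form could then be passed as the candidates argument, e.g. lenskit.batch.recommend(algo, [1, 2], 10, candidates=candidate_fn); passing None instead uses the recommender's built-in candidate selector.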

Rating Prediction

lenskit.batch.predict(algo, pairs, *, nprocs=None)

Generate predictions for user-item pairs. The provided algorithm should be an algorithms.Predictor, or a function of two arguments (the user ID and a list of item IDs) that returns a dictionary or a pandas.Series mapping item IDs to predictions.

To use this function, provide a pre-fit algorithm:

>>> from lenskit import util
>>> from lenskit.batch import predict
>>> from lenskit.algorithms.basic import Bias
>>> from lenskit.metrics.predict import rmse
>>> ratings = util.load_ml_ratings()
>>> bias = Bias()
>>> bias.fit(ratings[:-1000])
<lenskit.algorithms.basic.Bias object at ...>
>>> preds = predict(bias, ratings[-1000:])
>>> preds.head()
       user  item  rating   timestamp  prediction
99004   664  8361     3.0  1393891425    3.288286
99005   664  8528     3.5  1393891047    3.559119
99006   664  8529     4.0  1393891173    3.573008
99007   664  8636     4.0  1393891175    3.846268
99008   664  8641     4.5  1393890852    3.710635
>>> rmse(preds['prediction'], preds['rating'])
0.8326992222...
Parameters:
  • algo (lenskit.algorithms.Predictor) – A rating predictor function or algorithm.
  • pairs (pandas.DataFrame) – A data frame of (user, item) pairs to predict for. If this frame also contains a rating column, it will be included in the result.
  • nprocs (int) – The number of processes to use for parallel batch prediction.
Returns:

A frame with columns user, item, and prediction containing the prediction results. If pairs contains a rating column, this result will also contain a rating column.

Return type:

pandas.DataFrame
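Since algo may also be a plain function, a minimal predictor can be written directly. This is a sketch for illustration only; the constant score of 3.5 is a placeholder, not a real model:

```python
import pandas as pd

def constant_predictor(user, items):
    # Return a Series mapping each item ID to a predicted rating.
    # A real predictor would score items per user; 3.5 is a placeholder.
    return pd.Series(3.5, index=items)
```

Such a function could then be passed in place of a fitted algorithm, e.g. lenskit.batch.predict(constant_predictor, pairs).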

Scripting Evaluation

class lenskit.batch.MultiEval(path, predict=True, recommend=100, candidates=<class 'lenskit.topn.UnratedCandidates'>, nprocs=None, combine=True)

A runner for carrying out multiple evaluations, such as parameter sweeps.

Parameters:
  • path (str or pathlib.Path) – the working directory for this evaluation. It will be created if it does not exist.
  • predict (bool) – whether to generate rating predictions.
  • recommend (int) – the number of recommendations to generate per user (None to disable top-N).
  • candidates (function) – the default candidate set generator for recommendations. It should take the training data and return a candidate generator, itself a function mapping user IDs to candidate sets.
  • combine (bool) – whether to combine output; if False, output is left in separate files; if True, it is combined into a single set of files (runs, recommendations, and predictions).
add_algorithms(algos, parallel=False, attrs=[], **kwargs)

Add one or more algorithms to the run.

Parameters:
  • algos (algorithm or list) – the algorithm(s) to add.
  • parallel (bool) – if True, allow this algorithm to be trained in parallel with others.
  • attrs (list of str) – a list of attributes to extract from the algorithm objects and include in the run descriptions.
  • kwargs – additional attributes to include in the run descriptions.
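For a parameter sweep, a common pattern is to build the algorithm list from a grid of settings and record each setting via attrs. The following sketch only builds the grid; the setting names and values are illustrative, not library defaults:

```python
from itertools import product

# Illustrative hyperparameter grid (names and values are made up)
dampings = [0, 5]
features = [10, 20]
grid = list(product(dampings, features))  # all (damping, features) pairs
```

Hypothetically, each pair would then become one algorithm instance, added with something like add_algorithms([SomeAlgo(damping=d, features=f) for d, f in grid], attrs=['damping', 'features']) so the settings appear in the run descriptions.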
add_datasets(data, name=None, candidates=None, **kwargs)

Add one or more datasets to the run.

Parameters:
  • data

    The input data set(s) to run. Can be one of the following:

    • A tuple of (train, test) data.
    • An iterable of (train, test) pairs, in which case the iterable is not consumed until it is needed.
    • A function yielding either of the above, to defer data load until it is needed.

    Data can be given as data frames or as file paths; paths are loaded with format detection using util.read_df_detect().

  • name (str) – a name for the data set(s).
  • candidates – a candidate set generator to use for these data sets, overriding the default.
  • kwargs – additional attributes pertaining to these data sets.
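Because an iterable of (train, test) pairs is not consumed until it is needed, partitions can be produced lazily by a generator. The sketch below uses a naive contiguous split purely for illustration; a real experiment would typically use lenskit.crossfold:

```python
import pandas as pd

def make_partitions(ratings, n):
    # Lazily yield (train, test) pairs; nothing is computed until iterated.
    # This contiguous split is only for illustration, not a proper crossfold.
    size = len(ratings) // n
    for i in range(n):
        test = ratings.iloc[i * size:(i + 1) * size]
        train = ratings.drop(test.index)
        yield train, test
```

Hypothetically this would be passed as add_datasets(make_partitions(ratings, 5), name='ML'), deferring the split until the runs need it.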
collect_results()

Collect the results from non-combined runs into combined output files.

persist_data()

Persist the data for an experiment, replacing in-memory data sets with file names. Once this has been called, the sweep can be pickled.

run(runs=None)

Run the evaluation.

Parameters:
  • runs (int or set-like) – If provided, a specific set of runs to run. Useful for splitting an experiment into individual runs. This is a set of 1-based run IDs, not 0-based indexes.
run_count()

Get the number of runs in this evaluation.
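To split an experiment across jobs, the 1-based run IDs can be partitioned into chunks and each chunk passed to run(). run() and run_count() are the documented API; the chunking helper below is a sketch:

```python
def run_chunks(n_runs, n_jobs):
    # Partition 1-based run IDs into roughly equal chunks, one per job.
    # Each chunk can be passed to MultiEval.run() in a separate process.
    ids = list(range(1, n_runs + 1))
    size = -(-n_runs // n_jobs)  # ceiling division
    return [ids[i:i + size] for i in range(0, n_runs, size)]
```

Hypothetical use: for chunk in run_chunks(eval.run_count(), 4): eval.run(chunk). Remember that run IDs start at 1, not 0.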