Prediction Accuracy Metrics

The lenskit.metrics.predict module contains prediction accuracy metrics. They are intended to be used as part of a Pandas split-apply-combine operation on a data frame that contains both predictions and ratings. For convenience, the lenskit.batch.predict() function includes ratings in the prediction frame when its input user-item pairs contain ratings. For example, the following computes the mean per-user RMSE over some predictions:

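from lenskit.datasets import MovieLens
from lenskit.algorithms.bias import Bias
from lenskit.batch import predict
from lenskit.metrics.predict import user_metric, rmse

# Take a random 10% sample of the ratings; the sampled rows come back shuffled
ratings = MovieLens('ml-small').ratings.sample(frac=0.1)
# Hold out the first 1000 sampled rows for testing and train on the rest
test = ratings.iloc[:1000]
train = ratings.iloc[1000:]

# Fit a bias model and predict ratings for the test user-item pairs
algo = Bias()
algo.fit(train)
preds = predict(algo, test)

# Mean per-user RMSE over the predictions
user_metric(preds, metric=rmse)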

Metric Functions

Prediction metric functions take two series, predictions and truth, and compute a prediction accuracy metric for them.

lenskit.metrics.predict.rmse(predictions, truth, missing='error')

Compute RMSE (root mean squared error).

Parameters

predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the root mean squared approximation error

Return type

double

lenskit.metrics.predict.mae(predictions, truth, missing='error')

Compute MAE (mean absolute error).

Parameters

predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the mean absolute approximation error

Return type

double
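As a rough illustration of what these two metrics compute, the following sketch reimplements them with plain pandas and NumPy. The names rmse_sketch, mae_sketch, and _align are hypothetical and are not part of LensKit, and the handling of the missing option is only an assumption based on the parameter description above, not the library's actual code.

```python
import numpy as np
import pandas as pd


def _align(predictions, truth, missing):
    """Pair each prediction with its rating (hypothetical helper)."""
    if missing not in ("error", "ignore"):
        raise ValueError("missing must be 'error' or 'ignore'")
    # Reindex truth to the predictions; absent ratings become NaN.
    predictions, truth = predictions.align(truth, join="left")
    no_truth = truth.isna()
    if missing == "error" and no_truth.any():
        raise ValueError("some predictions have no ground-truth rating")
    # With missing='ignore', drop predictions that have no rating.
    return predictions[~no_truth], truth[~no_truth]


def rmse_sketch(predictions, truth, missing="error"):
    """Square root of the mean squared prediction error."""
    predictions, truth = _align(predictions, truth, missing)
    return float(np.sqrt(((predictions - truth) ** 2).mean()))


def mae_sketch(predictions, truth, missing="error"):
    """Mean of the absolute prediction errors."""
    predictions, truth = _align(predictions, truth, missing)
    return float((predictions - truth).abs().mean())


# Made-up data: three predictions indexed by item ID.
preds = pd.Series([3.5, 4.0, 2.0], index=["a", "b", "c"])
truth = pd.Series([4.0, 4.0, 3.0], index=["a", "b", "c"])
partial_truth = truth.drop("c")  # item "c" has no rating

full_rmse = rmse_sketch(preds, truth)
full_mae = mae_sketch(preds, truth)
ignored_rmse = rmse_sketch(preds, partial_truth, missing="ignore")
```

With missing="error", calling rmse_sketch(preds, partial_truth) raises a ValueError in this sketch, because item "c" has a prediction but no rating.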

Convenience Functions

These functions make it easier to compute global and per-user prediction metrics.

lenskit.metrics.predict.user_metric(predictions, *, score_column='prediction', metric=rmse, **kwargs)

Compute a mean per-user prediction accuracy metric for a set of predictions.

Parameters

predictions (pandas.DataFrame) – Data frame containing the predictions. Must have a 'rating' column containing ground truth, a score column with rating predictions, and a 'user' column with user IDs.
score_column (str) – The name of the score column (defaults to 'prediction').
metric (function) – A metric function of two parameters (prediction and truth). Defaults to rmse().

Returns

The mean of the per-user value of the metric.

Return type

float
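The split-apply-combine idea behind a per-user metric can be sketched with plain pandas: group the frame by user, apply the metric to each user's rows, and average the results. The names user_metric_sketch and rmse_sketch are hypothetical, and this is only an approximation of the behavior described above, not LensKit's implementation.

```python
import numpy as np
import pandas as pd


def rmse_sketch(predictions, truth):
    """Root mean squared error of two aligned series (hypothetical helper)."""
    return float(np.sqrt(((predictions - truth) ** 2).mean()))


def user_metric_sketch(frame, score_column="prediction", metric=rmse_sketch):
    """Mean of the per-user metric values (hypothetical helper)."""
    # Split by user, apply the metric to each user's rows, then combine.
    per_user = [
        metric(rows[score_column], rows["rating"])
        for _, rows in frame.groupby("user")
    ]
    return float(np.mean(per_user))


# Made-up predictions: user 1 has two rated items, user 2 has one.
frame = pd.DataFrame({
    "user": [1, 1, 2],
    "item": [10, 20, 10],
    "rating": [5.0, 3.0, 4.0],
    "prediction": [4.0, 3.0, 2.0],
})

# User 1 has RMSE sqrt(0.5) and user 2 has RMSE 2.0; the result is their mean.
result = user_metric_sketch(frame)
```

Because each user contributes one value to the mean, a user with a single rating counts as much as a user with hundreds.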

lenskit.metrics.predict.global_metric(predictions, *, score_column='prediction', metric=rmse, **kwargs)

Compute a global prediction accuracy metric for a set of predictions.

Parameters

predictions (pandas.DataFrame) – Data frame containing the predictions. Must have a 'rating' column containing ground truth and a score column with rating predictions.
score_column (str) – The name of the score column (defaults to 'prediction').
metric (function) – A metric function of two parameters (prediction and truth). Defaults to rmse().

Returns

The global metric value.

Return type

float
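A global metric pools every prediction into one computation, so users with many ratings carry more weight than they would in a per-user average. The sketch below uses plain pandas and the hypothetical names global_metric_sketch and rmse_sketch to show how the two numbers can differ on the same made-up data. It is not LensKit's implementation.

```python
import numpy as np
import pandas as pd


def rmse_sketch(predictions, truth):
    """Root mean squared error of two aligned series (hypothetical helper)."""
    return float(np.sqrt(((predictions - truth) ** 2).mean()))


def global_metric_sketch(frame, score_column="prediction", metric=rmse_sketch):
    """Apply the metric once over all rows (hypothetical helper)."""
    return float(metric(frame[score_column], frame["rating"]))


# Made-up predictions: user "a" has three small errors, user "b" one large error.
frame = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "item": [1, 2, 3, 1],
    "rating": [4.0, 3.0, 5.0, 5.0],
    "prediction": [3.0, 4.0, 4.0, 2.0],
})

# Pooled squared errors are 1, 1, 1, 9, so the global RMSE is sqrt(3).
global_rmse = global_metric_sketch(frame)

# For comparison: per-user RMSEs are 1.0 and 3.0, which average to 2.0.
per_user_mean = float(np.mean([
    rmse_sketch(rows["prediction"], rows["rating"])
    for _, rows in frame.groupby("user")
]))
```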

Working with Missing Data

LensKit rating predictors do not report predictions when their core model is unable to predict. For example, a nearest-neighbor recommender will not score an item if it cannot find any suitable neighbors. Following the Pandas convention, these items are given a score of NaN. When Pandas implements better missing data handling, LensKit will use that, so use pandas.Series.isna()/pandas.Series.notna() rather than the isnan versions.
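As a small illustration with made-up scores, the pandas missing-data methods can separate scored from unscored items and measure how many items received a prediction:

```python
import numpy as np
import pandas as pd

# Made-up prediction scores indexed by item ID; NaN marks "no prediction".
scores = pd.Series(
    [3.8, np.nan, 4.2, np.nan], index=[10, 20, 30, 40], name="prediction"
)

# Keep only the items that received a score.
scored = scores[scores.notna()]

# Fraction of items with a prediction.
coverage = float(scores.notna().mean())

# Item IDs the predictor could not score.
unscored_items = scores.index[scores.isna()].tolist()
```

Here coverage is 0.5 and unscored_items is [20, 40].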

However, this causes problems when computing predictive accuracy: recommenders are not being tested on the same set of items. If a recommender only scores the easy items, for example, it could appear much more accurate than a recommender that is willing to attempt more difficult items.
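The effect is easy to reproduce with made-up numbers. In the sketch below, a "cautious" predictor scores only two items while a "bold" predictor scores all four. The helper mae_on is hypothetical, and the data are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up ratings and predictions indexed by item ID.
truth = pd.Series([4.0, 4.0, 1.0, 5.0], index=[1, 2, 3, 4])
cautious = pd.Series([4.0, 3.5, np.nan, np.nan], index=[1, 2, 3, 4])
bold = pd.Series([3.5, 4.0, 3.0, 3.0], index=[1, 2, 3, 4])


def mae_on(predictions, items):
    """Mean absolute error restricted to the given items (hypothetical helper)."""
    return float((predictions.loc[items] - truth.loc[items]).abs().mean())


# Each predictor evaluated on whatever it chose to score.
cautious_items = cautious.index[cautious.notna()]
bold_items = bold.index[bold.notna()]
cautious_mae = mae_on(cautious, cautious_items)
bold_mae = mae_on(bold, bold_items)

# Both predictors evaluated on the items they both scored.
common_items = cautious.index[cautious.notna() & bold.notna()]
bold_common_mae = mae_on(bold, common_items)
```

On their own item sets the cautious predictor looks far better (MAE 0.25 versus 1.125), but on the two items both predictors scored, their MAE is identical at 0.25.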

A good solution to this is to use a fallback predictor so that every item has a prediction. In LensKit, lenskit.algorithms.basic.Fallback implements this functionality; it wraps a sequence of recommenders, and for each item, uses the first one that generates a score.

You set it up like this:

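from lenskit.algorithms.basic import Fallback
from lenskit.algorithms.bias import Bias
from lenskit.algorithms.item_knn import ItemItem
cf = ItemItem(20)          # primary predictor: item-item k-NN with 20 neighbors
base = Bias(damping=5)     # fallback predictor: damped bias model
algo = Fallback(cf, base)  # try cf first; use base when cf yields no score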
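The per-item effect of a fallback can be sketched with plain pandas. This is not how Fallback is implemented; it only shows the result of taking the first available score for each item, using made-up score series named primary and baseline.

```python
import numpy as np
import pandas as pd

# Made-up scores indexed by item ID. The primary predictor skips two items.
primary = pd.Series([4.1, np.nan, 3.2, np.nan], index=[10, 20, 30, 40])
baseline = pd.Series([3.9, 3.5, 3.4, 2.8], index=[10, 20, 30, 40])

# Keep the primary score where one exists; otherwise use the baseline score.
combined = primary.combine_first(baseline)

# Record which predictor supplied each score.
source = primary.notna().map({True: "primary", False: "baseline"})
```

Every item now has a prediction, so accuracy metrics computed on combined cover the same items regardless of which primary predictor is used.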