Prediction Accuracy Metrics
The lenskit.metrics.predict module contains prediction accuracy metrics. These are intended to be used as part of a Pandas split-apply-combine operation on a data frame that contains both predictions and ratings. For convenience, the lenskit.batch.predict() function includes ratings in the prediction frame when its input user-item pairs contain ratings. For example, the following computes per-user RMSE over some predictions:
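from lenskit.batch import predict
from lenskit.metrics.predict import rmse
preds = predict(algo, pairs)  # algo: a trained predictor; pairs: user-item pairs with ratings
user_rmse = preds.groupby('user').apply(lambda df: rmse(df.prediction, df.rating))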
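To see the split-apply-combine pattern without a trained model, the sketch below builds a small prediction frame by hand and uses a simplified stand-in for rmse (unlike the library function, it has no missing-data handling). The column names user, prediction and rating follow the snippet above; the data are made up for illustration.

```python
import numpy as np
import pandas as pd


def rmse(predictions, truth):
    """Simplified stand-in for lenskit.metrics.predict.rmse (no missing handling)."""
    return float(np.sqrt(((predictions - truth) ** 2).mean()))


# Hypothetical prediction frame: one row per (user, item) pair,
# holding the predicted score and the true rating.
preds = pd.DataFrame({
    'user': [1, 1, 2, 2],
    'item': [10, 20, 10, 30],
    'prediction': [3.5, 4.0, 2.0, 5.0],
    'rating': [4.0, 4.0, 3.0, 4.0],
})

# Split by user, apply the metric to each group, and combine the
# per-group results into a Series indexed by user.
user_rmse = preds.groupby('user').apply(lambda df: rmse(df.prediction, df.rating))
print(user_rmse)  # user 1: sqrt(0.125), about 0.354; user 2: 1.0
```

Averaging user_rmse then gives a user-weighted RMSE, where every user counts equally regardless of how many ratings they have.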
Metric Functions
Prediction metric functions take two series, predictions and truth.
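For reference, the two metrics below follow their standard definitions. Writing $\hat{y}_i$ for the $i$-th prediction and $y_i$ for its true rating, over the $n$ predictions that have a truth value (this notation is ours, not LensKit's):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|
```

Because errors are squared before averaging, RMSE weights large errors more heavily than MAE, and RMSE is never smaller than MAE on the same predictions.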

lenskit.metrics.predict.rmse(predictions, truth, missing='error')
Compute RMSE (root mean squared error).
 Parameters
predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.
 Returns
the root mean squared approximation error
 Return type
double
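The sketch below shows one plausible way such a function can behave; it is not LensKit's source. It assumes both Series share an index (for example, item IDs), aligns the truth values to the predictions, and treats missing='error' as raising ValueError and missing='ignore' as dropping predictions that lack a truth value. The name rmse_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def rmse_sketch(predictions, truth, missing='error'):
    """Hypothetical RMSE helper with a LensKit-style ``missing`` option."""
    if missing not in ('error', 'ignore'):
        raise ValueError(f'unknown missing option: {missing!r}')
    # Keep every prediction; predictions with no matching truth get NaN truth.
    predictions, truth = predictions.align(truth, join='left')
    n_missing = int(truth.isna().sum())
    if n_missing > 0 and missing == 'error':
        raise ValueError(f'missing truth for {n_missing} predictions')
    sq_err = (predictions - truth) ** 2
    # Series.mean() skips NaN, so under 'ignore' those predictions drop out.
    return float(np.sqrt(sq_err.mean()))


preds = pd.Series([3.0, 4.0, 5.0], index=['a', 'b', 'c'])
truth = pd.Series([4.0, 4.0], index=['a', 'b'])  # no rating for 'c'

print(rmse_sketch(preds, truth, missing='ignore'))  # sqrt((1 + 0) / 2), about 0.707
try:
    rmse_sketch(preds, truth)  # default missing='error'
except ValueError as err:
    print(err)  # missing truth for 1 predictions
```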

lenskit.metrics.predict.mae(predictions, truth, missing='error')
Compute MAE (mean absolute error).
 Parameters
predictions (pandas.Series) – the predictions
truth (pandas.Series) – the ground truth ratings from data
missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.
 Returns
the mean absolute approximation error
 Return type
double
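A matching sketch for MAE, under the same assumptions: index-aligned Series, a hypothetical name (mae_sketch), and 'error'/'ignore' semantics that we assume rather than take from LensKit's source.

```python
import pandas as pd


def mae_sketch(predictions, truth, missing='error'):
    """Hypothetical MAE helper with a LensKit-style ``missing`` option."""
    if missing not in ('error', 'ignore'):
        raise ValueError(f'unknown missing option: {missing!r}')
    # Align truth to the predictions' index; unmatched predictions get NaN truth.
    predictions, truth = predictions.align(truth, join='left')
    n_missing = int(truth.isna().sum())
    if n_missing > 0 and missing == 'error':
        raise ValueError(f'missing truth for {n_missing} predictions')
    # abs() leaves NaN as NaN, and mean() skips NaN by default.
    return float((predictions - truth).abs().mean())


preds = pd.Series([3.0, 4.0, 2.0], index=['a', 'b', 'c'])
truth = pd.Series([4.0, 4.0, 5.0], index=['a', 'b', 'c'])
print(mae_sketch(preds, truth))  # absolute errors 1, 0, 3, so 4/3
```

On these three predictions MAE is 4/3 (about 1.33), while RMSE on the same data is sqrt(10/3), about 1.83, because squaring gives the single error of 3 more weight.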
Working with Missing Data
LensKit rating predictors do not report predictions when their core model is unable to predict. For example, a nearest-neighbor recommender will not score an item if it cannot find any suitable neighbors. Following the Pandas convention, these items are given a score of NaN. (When Pandas implements better missing-data handling, LensKit will use that, so test for missing scores with pandas.Series.isna() / pandas.Series.notna(), not the isnan versions.)
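The following sketch, on a made-up score Series, shows the recommended isna()/notna() checks for finding unscored items and for measuring what fraction of items received a score (coverage):

```python
import numpy as np
import pandas as pd

# Hypothetical scores: the predictor could not score item 'c'.
scores = pd.Series([3.5, 4.1, np.nan], index=['a', 'b', 'c'])

unscored = scores.isna()          # True where no prediction was produced
scored = scores[scores.notna()]   # keep only real predictions
coverage = scores.notna().mean()  # fraction of items that received a score

print(unscored.tolist())          # [False, False, True]
print(scored.index.tolist())      # ['a', 'b']
print(coverage)                   # about 0.667

# isna() also recognizes None in object-dtype data, where numpy.isnan
# would raise a TypeError instead.
mixed = pd.Series([1.0, None], dtype=object)
print(mixed.isna().tolist())      # [False, True]
```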
However, this causes problems when computing predictive accuracy: recommenders are not being tested on the same set of items. If a recommender only scores the easy items, for example, it could do much better than a recommender that is willing to attempt more difficult items.
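A toy example with made-up numbers makes the problem concrete. Predictor B below makes exactly the same predictions as predictor A on the two easy items but skips the hard one; scoring each predictor only on the items it attempted makes B look far better. The helper rmse_on_scored is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: 'e1' and 'e2' are easy items, 'h1' is a hard one.
truth = pd.Series([4.0, 3.0, 1.0], index=['e1', 'e2', 'h1'])

# A attempts every item; B gives the same scores but skips the hard item.
pred_a = pd.Series([4.0, 3.5, 3.0], index=['e1', 'e2', 'h1'])
pred_b = pd.Series([4.0, 3.5, np.nan], index=['e1', 'e2', 'h1'])


def rmse_on_scored(pred, truth):
    """RMSE computed only over the items the predictor actually scored."""
    mask = pred.notna()
    return float(np.sqrt(((pred[mask] - truth[mask]) ** 2).mean()))


print(rmse_on_scored(pred_a, truth))  # sqrt(4.25 / 3), about 1.190
print(rmse_on_scored(pred_b, truth))  # sqrt(0.125), about 0.354
```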
A good solution to this is to use a fallback predictor so that every item has a prediction. In LensKit, lenskit.algorithms.basic.Fallback implements this functionality; it wraps a sequence of recommenders, and for each item, uses the first one that generates a score. You set it up like this:
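from lenskit.algorithms.basic import Bias, Fallback
from lenskit.algorithms.item_knn import ItemItem
cf = ItemItem(20)  # item-item collaborative filter with 20 neighbors
base = Bias(damping=5)  # user/item bias baseline with damping 5
algo = Fallback(cf, base)  # per item, use cf's score if it has one, else base's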
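Conceptually, the per-item rule "use the first score available" can be sketched with pandas as below. This is a hypothetical helper operating on precomputed score Series to illustrate the idea; it is not how Fallback is implemented, and the scores are made up.

```python
import numpy as np
import pandas as pd


def fallback_scores(items, *score_series):
    """Per item, take the first non-missing score from a sequence of Series.

    Hypothetical helper for illustration; not LensKit's Fallback code.
    """
    result = pd.Series(np.nan, index=items)
    for scores in score_series:
        # fillna with a Series fills only still-missing entries, matched by label.
        result = result.fillna(scores.reindex(items))
    return result


items = ['a', 'b', 'c']
cf_scores = pd.Series([4.2, np.nan], index=['a', 'b'])  # CF scored only 'a'
bias_scores = pd.Series([3.6, 3.1, 2.8], index=['a', 'b', 'c'])

combined = fallback_scores(items, cf_scores, bias_scores)
print(combined.tolist())  # [4.2, 3.1, 2.8]
```

Because the last series in the chain covers every item, the combined result has no NaN values, which is what lets accuracy be measured over the same set of items.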