# Prediction Accuracy Metrics¶

The lenskit.metrics.predict module contains prediction accuracy metrics. These are intended to be used as a part of a Pandas split-apply-combine operation on a data frame that contains both predictions and ratings; for convenience, the lenskit.batch.predict() function will include ratings in the prediction frame when its input user-item pairs contains ratings. So you can perform the following to compute per-user RMSE over some predictions:

from lenskit.datasets import MovieLens
from lenskit.algorithms.bias import Bias
from lenskit.batch import predict
from lenskit.metrics.predict import rmse
ratings = MovieLens('ml-small').ratings.sample(frac=10
test = ratings.iloc[:1000]
train = ratings.iloc[1000:]
algo = Bias()
algo.fit(train)
preds = predict(algo, pairs)
user_rmse = preds.groupby('user').apply(lambda df: rmse(df.prediction, df.rating))
user_rmse.mean()


## Metric Functions¶

Prediction metric functions take two series, predictions and truth.

lenskit.metrics.predict.rmse(predictions, truth, missing='error')

Compute RMSE (root mean squared error).

Parameters
• predictions (pandas.Series) – the predictions

• truth (pandas.Series) – the ground truth ratings from data

• missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the root mean squared approximation error

Return type

double

lenskit.metrics.predict.mae(predictions, truth, missing='error')

Compute MAE (mean absolute error).

Parameters
• predictions (pandas.Series) – the predictions

• truth (pandas.Series) – the ground truth ratings from data

• missing (string) – how to handle predictions without truth. Can be one of 'error' or 'ignore'.

Returns

the mean absolute approximation error

Return type

double

## Working with Missing Data¶

LensKit rating predictors do not report predictions when their core model is unable to predict. For example, a nearest-neighbor recommender will not score an item if it cannot find any suitable neighbors. Following the Pandas convention, these items are given a score of NaN (when Pandas implements better missing data handling, it will use that, so use pandas.Series.isna()/pandas.Series.notna(), not the isnan versions.

However, this causes problems when computing predictive accuracy: recommenders are not being tested on the same set of items. If a recommender only scores the easy items, for example, it could do much better than a recommender that is willing to attempt more difficult items.

A good solution to this is to use a fallback predictor so that every item has a prediction. In LensKit, lenskit.algorithms.basic.Fallback implements this functionality; it wraps a sequence of recommenders, and for each item, uses the first one that generates a score.

You set it up like this:

cf = ItemItem(20)
base = Bias(damping=5)
algo = Fallback(cf, base)