Top-N Evaluation

LensKit’s support for top-N evaluation is split into two parts, because there are some subtle complexities that make it more difficult to get the right data in the right place for computing metrics correctly.

Top-N Analysis

The lenskit.topn module contains the utilities for carrying out top-N analysis, in conjunction with lenskit.batch.recommend() and its wrapper in lenskit.batch.MultiEval.

The entry point to this is RecListAnalysis. This class encapsulates an analysis with one or more metrics, and can apply it to data frames of recommendations. An analysis requires two data frames: the recommendation frame contains the recommendations themselves, and the truth frame contains the ground truth data for the users. The analysis is flexible with regard to the columns that identify individual recommendation lists; usually these will consist of a user ID, data set identifier, and algorithm identifier(s), but the analysis is configurable and its defaults make minimal assumptions. The recommendation frame does need an item column with the recommended item IDs, and its rows should be in rank order within each recommendation list.

The truth frame should contain (a subset of) the columns identifying recommendation lists, along with item and, if available, rating. If no rating is provided, the metrics that need a rating value will assume a rating of 1 for every item present. It can also contain other columns that custom metrics may find useful.

For example, a recommendation frame may contain the columns:

DataSet
Partition
Algorithm
user
item
rank
score

And the truth frame may contain the columns:

DataSet
user
item
rating

The analysis will use this truth as the relevant item data for measuring the accuracy of the recommendation lists. Recommendations will be matched to test ratings by data set, user, and item, using RecListAnalysis defaults.

class lenskit.topn.RecListAnalysis(group_cols=None, n_jobs=None)

Bases: object

Compute one or more top-N metrics over recommendation lists.

The analysis groups the recommendations by the specified columns, and computes each metric over each group. The default set of grouping columns is all columns except the following:

item
rank
score
rating

The truth frame, truth, is expected to match over (a subset of) the grouping columns, and contain at least an item column. If it also contains a rating column, that is used as the users’ rating for metrics that require it; otherwise, a rating value of 1 is assumed.
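The default grouping rule described above can be illustrated with a minimal sketch. It works on plain lists of column names rather than data frames, and the helper names `default_group_cols` and `truth_match_cols` are hypothetical, introduced only for illustration; they are not part of LensKit.

```python
# Columns that describe one recommendation rather than the list it belongs to
# (taken from the list of excluded columns above).
NON_GROUP_COLS = {"item", "rank", "score", "rating"}


def default_group_cols(rec_columns):
    """Hypothetical helper: columns that identify a recommendation list."""
    return [c for c in rec_columns if c not in NON_GROUP_COLS]


def truth_match_cols(group_cols, truth_columns):
    """Hypothetical helper: grouping columns that the truth frame shares."""
    return [c for c in group_cols if c in truth_columns]


# Column names from the example frames earlier in this section.
rec_columns = ["DataSet", "Partition", "Algorithm", "user", "item", "rank", "score"]
truth_columns = ["DataSet", "user", "item", "rating"]

group_cols = default_group_cols(rec_columns)
match_cols = truth_match_cols(group_cols, truth_columns)

print(group_cols)  # columns that define one recommendation list
print(match_cols)  # columns used to look up that list's truth data
```

With the example frames, each list is identified by data set, partition, algorithm, and user, while its truth data is looked up by data set and user only.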

Warning

Currently, RecListAnalysis will silently drop users who received no recommendations. We are working on an ergonomic API for fixing this problem.

Parameters: group_cols (list) – The columns to group by, or None to use the default.

add_metric(metric, *, name=None, **kwargs)

Add a metric to the analysis.

A metric is a function of two arguments: a single group of the recommendation frame, and the corresponding truth frame. The truth frame will be indexed by item ID. The recommendation frame will be in the same order as in the input data. Many metrics are defined in lenskit.metrics.topn; they are re-exported from lenskit.topn for convenience.

Parameters

metric – The metric to compute.
name – The name to assign the metric. If not provided, the function name is used.
**kwargs – Additional arguments to pass to the metric.

compute(recs, truth, *, include_missing=False)

Run the analysis. Neither data frame should be meaningfully indexed.

Parameters

recs (pandas.DataFrame) – A data frame of recommendations.
truth (pandas.DataFrame) – A data frame of ground truth (test) data.
include_missing (bool) – True to include users from truth missing from recs. Matches are done via group columns that appear in both recs and truth.

Returns

The results of the analysis.

Return type

pandas.DataFrame
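To make the include_missing description concrete, the following sketch finds the list keys that appear in the truth data but have no recommendations. It uses plain lists of dictionaries as stand-ins for data frames, and the function `missing_lists` is hypothetical. How LensKit scores such lists is not stated here, so the sketch only identifies them.

```python
def missing_lists(rec_rows, truth_rows, match_cols):
    """Hypothetical helper: keys present in truth but absent from recs.

    Rows are dictionaries; match_cols are the grouping columns that appear
    in both frames.
    """
    rec_keys = {tuple(row[c] for c in match_cols) for row in rec_rows}
    truth_keys = {tuple(row[c] for c in match_cols) for row in truth_rows}
    return sorted(truth_keys - rec_keys)


rec_rows = [
    {"DataSet": "ML", "Algorithm": "A", "user": 1, "item": 10},
    {"DataSet": "ML", "Algorithm": "A", "user": 2, "item": 11},
]
truth_rows = [
    {"DataSet": "ML", "user": 1, "item": 10},
    {"DataSet": "ML", "user": 2, "item": 12},
    {"DataSet": "ML", "user": 3, "item": 13},  # user 3 received no recommendations
]

missing = missing_lists(rec_rows, truth_rows, ["DataSet", "user"])
print(missing)
```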

Metrics

The lenskit.metrics.topn module contains metrics for evaluating top-N recommendation lists.

Classification Metrics

These metrics treat the recommendation list as a classification of relevant items.

lenskit.metrics.topn.precision(recs, truth): Compute recommendation precision.

lenskit.metrics.topn.recall(recs, truth): Compute recommendation recall.
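As a rough illustration of what these two metrics measure, here is a sketch of the standard definitions using plain Python collections. It is not LensKit's implementation: `recs` is a list of item IDs and `truth` is a set of relevant item IDs rather than data frames, and returning `None` for an empty input is an assumption made here.

```python
def precision(recs, truth):
    """Fraction of recommended items that are relevant."""
    if not recs:
        return None  # assumption: undefined for an empty recommendation list
    relevant = set(truth)
    hits = sum(1 for item in recs if item in relevant)
    return hits / len(recs)


def recall(recs, truth):
    """Fraction of relevant items that were recommended."""
    relevant = set(truth)
    if not relevant:
        return None  # assumption: undefined when the user has no test items
    hits = sum(1 for item in recs if item in relevant)
    return hits / len(relevant)


recs = [10, 20, 30, 40]   # recommended items, in rank order
truth = {20, 40, 50}      # the user's relevant (test) items

print(precision(recs, truth))  # 2 hits out of 4 recommendations
print(recall(recs, truth))     # 2 hits out of 3 relevant items
```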

Ranked List Metrics

These metrics treat the recommendation list as a ranked list of items that may or may not be relevant.

lenskit.metrics.topn.recip_rank(recs, truth)

Compute the reciprocal rank of the first relevant item in a list of recommendations.

If no elements are relevant, the reciprocal rank is 0.
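The definition above can be sketched in a few lines. This is an illustration with plain Python collections (a ranked list of item IDs and a set of relevant item IDs), not LensKit's data-frame implementation.

```python
def recip_rank(recs, truth):
    """Reciprocal of the 1-based rank of the first relevant item, or 0."""
    relevant = set(truth)
    for rank, item in enumerate(recs, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0  # no recommended item is relevant


print(recip_rank([10, 20, 30], {30}))  # first relevant item is at rank 3
print(recip_rank([10, 20, 30], {99}))  # no relevant item in the list
```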

Utility Metrics

The NDCG function estimates a utility score for a ranked list of recommendations.

lenskit.metrics.topn.ndcg(recs, truth, discount=<ufunc 'log2'>)

Compute the normalized discounted cumulative gain.

Discounted cumulative gain is computed as:

\[\begin{align*} \mathrm{DCG}(L,u) & = \sum_{i=1}^{|L|} \frac{r_{ui}}{d(i)} \end{align*}\]

This is then normalized as follows:

\[\begin{align*} \mathrm{nDCG}(L, u) & = \frac{\mathrm{DCG}(L,u)}{\mathrm{DCG}(L_{\mathrm{ideal}}, u)} \end{align*}\]

Parameters

recs – The recommendation list.
truth – The user’s test data.
discount (ufunc) – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.
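The two formulas and the discount rule can be worked through with a small sketch. It uses `math.log2` and plain Python collections in place of NumPy and data frames, so it is an illustration rather than LensKit's code. It also assumes that the ideal list consists of all of the user's test items sorted by descending rating, which the formulas above do not spell out.

```python
import math


def dcg(scores, discount=math.log2):
    """Sum of scores, each divided by its rank discount when that exceeds 1."""
    total = 0.0
    for rank, score in enumerate(scores, start=1):
        # log2(1) = 0 and log2(2) = 1, so the first two ranks are not discounted.
        total += score / max(discount(rank), 1.0)
    return total


def ndcg(recs, truth, discount=math.log2):
    """DCG of the list divided by the DCG of the ideal ordering.

    recs is a ranked list of item IDs; truth maps item ID to rating.
    """
    # Assumption: the ideal list is every test item, best rating first.
    ideal = dcg(sorted(truth.values(), reverse=True), discount)
    if ideal == 0:
        return 0.0
    gains = [truth.get(item, 0.0) for item in recs]  # unrated items gain 0
    return dcg(gains, discount) / ideal


truth = {"b": 5.0, "d": 3.0, "e": 4.0}
recs = ["a", "b", "c", "d"]

print(dcg([0.0, 5.0, 0.0, 3.0]))      # 5/1 + 3/log2(4) = 6.5
print(ndcg(recs, truth))              # below 1: the list is not ideal
print(ndcg(["b", "e", "d"], truth))   # 1.0: the list matches the ideal order
```

When the truth data has no ratings, passing a rating of 1 for every test item reproduces the binary-relevance behaviour described earlier.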

We also expose the internal DCG computation directly.

lenskit.metrics.topn._dcg(scores, discount=<ufunc 'log2'>)

Compute the Discounted Cumulative Gain of a series of recommended items with rating scores. These should be relevance scores; they can be \(\{0,1\}\) for binary relevance data.

This is not a true top-N metric, but is a utility function for other metrics.

Parameters

scores (array-like) – The utility scores of a list of recommendations, in recommendation order.
discount (ufunc) – The rank discount function. Each item’s score will be divided by the discount of its rank, if the discount is greater than 1.

Returns

The DCG of the scored items.

Return type

double