Documenting Experiments

When publishing results, whether formally through a venue such as ACM RecSys or informally within your organization, it’s important to clearly and completely specify how the evaluation and algorithms were run.

Since LensKit is a toolkit of individual pieces that you combine into your experiment however best fits your work, there is no clear mechanism for automating this reporting. This document provides guidance on what you should report and how it maps to the LensKit code you are using.

Common Evaluation Problems Checklist

This checklist is to help you make sure that your evaluation and results are accurately reported; a short code sketch after the list shows how the items fit together.

  • Pass include_missing=True to compute(). This option defaults to False for compatibility reasons, but the default will change in the future.

  • Correctly fill missing values from the evaluation metric results. They are reported as NaN (Pandas NA) so you can distinguish between empty lists and lists with no relevant items, but they should be appropriately filled before computing aggregates.

  • Pass k to add_metric() with the target list length for your experiment. LensKit cannot reliably detect how long you intended to make the recommendation lists, so you need to specify the intended length to the metrics in order to correctly account for it.
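
The following minimal sketch (see Reporting Metrics below for the full version) shows all three checklist items together, assuming your recommendations are in recs, your test data in test, and the target list length in N:

from lenskit import topn
from lenskit.topn import RecListAnalysis

rla = RecListAnalysis()
# pass the target list length so the metric accounts for short lists
rla.add_metric(topn.ndcg, k=N)
# include users whose recommendation lists are missing or empty
scores = rla.compute(recs, test, include_missing=True)
# fill the resulting NaN scores before aggregating
scores['ndcg'] = scores['ndcg'].fillna(0)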

Reporting Algorithms

Note

This section is still in progress.

You need to clearly report the algorithms that you have used along with their hyperparameters.

The algorithm name should be the name of the class that you used (e.g. ItemItem). The hyperparameters are the options specified to the constructor, except for options that only affect algorithm performance but not its behavior.

For example:

Algorithm     Hyperparameters
ItemItem      \(k_\mathrm{max}=20, k_\mathrm{min}=2, s_\mathrm{min}=1.0\times 10^{-3}\)
ImplicitMF    \(k=50, \lambda_u=0.1, \lambda_i=0.1, w=40\)
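
As a rough sketch, these configurations correspond to constructor calls like the ones below; the parameter names (min_nbrs, min_sim, reg, weight) are from recent LensKit releases and may differ in your version:

from lenskit.algorithms.item_knn import ItemItem
from lenskit.algorithms.als import ImplicitMF

# item-item k-NN: at most 20 neighbors, at least 2, minimum similarity 1e-3
item_item = ItemItem(20, min_nbrs=2, min_sim=1.0e-3)
# implicit-feedback ALS: 50 features, user/item regularization 0.1, weight 40
implicit_mf = ImplicitMF(50, reg=0.1, weight=40)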

If you use a top-N implementation other than the default TopN, or reconfigure its candidate selector, also clearly document that.

Reporting Experimental Setup

The important things to document are the data splitting strategy, the target recommendation list length, and the candidate selection strategy (if something other than the default of all unrated items is used).

If you use one of LensKit’s built-in splitting strategies from crossfold without modification, report the following (a brief code sketch follows this list):

  • The splitting function used.

  • The number of partitions or test samples.

  • The number of users per sample (when using sample_users) or ratings per sample (when using sample_ratings).

  • When using a user-based strategy (either partition_users or sample_users), the test rating selection strategy (class and parameters), e.g. SampleN(5).
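
For example, this sketch (with placeholder counts and a ratings frame named ratings) draws 5 samples of 1000 test users each, holding out 5 ratings per test user; you would report sample_users, 5 samples, 1000 users per sample, and SampleN(5):

from lenskit import crossfold as xf

# 5 disjoint samples of 1000 test users, 5 held-out ratings per test user
splits = list(xf.sample_users(ratings, 5, 1000, xf.SampleN(5)))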

Any additional pre-processing (e.g. filtering ratings) should also be clearly described. If additional test pre-processing is done, such as removing ratings below a threshold, document that as well.
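
For instance, if you only treat highly-rated test items as relevant, a filter like the following (the threshold and column name are placeholders) should be reported along with its threshold:

# keep only test ratings of at least 3.5 as relevant items
test = test[test['rating'] >= 3.5]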

Since the experimental protocol is implemented directly in Python code, automated reporting is not practical.

Reporting Metrics

Reporting the metrics themselves is relatively straightforward. The lenskit.topn.RecListAnalysis.compute() method returns a data frame with a metric score for each list. Group those scores by algorithm and report the aggregates (typically the mean).

The following code will produce a table of algorithm scores for hit rate, nDCG, and MRR, assuming that your algorithm identifier is in a column named algorithm and the target list length is in N:

from lenskit import topn
from lenskit.topn import RecListAnalysis

rla = RecListAnalysis()
rla.add_metric(topn.hit, k=N)
rla.add_metric(topn.ndcg, k=N)
rla.add_metric(topn.recip_rank, k=N)
scores = rla.compute(recs, test, include_missing=True)
# empty lists will have NA scores
scores.fillna(0, inplace=True)
# group by algorithm and average each metric
algo_scores = scores.groupby('algorithm')[['hit', 'ndcg', 'recip_rank']].mean()
algo_scores = algo_scores.rename(columns={
    'hit': 'HR',
    'ndcg': 'nDCG',
    'recip_rank': 'MRR'
})

You can then use pandas.DataFrame.to_latex() to convert algo_scores to a LaTeX table to include in your paper.
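
For example (the float format here is just one reasonable choice):

print(algo_scores.to_latex(float_format='%.3f'))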

Citing LensKit

Finally, cite [LKPY] as the package used for producing and/or evaluating recommendations.

[LKPY]

Michael D. Ekstrand. 2020. LensKit for Python: Next-Generation Software for Recommender Systems Experiments. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20). DOI:10.1145/3340531.3412778. arXiv:1809.03125 [cs.IR].

@INPROCEEDINGS{lkpy,
title           = "{LensKit} for {Python}: Next-Generation Software for
                    Recommender System Experiments",
booktitle       = "Proceedings of the 29th {ACM} International Conference on
                    Information and Knowledge Management",
author          = "Ekstrand, Michael D.",
year            =  2020,
url             = "http://dx.doi.org/10.1145/3340531.3412778",
conference      = "CIKM '20",
doi             = "10.1145/3340531.3412778",
pages           = "2999--3006"
}