Classic Matrix Factorization

LKPY provides classical matrix factorization implementations.

Common Support

The mf_common module contains common support code for matrix factorization algorithms. Its MFPredictor class defines the parameters that the common matrix factorization algorithms estimate during the Algorithm.fit() process.

class lenskit.algorithms.mf_common.MFPredictor

Bases: lenskit.algorithms.Predictor

Common predictor for matrix factorization.

user_index_

Users in the model (length \(m\)).

Type

pandas.Index

item_index_

Items in the model (length \(n\)).

Type

pandas.Index

user_features_

The \(m \times k\) user-feature matrix.

Type

numpy.ndarray

item_features_

The \(n \times k\) item-feature matrix.

Type

numpy.ndarray

property n_features

The number of features.

property n_users

The number of users.

property n_items

The number of items.

lookup_user(user)

Look up the index for a user.

Parameters

user – the user ID to look up

Returns

the user index.

Return type

int

lookup_items(items)

Look up the indices for a set of items.

Parameters

items (array-like) – the item IDs to look up.

Returns

the item indices. Unknown items will have negative indices.

Return type

numpy.ndarray
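
The negative-index convention for unknown items can be sketched with plain pandas. This is a minimal illustration that assumes the lookup behaves like pandas.Index.get_indexer; the item IDs and the item_index variable are made up for the example and are not taken from a fitted model.

```python
import numpy as np
import pandas as pd

# Hypothetical item index, standing in for a fitted model's item_index_.
item_index = pd.Index([10, 20, 30, 40], name="item")

# Look up positions for a batch of item IDs; 99 is not in the index,
# so get_indexer reports it with a negative position (-1).
positions = item_index.get_indexer([30, 10, 99])

# Keep only the positions of items the index actually contains.
known = positions >= 0
known_positions = positions[known]
```

Filtering on `positions >= 0` before indexing into the feature matrices avoids silently picking up the last row, which is what a negative NumPy index would otherwise do.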

score(user, items, u_features=None)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters
  • user (int) – the user index

  • items (array-like of int) – the item indices

  • raw (bool) – if True, return raw scores without biases added back.

Returns

the scores for the items.

Return type

numpy.ndarray
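
As a rough sketch of what scoring by matrix index means, the example below computes the inner product between one user's feature row and the feature rows of the selected items. The matrices and the raw_scores helper are hypothetical stand-ins for user_features_ and item_features_, and the sketch covers only the inner-product part of a score, not any bias terms.

```python
import numpy as np

# Hypothetical stand-ins for user_features_ (m x k) and item_features_ (n x k).
user_features = np.array([[1.0, 0.5],
                          [0.0, 2.0]])
item_features = np.array([[1.0, 1.0],
                          [2.0, 0.0],
                          [0.5, 0.5]])

def raw_scores(user, items):
    """Dot each selected item's feature row with the user's feature row."""
    items = np.asarray(items)
    return item_features[items, :] @ user_features[user, :]

# Score items 0 and 2 for user 0 (all arguments are matrix indices, not IDs).
scores = raw_scores(0, [0, 2])
```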

Alternating Least Squares

LensKit provides alternating least squares implementations of matrix factorization suitable for explicit feedback data. These implementations are parallelized with Numba, and perform best with the MKL from Conda.

class lenskit.algorithms.als.BiasedMF(features, *, iterations=20, reg=0.1, damping=5, bias=True, method='cd', rng_spec=None, progress=None, save_user_features=True)

Bases: lenskit.algorithms.mf_common.MFPredictor

Biased matrix factorization trained with alternating least squares [ZWSP08]. This is a prediction-oriented algorithm suitable for explicit feedback data, using the alternating least squares approach to compute \(P\) and \(Q\) to minimize the regularized squared reconstruction error of the ratings matrix.

It provides two solvers for the optimization step (the method parameter):

'cd' (the default)

Coordinate descent [TakacsPilaszyT11], adapted for a separately-trained bias model and to use weighted regularization as in the original ALS paper [ZWSP08].

'lu'

A direct implementation of the original ALS [ZWSP08] using LU-decomposition to solve for the optimized matrices.

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.
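
To make the alternating step concrete, here is a minimal sketch of solving for a single user's feature vector while the item features are held fixed, in the spirit of the direct 'lu' solver. It is an illustration under stated assumptions, not the library's implementation: solve_user is a hypothetical helper, the ratings are assumed to already have the bias subtracted, and the regularization term is scaled by the user's rating count to mimic weighted regularization.

```python
import numpy as np

def solve_user(item_features, rated_items, ratings, reg):
    """Solve the regularized least-squares problem for one user's features."""
    Q = item_features[rated_items, :]   # features of the items this user rated
    k = Q.shape[1]
    # Weighted regularization: scale reg by the number of ratings.
    A = Q.T @ Q + reg * len(rated_items) * np.eye(k)
    b = Q.T @ ratings
    # np.linalg.solve factors A with an LU decomposition internally.
    return np.linalg.solve(A, b)

item_features = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]])
rated = np.array([0, 1])
resid = np.array([2.0, -1.0])

p_unregularized = solve_user(item_features, rated, resid, reg=0.0)
p_regularized = solve_user(item_features, rated, resid, reg=0.1)
```

A full ALS pass would repeat this for every user, then swap roles and solve for every item with the user features fixed, and iterate.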

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train

  • reg (float) – the regularization factor; can also be a tuple (ureg, ireg) to specify separate user and item regularization terms.

  • damping (float) – damping factor for the underlying bias.

  • bias (bool or Bias) – the bias model. If True, fits a Bias with damping damping.

  • method (str) – the solver to use (see above).

  • rng_spec – Random number generator or state (see lenskit.util.random.rng()).

  • progress – a tqdm.tqdm()-compatible progress bar function
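
The effect of a damping factor can be illustrated with a damped mean. This sketch assumes the common formulation in which the damping value is added to the count in the denominator; the source does not spell out the formula, and damped_mean and the sample offsets are hypothetical.

```python
import numpy as np

def damped_mean(values, damping):
    """Mean shrunk toward zero: sum / (count + damping)."""
    values = np.asarray(values, dtype=float)
    return values.sum() / (len(values) + damping)

# Offsets of one item's ratings from the global mean (made-up numbers).
offsets = [1.0, 0.5, 1.5]

plain = damped_mean(offsets, 0)    # ordinary mean
damped = damped_mean(offsets, 5)   # pulled toward zero for low-count items
```

With only three ratings, a damping of 5 shrinks the estimate substantially; for items with many ratings the effect fades.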

fit(ratings, **kwargs)

Run ALS to train a model.

Parameters

ratings – the ratings data frame.

Returns

The algorithm (for chaining).

fit_iters(ratings, **kwargs)

Run ALS to train a model, returning each iteration as a generator.

Parameters

ratings – the ratings data frame.

Returns

A generator that yields the algorithm after each training iteration.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

class lenskit.algorithms.als.ImplicitMF(features, *, iterations=20, reg=0.1, weight=40, use_ratings=False, method='cg', rng_spec=None, progress=None, save_user_features=True)

Bases: lenskit.algorithms.mf_common.MFPredictor

Implicit matrix factorization trained with alternating least squares [HKV08]. This algorithm outputs ‘predictions’, but they are not on a meaningful scale. If use_ratings is enabled and its input data contains rating values, these will be used as the ‘confidence’ values; otherwise, confidence will be 1 for every rated item.

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.

With weight \(w\), this function decomposes the matrix \(\mathbb{1}^* + Rw\), where \(\mathbb{1}^*\) is an \(m \times n\) matrix of all 1s.
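
The decomposed matrix can be written out for a tiny example. The sketch below uses a dense, made-up interaction matrix R purely for illustration; a real implementation would presumably keep the data sparse rather than materializing the all-ones matrix.

```python
import numpy as np

# Hypothetical 2-user x 3-item interaction matrix R (1 = user interacted with item).
R = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
w = 40.0  # the default weight

ones = np.ones_like(R)   # the m x n matrix of all 1s
C = ones + R * w         # entries are 1 for unobserved pairs, 1 + w for observed ones
```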

Changed in version 0.14: By default, ImplicitMF ignores a rating column if one is present in the training data. This can be changed through the use_ratings option.

Changed in version 0.13: In versions prior to 0.13, ImplicitMF used the rating column if it was present. In 0.13, we added an option to control whether or not the rating column is used; it initially defaulted to True, but with a warning. In 0.14 it defaults to False.

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train

  • reg (float) – the regularization factor

  • weight (float) – the scaling weight for positive samples (\(\alpha\) in [HKV08]).

  • use_ratings (bool) – Whether to use the rating column, if present. Defaults to False; when True, the values from the rating column are used, and multiplied by weight; if False, ImplicitMF treats every rated user-item pair as having a rating of 1.

  • method (str) –

    the training method.

    'cg' (the default)

    Conjugate gradient method [TakacsPilaszyT11].

    'lu'

    A direct implementation of the original implicit-feedback ALS concept [HKV08] using LU-decomposition to solve for the optimized matrices.

  • rng_spec – Random number generator or state (see lenskit.util.random.rng()).

  • progress – a tqdm.tqdm()-compatible progress bar function

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

SciKit SVD

This code implements a traditional SVD using scikit-learn. It requires scikit-learn to be installed in order to function.

class lenskit.algorithms.svd.BiasedSVD(features, *, damping=5, bias=True, algorithm='randomized')

Bases: lenskit.algorithms.Predictor

Biased matrix factorization for explicit feedback using SciKit-Learn’s SVD solver (sklearn.decomposition.TruncatedSVD). It operates by first computing the bias, then computing the SVD of the bias residuals.

You’ll generally want one of the iterative SVD implementations such as lenskit.algorithms.als.BiasedMF; this is here primarily as an example and for cases where you want to evaluate a pure SVD implementation.
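
The two-step idea (bias first, then an SVD of the residuals) can be sketched with NumPy alone. This is a simplified illustration, not the class's implementation: it uses a dense made-up ratings matrix, a global mean as the only bias term, and np.linalg.svd in place of TruncatedSVD.

```python
import numpy as np

# Made-up dense ratings matrix (users x items); real data would be sparse.
ratings = np.array([[5.0, 3.0, 4.0],
                    [4.0, 2.0, 3.0],
                    [1.0, 5.0, 2.0]])

# Step 1: a very simple bias (global mean only, an illustrative simplification).
mu = ratings.mean()
residuals = ratings - mu

# Step 2: rank-k SVD of the residuals (np.linalg.svd stands in for TruncatedSVD).
k = 2
U, s, Vt = np.linalg.svd(residuals, full_matrices=False)
user_features = U[:, :k] * s[:k]   # fold the singular values into the user side
item_features = Vt[:k, :].T

# Prediction = bias + inner product of the feature vectors.
predicted = mu + user_features @ item_features.T
```

Keeping all three singular values would reproduce the ratings exactly; truncating to k features leaves a reconstruction error equal to the discarded singular value.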

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

get_params(deep=True)

Get the parameters for this algorithm (as in scikit-learn). Algorithm parameters should match constructor argument names.

The default implementation returns all attributes that match a constructor parameter name. It should be compatible with the sklearn.base.BaseEstimator.get_params() method so that LensKit algorithms can be cloned with sklearn.base.clone() as well as lenskit.util.clone().

Returns

the algorithm parameters.

Return type

dict
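
The "attributes that match a constructor parameter name" behavior can be approximated with the standard inspect module. SketchAlgorithm below is a hypothetical class written for this illustration, not LensKit's base class, and its get_params ignores the deep flag.

```python
import inspect

class SketchAlgorithm:
    """Hypothetical algorithm class, used only to illustrate the idea."""

    def __init__(self, features, *, damping=5, bias=True):
        self.features = features
        self.damping = damping
        self.bias = bias
        self.fitted_ = False  # not a constructor parameter, so not reported

    def get_params(self, deep=True):
        # deep is accepted for scikit-learn compatibility but ignored here.
        sig = inspect.signature(type(self).__init__)
        names = [n for n in sig.parameters if n != "self"]
        # Collect every attribute whose name matches a constructor parameter.
        return {n: getattr(self, n) for n in names if hasattr(self, n)}

algo = SketchAlgorithm(10, damping=3)
params = algo.get_params()

# Because the parameter names match the constructor, cloning is a re-construction.
clone = SketchAlgorithm(**params)
```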

FunkSVD

FunkSVD is an SVD-like matrix factorization that uses stochastic gradient descent, configured much like coordinate descent, to train the user-feature and item-feature matrices. We generally don’t recommend using it in new applications or experiments; the ALS-based algorithms are less sensitive to hyperparameters, and the TensorFlow algorithms provide more optimized gradient descent training of the same prediction model.

class lenskit.algorithms.funksvd.FunkSVD(features, iterations=100, *, lrate=0.001, reg=0.015, damping=5, range=None, bias=True, random_state=None)

Bases: lenskit.algorithms.mf_common.MFPredictor

Algorithm class implementing FunkSVD matrix factorization. FunkSVD is a regularized biased matrix factorization technique trained with featurewise stochastic gradient descent.

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.
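
Featurewise stochastic gradient descent can be sketched as follows: each feature is trained for a fixed number of passes over the ratings before the next one starts. This is a simplified illustration under stated assumptions, not the library's code. train_featurewise is a hypothetical helper, its inputs are assumed to be bias residuals, and untrained features are left out of the error term.

```python
import numpy as np

def train_featurewise(users, items, residuals, n_users, n_items,
                      features=2, iterations=100, lrate=0.001, reg=0.015):
    """Train user/item feature matrices one feature at a time with SGD."""
    P = np.full((n_users, features), 0.1)   # user features
    Q = np.full((n_items, features), 0.1)   # item features
    for f in range(features):
        for _ in range(iterations):
            for u, i, r in zip(users, items, residuals):
                # Error uses the features trained so far, including the current one.
                err = r - P[u, :f + 1] @ Q[i, :f + 1]
                pu, qi = P[u, f], Q[i, f]
                # Regularized gradient step on the current feature only.
                P[u, f] += lrate * (err * qi - reg * pu)
                Q[i, f] += lrate * (err * pu - reg * qi)
    return P, Q

# Tiny made-up data set: every (user, item) pair of a rank-1 residual matrix.
users = np.array([0, 0, 1, 1])
items = np.array([0, 1, 0, 1])
resid = np.array([1.0, 0.5, 2.0, 1.0])

P, Q = train_featurewise(users, items, resid, 2, 2, iterations=200, lrate=0.05)
fitted = np.sum(P[users] * Q[items], axis=1)
sse = float(np.sum((resid - fitted) ** 2))
```

The learning rate in the usage lines is raised well above the default so that the toy problem converges in a few hundred passes.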

Parameters
  • features (int) – the number of features to train

  • iterations (int) – the number of iterations to train each feature

  • lrate (double) – the learning rate

  • reg (double) – the regularization factor

  • damping (double) – damping factor for the underlying mean

  • bias (Predictor) – the underlying bias model to fit. If True, then a bias.Bias model is fit with damping.

  • range (tuple) – the (min, max) rating values to clamp ratings, or None to leave predictions unclamped.

  • random_state – The random state for shuffling the data prior to training.
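
Clamping to a (min, max) range amounts to clipping values at both ends. The sketch below uses np.clip with a made-up range for a five-star scale; it illustrates the operation only and makes no claim about where in training or prediction the library applies it.

```python
import numpy as np

rating_range = (0.5, 5.0)          # hypothetical (min, max) for a 5-star scale
raw = np.array([-0.3, 2.7, 5.8])   # made-up unclamped values

# Values below the minimum or above the maximum are pulled to the boundary.
clamped = np.clip(raw, *rating_range)
```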

fit(ratings, **kwargs)

Train a FunkSVD model.

Parameters

ratings – the ratings data frame.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series