Classic Matrix Factorization

LKPY provides classical matrix factorization implementations.

Common Support

The mf_common module contains common support code for matrix factorization algorithms. Its MFPredictor class defines the parameters that Algorithm.fit() estimates in the common matrix factorization algorithms.

class lenskit.algorithms.mf_common.MFPredictor

Bases: Predictor

Common predictor for matrix factorization.

user_index_

Users in the model (length $$m$$).

Type:

pandas.Index

item_index_

Items in the model (length $$n$$).

Type:

pandas.Index

user_features_

The $$m \times k$$ user-feature matrix.

Type:

numpy.ndarray

item_features_

The $$n \times k$$ item-feature matrix.

Type:

numpy.ndarray

property n_features

The number of features.

property n_users

The number of users.

property n_items

The number of items.

lookup_user(user)

Look up the index for a user.

Parameters:

user – the user ID to look up

Returns:

the user index.

Return type:

int

lookup_items(items)

Look up the indices for a set of items.

Parameters:

items (array-like) – the item IDs to look up.

Returns:

the item indices. Unknown items will have negative indices.

Return type:

numpy.ndarray
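The ID-to-index lookup described above can be sketched with a plain pandas.Index. This is a minimal illustration, not the library's implementation: the item IDs and the standalone lookup_items function are hypothetical stand-ins for a fitted model's item_index_ and method. It relies on pandas.Index.get_indexer, which returns -1 for values that are not in the index.

```python
import numpy as np
import pandas as pd

# Hypothetical item IDs standing in for a fitted model's item_index_.
item_index = pd.Index([10, 20, 30, 40], name="item")

def lookup_items(items):
    """Map item IDs to row positions; IDs missing from the index map to -1."""
    return item_index.get_indexer(np.asarray(items))

positions = lookup_items([30, 10, 99])
known = positions >= 0  # mask selecting the items the model knows about
```

A caller would typically use a mask like known to score only the items with non-negative indices.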

score(user, items, u_features=None)

Score a set of items for a user. User and item parameters must be indices into the matrices.

Parameters:
• user (int) – the user index

• items (array-like of int) – the item indices

• raw (bool) – if True, return raw scores without biases added back.

Returns:

the scores for the items.

Return type:

numpy.ndarray
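As a rough sketch of what scoring by index amounts to, the example below takes the dot product of one user's feature vector with the feature vectors of the requested items. The random matrices stand in for user_features_ and item_features_, biases are left out, and u_features is assumed here to be an optional replacement for the stored user vector; the source does not document that parameter, so treat this reading as an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, k = 4, 6, 3                          # users, items, latent features
user_features = rng.normal(size=(m, k))    # stands in for user_features_
item_features = rng.normal(size=(n, k))    # stands in for item_features_

def score(user, items, u_features=None):
    """Dot the user's feature vector with each requested item's vector."""
    # Assumption: u_features, when given, replaces the stored user vector.
    uv = user_features[user] if u_features is None else u_features
    return item_features[np.asarray(items)] @ uv

scores = score(1, [0, 2, 5])               # one score per requested item index
```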

Alternating Least Squares

LensKit provides alternating least squares implementations of matrix factorization suitable for explicit feedback data. These implementations are parallelized with Numba, and perform best with the MKL from Conda.

class lenskit.algorithms.als.BiasedMF(features, *, iterations=20, reg=0.1, damping=5, bias=True, method='cd', rng_spec=None, progress=None, save_user_features=True)

Bases: MFPredictor

Biased matrix factorization trained with alternating least squares [ZWSP08]. This is a prediction-oriented algorithm suitable for explicit feedback data, using the alternating least squares approach to compute $$P$$ and $$Q$$ to minimize the regularized squared reconstruction error of the ratings matrix.

It provides two solvers for the optimization step (the method parameter):

'cd' (the default)

Coordinate descent, adapted for a separately-trained bias model and to use weighted regularization as in the original ALS paper [ZWSP08].

'lu'

A direct implementation of the original ALS [ZWSP08] using LU-decomposition to solve for the optimized matrices.
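To make the direct solve concrete, here is a minimal sketch of one user's update with the item matrix held fixed, in the spirit of the 'lu' solver. It is an illustration under stated assumptions rather than LensKit's code: als_user_step and its arguments are hypothetical names, the ratings are assumed to already have the bias subtracted, and the regularization term is scaled by the number of ratings the user has, following the weighted regularization of [ZWSP08].

```python
import numpy as np

def als_user_step(Q, item_idx, ratings, reg):
    """Solve one user's regularized least-squares problem with items fixed."""
    Qu = Q[item_idx]                        # features of the items this user rated
    n_u = len(item_idx)                     # number of ratings by this user
    A = Qu.T @ Qu + reg * n_u * np.eye(Q.shape[1])  # weighted regularization
    b = Qu.T @ ratings
    return np.linalg.solve(A, b)            # LAPACK solve based on LU factorization

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 2))                 # toy item-feature matrix
item_idx = np.array([0, 2, 3])
r = np.array([0.5, -1.0, 0.25])             # bias-adjusted ratings (toy values)
p_u = als_user_step(Q, item_idx, r, reg=0.1)
```

A full ALS iteration would repeat this solve for every user, then swap roles and solve for every item with the user matrix fixed.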

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.

Parameters:
• features (int) – the number of features to train

• iterations (int) – the number of iterations to train

• reg (float) – the regularization factor; can also be a tuple (ureg, ireg) to specify separate user and item regularization terms.

• damping (float) – damping factor for the underlying bias.

• bias (bool or Bias) – the bias model. If True, fits a Bias with damping damping.

• method (str) – the solver to use (see above).

• rng_spec – Random number generator or state (see lenskit.util.random.rng()).

• progress – a tqdm.tqdm()-compatible progress bar function

fit(ratings, **kwargs)

Run ALS to train a model.

Parameters:

ratings – the ratings data frame.

Returns:

The algorithm (for chaining).

fit_iters(ratings, **kwargs)

Run ALS to train a model, returning each iteration as a generator.

Parameters:

ratings – the ratings data frame.

Returns:

The algorithm (for chaining).

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
• user – the user ID

• items (array-like) – the items to predict

• ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

class lenskit.algorithms.als.ImplicitMF(features, *, iterations=20, reg=0.1, weight=40, use_ratings=False, method='cg', rng_spec=None, progress=None, save_user_features=True)

Bases: MFPredictor

Implicit matrix factorization trained with alternating least squares [HKV08]. This algorithm outputs ‘predictions’, but they are not on a meaningful scale. If its input data contains rating values, these will be used as the ‘confidence’ values; otherwise, confidence will be 1 for every rated item.

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.

With weight $$w$$, this function decomposes the matrix $$\mathbb{1}^* + Rw$$, where $$\mathbb{1}^*$$ is an $$m \times n$$ matrix of all 1s.
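The matrix being decomposed can be written out directly for a toy case. The sketch below assumes use_ratings=False, so $$R$$ holds a 1 for each observed user-item pair, and uses a dense array purely for readability; real interaction data would normally be sparse.

```python
import numpy as np

w = 40.0                                   # the weight parameter
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])            # 1 where a user interacted with an item
C = np.ones_like(R) + w * R                # the matrix 1* + Rw
```

Unobserved pairs keep a value of 1, while observed pairs rise to $$1 + w$$.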

Changed in version 0.14: By default, ImplicitMF ignores a rating column if one is present in the training data. This can be changed through the use_ratings option.

Changed in version 0.13: In versions prior to 0.13, ImplicitMF used the rating column if it was present. In 0.13, we added an option to control whether or not the rating column is used; it initially defaulted to True, but with a warning. In 0.14 it defaults to False.

Parameters:
• features (int) – the number of features to train

• iterations (int) – the number of iterations to train

• reg (float) – the regularization factor

• weight (float) – the scaling weight for positive samples ($$\alpha$$ in [HKV08]).

• use_ratings (bool) – Whether to use the rating column, if present. Defaults to False. When True, the values from the rating column are used and multiplied by weight; when False, ImplicitMF treats every rated user-item pair as having a rating of 1.

• method (str) –

the training method.

'cg' (the default)

Conjugate gradient method.

'lu'

A direct implementation of the original implicit-feedback ALS concept [HKV08] using LU-decomposition to solve for the optimized matrices.

• rng_spec – Random number generator or state (see lenskit.util.random.rng()).

• progress – a tqdm.tqdm()-compatible progress bar function

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters:
• ratings (pandas.DataFrame) – The ratings data.

• kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns:

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
• user – the user ID

• items (array-like) – the items to predict

• ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

SciKit SVD

This code implements a traditional SVD using scikit-learn. It requires scikit-learn to be installed in order to function.

class lenskit.algorithms.svd.BiasedSVD(features, *, damping=5, bias=True, algorithm='randomized')

Bases: Predictor

Biased matrix factorization for explicit feedback using scikit-learn’s SVD solver (sklearn.decomposition.TruncatedSVD). It operates by first computing the bias, then computing the SVD of the bias residuals.

You’ll generally want one of the iterative SVD implementations such as lenskit.algorithms.als.BiasedMF; this is here primarily as an example and for cases where you want to evaluate a pure SVD implementation.
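The bias-then-SVD procedure can be sketched on a small dense matrix. This is a simplified illustration, not the class's implementation: it uses only the global mean as the bias and numpy.linalg.svd on a fully observed toy matrix, whereas BiasedSVD is described as using a bias model and sklearn.decomposition.TruncatedSVD.

```python
import numpy as np

R = np.array([[5.0, 3.0, 4.0],
              [4.0, 2.0, 3.0],
              [1.0, 5.0, 2.0]])            # dense toy ratings, every cell observed
k = 2                                      # number of features to keep

mu = R.mean()                              # simplest possible bias: the global mean
resid = R - mu                             # bias residuals
U, s, Vt = np.linalg.svd(resid, full_matrices=False)
P = U[:, :k] * s[:k]                       # user-feature matrix (m x k)
Q = Vt[:k].T                               # item-feature matrix (n x k)
pred = mu + P @ Q.T                        # add the bias back to get predictions
```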

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters:
• ratings (pandas.DataFrame) – The ratings data.

• kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns:

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
• user – the user ID

• items (array-like) – the items to predict

• ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series

get_params(deep=True)

Get the parameters for this algorithm (as in scikit-learn). Algorithm parameters should match constructor argument names.

The default implementation returns all attributes that match a constructor parameter name. It should be compatible with the sklearn.base.BaseEstimator.get_params() method so that LensKit algorithms can be cloned with sklearn.base.clone() as well as lenskit.util.clone().

Returns:

the algorithm parameters.

Return type:

dict
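The convention of returning attributes that match constructor parameter names can be sketched with the standard inspect module. ToyAlgorithm is a hypothetical class invented for this illustration; it is not part of LensKit, and the real default implementation may differ in detail.

```python
import inspect

class ToyAlgorithm:
    """Hypothetical algorithm class, used only to illustrate the convention."""

    def __init__(self, features, *, damping=5, bias=True):
        self.features = features
        self.damping = damping
        self.bias = bias
        self.fitted_ = False      # not a constructor argument, so not a parameter

    def get_params(self, deep=True):
        # Collect attributes whose names match constructor parameter names.
        # `deep` is accepted for scikit-learn compatibility but unused here.
        sig = inspect.signature(type(self).__init__)
        names = [name for name in sig.parameters if name != "self"]
        return {name: getattr(self, name) for name in names if hasattr(self, name)}

params = ToyAlgorithm(20, damping=10).get_params()
clone = ToyAlgorithm(**params)    # the parameters are enough to rebuild the object
```

Because the returned keys match the constructor's argument names, the dictionary can be passed straight back to the constructor, which is what makes cloning possible.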

FunkSVD

FunkSVD is an SVD-like matrix factorization that uses stochastic gradient descent, configured much like coordinate descent, to train the user-feature and item-feature matrices. We generally don’t recommend using it in new applications or experiments; the ALS-based algorithms are less sensitive to hyperparameters, and the TensorFlow algorithms provide more optimized gradient descent training of the same prediction model.

class lenskit.algorithms.funksvd.FunkSVD(features, iterations=100, *, lrate=0.001, reg=0.015, damping=5, range=None, bias=True, random_state=None)

Bases: MFPredictor

Algorithm class implementing FunkSVD matrix factorization. FunkSVD is a regularized biased matrix factorization technique trained with featurewise stochastic gradient descent.

See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.

Parameters:
• features (int) – the number of features to train

• iterations (int) – the number of iterations to train each feature

• lrate (double) – the learning rate

• reg (double) – the regularization factor

• damping (double) – damping factor for the underlying mean

• bias (Predictor) – the underlying bias model to fit. If True, then a bias.Bias model is fit with damping.

• range (tuple) – the (min, max) rating values to which predictions are clamped, or None to leave predictions unclamped.

• random_state – The random state for shuffling the data prior to training.
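The following is a minimal sketch of featurewise stochastic gradient descent, assuming the ratings have already had the bias subtracted. It is not LensKit's implementation: train_funksvd, the init value of 0.1, and the toy data are hypothetical, and shuffling and range clamping are left out. Each feature is trained for the given number of iterations before the next one starts.

```python
import numpy as np

def train_funksvd(users, items, resid, m, n, features=2, iterations=100,
                  lrate=0.001, reg=0.015, init=0.1):
    """Featurewise SGD: fit one latent feature at a time on bias residuals."""
    P = np.full((m, features), init)       # user-feature matrix
    Q = np.full((n, features), init)       # item-feature matrix
    est = np.zeros(len(resid))             # contribution of finished features
    for f in range(features):
        for _ in range(iterations):
            for j in range(len(resid)):
                u, i = users[j], items[j]
                err = resid[j] - (est[j] + P[u, f] * Q[i, f])
                pu, qi = P[u, f], Q[i, f]  # keep old values for a paired update
                P[u, f] += lrate * (err * qi - reg * pu)
                Q[i, f] += lrate * (err * pu - reg * qi)
        est += P[users, f] * Q[items, f]   # freeze this feature's contribution
    return P, Q

users = np.array([0, 0, 1, 1])
items = np.array([0, 1, 0, 1])
resid = np.array([1.0, 0.5, 0.8, 0.4])     # ratings minus bias (toy values)
P, Q = train_funksvd(users, items, resid, m=2, n=2, lrate=0.05, iterations=200)
pred = np.sum(P[users] * Q[items], axis=1) # predicted residuals
```

The toy run uses a larger learning rate than the documented default so that the small example converges in few iterations.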

fit(ratings, **kwargs)

Train a FunkSVD model.

Parameters:

ratings – the ratings data frame.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters:
• user – the user ID

• items (array-like) – the items to predict

• ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns:

scores for the items, indexed by item id.

Return type:

pandas.Series