Classic Matrix Factorization
LKPY provides classical matrix factorization implementations.
Common Support
The mf_common module contains common support code for matrix factorization algorithms. Its MFPredictor class defines the parameters that are estimated during the Algorithm.fit() process on common matrix factorization algorithms.
- class lenskit.algorithms.mf_common.MFPredictor
Bases:
Predictor
Common predictor for matrix factorization.
- user_index_
Users in the model (length=\(m\)).
- Type:
- item_index_
Items in the model (length=\(n\)).
- Type:
- user_features_
The \(m \times k\) user-feature matrix.
- Type:
- item_features_
The \(n \times k\) item-feature matrix.
- Type:
- property n_features
The number of features.
- property n_users
The number of users.
- property n_items
The number of items.
- lookup_user(user)
Look up the index for a user.
- Parameters:
user – the user ID to look up
- Returns:
the user index.
- Return type:
- lookup_items(items)
Look up the indices for a set of items.
- Parameters:
items (array-like) – the item IDs to look up.
- Returns:
the item indices. Unknown items will have negative indices.
- Return type:
- score(user, items, u_features=None)
Score a set of items for a user. User and item parameters must be indices into the matrices.
- Parameters:
- Returns:
the scores for the items.
- Return type:
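The lookup and scoring behavior described above can be sketched in a few lines of NumPy. This is an illustration only, not LensKit's implementation: the toy IDs, the matrices, and the dictionary-based index are all assumptions made for the example. It shows unknown items mapping to a negative index and scores computed as dot products between the user's feature row and each item's feature row.

```python
import numpy as np

# Hypothetical toy model: 2 users, 3 items, k = 2 features.
item_ids = ["a", "b", "c"]
item_pos = {iid: i for i, iid in enumerate(item_ids)}
user_features = np.array([[1.0, 0.5], [0.2, 2.0]])              # m x k
item_features = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # n x k


def lookup_items(items):
    """Map item IDs to row indices; unknown IDs get -1."""
    return np.array([item_pos.get(i, -1) for i in items], dtype=np.int64)


def score(user_idx, item_idxs):
    """Dot each requested item's feature row with the user's feature row."""
    return item_features[item_idxs] @ user_features[user_idx]


idxs = lookup_items(["c", "zzz", "a"])
known = idxs >= 0                       # mask out items the model has never seen
scores = np.full(len(idxs), np.nan)     # unknown items keep a NaN score
scores[known] = score(0, idxs[known])
```

Filtering on the sign of the index before scoring matters because a negative index would otherwise silently select a row from the end of the matrix.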
Alternating Least Squares
LensKit provides alternating least squares implementations of matrix factorization suitable for explicit feedback data. These implementations are parallelized with Numba, and perform best with the MKL from Conda.
- class lenskit.algorithms.als.BiasedMF(features, *, epochs=10, reg=0.1, damping=5, bias=True, rng_spec=None, save_user_features=True)
Bases:
ALSBase
Biased matrix factorization trained with alternating least squares []. This is a prediction-oriented algorithm suitable for explicit feedback data, using the alternating least squares approach to compute \(P\) and \(Q\) to minimize the regularized squared reconstruction error of the ratings matrix.
It provides two solvers for the optimization step (the method parameter):
'cd' (the default)
Coordinate descent [], adapted for a separately-trained bias model and to use weighted regularization as in the original ALS paper [].
'cholesky'
The original ALS [], using Cholesky decomposition to solve for the optimized matrices.
'lu'
Deprecated alias for 'cholesky'.
See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.
- Parameters:
features (int) – the number of features to train
epochs (int) – the number of iterations to train
reg (float | tuple[float, float]) – the regularization factor; can also be a tuple (ureg, ireg) to specify separate user and item regularization terms.
damping (float) – damping factor for the underlying bias.
bias (Bias | None) – the bias model. If True, fits a Bias with damping damping.
rng_spec (Optional[SeedLike]) – Random number generator or state (see seedbank.numpy_rng()).
progress – a tqdm.tqdm()-compatible progress bar function
save_user_features (bool)
- property logger
Overridden in implementation to provide the logger.
- prepare_data(ratings)
Prepare data for training this model. This takes in the ratings, and is supposed to do three things:
Normalize or transform the rating/interaction data, as needed, for training.
Store any parameters learned from the normalization (e.g. means) in the appropriate member variables.
Return the training data object to use for model training.
- Parameters:
ratings (DataFrame)
- initial_params(nrows, ncols)
Compute initial parameter values of the specified shape.
- als_half_epoch(epoch, context)
Run one half of an ALS training epoch.
- Parameters:
epoch (int)
context (TrainContext)
- new_user_embedding(user, ratings)
Generate an embedding for a user given their current ratings.
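To make the 'cholesky' solver concrete, here is a minimal NumPy sketch of one ALS half-epoch on a tiny dense matrix. It is not LensKit's code: the function body, the dense NaN-for-missing layout, and the toy residual values are assumptions for illustration. Each row's features are found by solving regularized normal equations, with the regularization term weighted by the number of ratings in that row.

```python
import numpy as np


def als_half_epoch(ratings, other, reg):
    """Solve for one side's features while the other side is held fixed.

    ratings: dense (rows x cols) array with np.nan for missing entries.
    other:   (cols x k) feature matrix of the fixed side.
    """
    nrows, k = ratings.shape[0], other.shape[1]
    out = np.zeros((nrows, k))
    for r in range(nrows):
        seen = ~np.isnan(ratings[r])
        if not seen.any():
            continue  # no data for this row; leave its features at zero
        M = other[seen]  # feature rows of the rated columns
        # Normal equations, regularization weighted by the rating count.
        A = M.T @ M + reg * seen.sum() * np.eye(k)
        b = M.T @ ratings[r, seen]
        L = np.linalg.cholesky(A)          # A = L L^T
        y = np.linalg.solve(L, b)          # solve L y = b
        out[r] = np.linalg.solve(L.T, y)   # solve L^T x = y
    return out


rng = np.random.default_rng(42)
# Made-up bias-subtracted residuals for 3 users and 3 items.
R = np.array([[1.0, np.nan, -0.5],
              [np.nan, 0.5, 1.0],
              [-1.0, 0.0, np.nan]])
Q = rng.normal(scale=0.1, size=(3, 2))
for epoch in range(10):
    P = als_half_epoch(R, Q, reg=0.1)     # update users with items fixed
    Q = als_half_epoch(R.T, P, reg=0.1)   # update items with users fixed
```

The sketch uses the general-purpose `np.linalg.solve` on the triangular factors for brevity; a dedicated triangular solver would be the more efficient choice.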
- class lenskit.algorithms.als.ImplicitMF(features, *, epochs=20, reg=0.1, weight=40, use_ratings=False, rng_spec=None, save_user_features=True)
Bases:
ALSBase
Implicit matrix factorization trained with alternating least squares []. This algorithm outputs ‘predictions’, but they are not on a meaningful scale. If use_ratings is True and the input data contains rating values, these are used as the ‘confidence’ values; otherwise, confidence is 1 for every rated item.
See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.
With weight \(w\), this function decomposes the matrix \(\mathbb{1}^* + Rw\), where \(\mathbb{1}^*\) is an \(m \times n\) matrix of all 1s.
Changed in version 2024.1: ImplicitMF no longer supports multiple training methods. It always uses Cholesky decomposition now.
Changed in version 0.14: By default, ImplicitMF ignores a rating column if one is present in the training data. This can be changed through the use_ratings option.
Changed in version 0.13: In versions prior to 0.13, ImplicitMF used the rating column if it was present. In 0.13, we added an option to control whether or not the rating column is used; it initially defaulted to True, but with a warning. In 0.14 it defaults to False.
- Parameters:
features (int) – The number of features to train
epochs (int) – The number of iterations to train
reg (float | tuple[float, float]) – The regularization factor
weight (float) – The scaling weight for positive samples (\(\alpha\) in []).
use_ratings (bool) – Whether to use the rating column, if present. Defaults to False; when True, the values from the rating column are used, and multiplied by weight; if False, ImplicitMF treats every rated user-item pair as having a rating of 1.
rng_spec (Optional[SeedLike]) – Random number generator or state (see lenskit.util.random.rng()).
progress – a tqdm.tqdm()-compatible progress bar function
save_user_features (bool)
- property logger
Overridden in implementation to provide the logger.
- fit(ratings, **kwargs)
Run ALS to train a model.
- Parameters:
ratings – the ratings data frame.
- Returns:
The algorithm (for chaining).
- prepare_data(ratings)
Prepare data for training this model. This takes in the ratings, and is supposed to do three things:
Normalize or transform the rating/interaction data, as needed, for training.
Store any parameters learned from the normalization (e.g. means) in the appropriate member variables.
Return the training data object to use for model training.
- Parameters:
ratings (DataFrame)
- Return type:
TrainingData
- initial_params(nrows, ncols)
Compute initial parameter values of the specified shape.
- als_half_epoch(epoch, context)
Run one half of an ALS training epoch.
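The matrix \(\mathbb{1}^* + Rw\) and the effect of use_ratings can be illustrated with a small dense example. This is a sketch under assumptions: the helper name `confidence_matrix`, the triple-based input, and the dense layout are hypothetical, and a real implementation would presumably use sparse storage.

```python
import numpy as np


def confidence_matrix(ratings, n_users, n_items, weight=40.0, use_ratings=False):
    """Build the dense m x n matrix 1* + R w from (user, item, rating) triples."""
    R = np.zeros((n_users, n_items))
    for u, i, r in ratings:
        # Without use_ratings, every observed pair counts as a rating of 1.
        R[u, i] = r if use_ratings else 1.0
    return np.ones((n_users, n_items)) + R * weight


# Made-up observations: (user index, item index, rating).
obs = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
C_binary = confidence_matrix(obs, 2, 3)                    # use_ratings=False
C_rated = confidence_matrix(obs, 2, 3, use_ratings=True)   # ratings scaled by weight
```

With the default weight of 40, observed pairs get a value of 41 in the binary case and unobserved pairs stay at 1, which is what lets the factorization treat missing entries as weak negatives.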
SciKit SVD
This code implements a traditional SVD using scikit-learn. It requires scikit-learn to be installed in order to function.
- class lenskit.algorithms.svd.BiasedSVD(features, *, damping=5, bias=True, algorithm='randomized')
Bases:
Predictor
Biased matrix factorization for implicit feedback using SciKit-Learn’s SVD solver (sklearn.decomposition.TruncatedSVD). It operates by first computing the bias, then computing the SVD of the bias residuals.
You’ll generally want one of the iterative SVD implementations such as lenskit.algorithms.als.BiasedMF; this is here primarily as an example and for cases where you want to evaluate a pure SVD implementation.
- fit(ratings, **kwargs)
Train a model using the specified ratings (or similar) data.
- Parameters:
ratings (pandas.DataFrame) – The ratings data.
kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.
- Returns:
The algorithm object.
- predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters:
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns:
scores for the items, indexed by item id.
- Return type:
- get_params(deep=True)
Get the parameters for this algorithm (as in scikit-learn). Algorithm parameters should match constructor argument names.
The default implementation returns all attributes that match a constructor parameter name. It should be compatible with the sklearn.base.BaseEstimator.get_params() method so that LensKit algorithms can be cloned with sklearn.base.clone() as well as lenskit.util.clone().
- Returns:
the algorithm parameters.
- Return type:
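The "bias first, then SVD of the residuals" procedure can be sketched as follows. Two things here are assumptions rather than documented behavior: the damped-mean bias formula (sum of deviations divided by count plus damping) and the use of `np.linalg.svd` as a stand-in for sklearn.decomposition.TruncatedSVD, chosen so the example needs only NumPy. The function name and the toy ratings are hypothetical.

```python
import numpy as np


def fit_biased_svd(R, features, damping=5.0):
    """R: dense user x item array with np.nan for missing ratings."""
    seen = ~np.isnan(R)
    mu = R[seen].mean()                                   # global mean
    # Damped item offsets, then damped user offsets on what remains.
    dev = np.where(seen, R - mu, 0.0)
    b_i = dev.sum(axis=0) / (seen.sum(axis=0) + damping)
    dev = np.where(seen, R - mu - b_i, 0.0)
    b_u = dev.sum(axis=1) / (seen.sum(axis=1) + damping)
    bias = mu + b_u[:, None] + b_i[None, :]
    # Residuals after removing the bias; missing cells are treated as 0.
    resid = np.where(seen, R - bias, 0.0)
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    P = U[:, :features] * s[:features]   # user-feature matrix
    Q = Vt[:features].T                  # item-feature matrix
    return bias, P, Q


# Made-up ratings for 3 users and 3 items.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])
bias, P, Q = fit_biased_svd(R, features=2)
pred = bias + P @ Q.T   # predicted score for every user-item pair
```

Keeping only the leading `features` singular vectors is what makes the model generalize; retaining all of them would simply reproduce the observed residuals.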
FunkSVD
FunkSVD is an SVD-like matrix factorization that uses stochastic gradient descent, configured much like coordinate descent, to train the user-feature and item-feature matrices. We generally don’t recommend using it in new applications or experiments; the ALS-based algorithms are less sensitive to hyperparameters, and the TensorFlow algorithms provide more optimized gradient descent training of the same prediction model.
- class lenskit.algorithms.funksvd.FunkSVD(features, iterations=100, *, lrate=0.001, reg=0.015, damping=5, range=None, bias=True, random_state=None)
Bases:
MFPredictor
Algorithm class implementing FunkSVD matrix factorization. FunkSVD is a regularized biased matrix factorization technique trained with featurewise stochastic gradient descent.
See the base class MFPredictor for documentation on the estimated parameters you can extract from a trained model.
- Parameters:
features (int) – the number of features to train
iterations (int) – the number of iterations to train each feature
lrate (double) – the learning rate
reg (double) – the regularization factor
damping (double) – damping factor for the underlying mean
bias (Predictor) – the underlying bias model to fit. If True, then a bias.Bias model is fit with damping.
range (tuple) – the (min, max) rating values to clamp ratings, or None to leave predictions unclamped.
random_state – The random state for shuffling the data prior to training.
- fit(ratings, **kwargs)
Train a FunkSVD model.
- Parameters:
ratings – the ratings data frame.
- predict_for_user(user, items, ratings=None)
Compute predictions for a user and items.
- Parameters:
user – the user ID
items (array-like) – the items to predict
ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.
- Returns:
scores for the items, indexed by item id.
- Return type:
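Featurewise stochastic gradient descent and prediction clamping can be sketched as below. This is a simplified illustration, not the LensKit implementation: the function names, the constant 0.1 initialization, the `rating_range` argument (standing in for the range parameter), and the toy residuals are assumptions, and the data shuffling controlled by random_state is omitted. The learning rate in the usage lines is raised above the documented default so the tiny example converges quickly.

```python
import numpy as np


def train_funksvd(users, items, resid, n_users, n_items, features,
                  iterations=100, lrate=0.001, reg=0.015):
    """Featurewise SGD: train feature 0 for all iterations, then feature 1, ..."""
    P = np.full((n_users, features), 0.1)
    Q = np.full((n_items, features), 0.1)
    for f in range(features):
        for _ in range(iterations):
            for u, i, r in zip(users, items, resid):
                err = r - P[u] @ Q[i]          # error of the current estimate
                pu, qi = P[u, f], Q[i, f]      # use the pre-update values for both steps
                P[u, f] += lrate * (err * qi - reg * pu)
                Q[i, f] += lrate * (err * pu - reg * qi)
    return P, Q


def predict(bias, P, Q, u, i, rating_range=None):
    """Bias plus feature dot product, optionally clamped to (min, max)."""
    score = float(bias + P[u] @ Q[i])
    if rating_range is not None:
        score = float(np.clip(score, *rating_range))
    return score


# Made-up bias-subtracted residuals for 2 users and 2 items.
users = [0, 0, 1, 1]
items = [0, 1, 0, 1]
resid = [1.0, 0.5, 0.8, 0.4]
P, Q = train_funksvd(users, items, resid, 2, 2, features=2,
                     iterations=200, lrate=0.05)
clamped = predict(3.5, P, Q, 0, 0, rating_range=(1.0, 5.0))
```

Training one feature at a time means each later feature fits what the earlier ones left unexplained, which is the main structural difference from updating all features jointly.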