TensorFlow for LensKit

This package provides algorithm implementations, particularly matrix factorization, using TensorFlow. These algorithms serve two purposes:

  • Provide classic algorithms ready to use for recommendation or as baselines for new techniques.

  • Demonstrate how to connect TensorFlow to LensKit for use in your own experiments.

To install:

pip install lenskit-tf

Or (preferred, once published):

conda install -c conda-forge lenskit-tf

Warning

These implementations are not yet battle-tested; at this time they are provided primarily for demonstration purposes.

Biased MF

These models implement the standard biased matrix factorization model, like lenskit.algorithms.als.BiasedMF, but learn the model parameters using TensorFlow’s gradient descent instead of the alternating least squares algorithm.

Bias-Based

class lenskit_tf.BiasedMF(features=50, *, bias=True, damping=5, epochs=5, batch_size=10000, reg=0.02, rng_spec=None)

Bases: lenskit.algorithms.mf_common.MFPredictor

Biased matrix factorization model for explicit feedback, optimized with TensorFlow.

This is a basic TensorFlow implementation of the biased matrix factorization model for rating prediction:

\[s(i|u) = b + b_u + b_i + \vec{p}_u \cdot \vec{q}_i\]
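As a minimal sketch of this scoring rule, the prediction for one user over several items is the sum of the three bias terms and a dot product. The function name and the array values below are made up for illustration; they are not part of the package's API.

```python
import numpy as np

def score(b, b_u, b_i, p_u, Q_i):
    """s(i|u) = b + b_u + b_i + p_u . q_i for one user and several items.

    b:   global bias (scalar)
    b_u: this user's bias (scalar)
    b_i: item biases, shape (n_items,)
    p_u: this user's embedding, shape (k,)
    Q_i: item embeddings, shape (n_items, k)
    """
    return b + b_u + b_i + Q_i @ p_u

# Hypothetical values for one user and two items with k = 2 features.
b = 3.5
b_u = 0.2
b_i = np.array([-0.1, 0.4])
p_u = np.array([1.0, 0.5])
Q_i = np.array([[0.2, 0.0],
                [0.0, -0.4]])

scores = score(b, b_u, b_i, p_u, Q_i)   # one score per item
```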

User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:

\[\begin{split}\lambda_u = \lambda / |U| \\ \lambda_i = \lambda / |I| \\\end{split}\]

This rescaling allows the regularization term to be independent of the number of users and items.
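The effect of the rescaling can be checked numerically. The sketch below assumes the penalty is the regularization weight times the sum of squared embedding entries (the form a Keras L2 regularizer uses); the helper name and the random data are hypothetical. Doubling the number of users while keeping their typical magnitude leaves the penalty unchanged.

```python
import numpy as np

def l2_penalty(emb, reg):
    """L2 penalty on an embedding matrix with the rescaled weight reg / n_rows."""
    n_rows = emb.shape[0]
    return (reg / n_rows) * np.sum(emb ** 2)

rng = np.random.default_rng(42)
P = rng.normal(scale=0.1, size=(100, 8))   # 100 hypothetical users, 8 features

small = l2_penalty(P, reg=0.02)
# Duplicate every user: twice as many rows with the same typical magnitude.
large = l2_penalty(np.tile(P, (2, 1)), reg=0.02)
```

Without the division by the number of rows, the second penalty would be twice the first, so the same value of reg would regularize a larger data set more strongly.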

Because the model is very simple, this algorithm works best with large batch sizes.

This implementation uses lenskit.algorithms.bias.Bias for computing the biases, and uses TensorFlow to fit a matrix factorization on the residuals. It then extracts the resulting matrices, and relies on MFPredictor to implement the prediction logic, like lenskit.algorithms.als.BiasedMF. Its code is suitable as an example of how to build a Keras/TensorFlow algorithm implementation for LensKit where TF is only used in the train stage.
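The residual step can be sketched in plain NumPy. This is an illustration only, not the code of lenskit.algorithms.bias.Bias: it assumes users and items are already mapped to 0-based integer indices, it assumes the common damped-mean form of the offsets, and the function name is hypothetical.

```python
import numpy as np

def damped_bias_residuals(users, items, ratings, n_users, n_items, damping=5.0):
    """Remove a global mean and damped item/user offsets from ratings."""
    mean = ratings.mean()
    resid = ratings - mean

    # Item offset: damped mean of the mean-centered ratings for each item.
    i_sum = np.bincount(items, weights=resid, minlength=n_items)
    i_cnt = np.bincount(items, minlength=n_items)
    b_i = i_sum / (i_cnt + damping)
    resid = resid - b_i[items]

    # User offset: damped mean of what is left after removing item offsets.
    u_sum = np.bincount(users, weights=resid, minlength=n_users)
    u_cnt = np.bincount(users, minlength=n_users)
    b_u = u_sum / (u_cnt + damping)
    resid = resid - b_u[users]

    return mean, b_u, b_i, resid

# Tiny made-up data set: 3 users, 3 items, 5 ratings.
users = np.array([0, 0, 1, 1, 2])
items = np.array([0, 1, 0, 2, 1])
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])

mean, b_u, b_i, resid = damped_bias_residuals(users, items, ratings, 3, 3)
```

A factorization would then be fit to resid, and the mean and offsets added back at prediction time.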

A variety of resources informed the design, most notably this one.

Parameters
  • features (int) – The number of latent features to learn.

  • bias – The bias model to use.

  • damping – The bias damping, if bias is True.

  • epochs (int) – The number of epochs to train.

  • batch_size (int) – The Keras batch size.

  • reg (double) – The regularization term \(\lambda\) used to derive embedding vector regularizations.

  • rng_spec – The random number generator initialization.

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series
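Putting the constructor, fit, and predict_for_user together, a typical call sequence might look like the sketch below. It uses only the signatures documented above; the wrapper function is hypothetical, and the ratings frame is assumed to follow the usual LensKit convention of user, item, and rating columns.

```python
def train_and_score(ratings, user, items):
    """Fit BiasedMF on a ratings frame and score some items for one user.

    ratings: pandas.DataFrame, assumed to have user, item, and rating columns
    user:    the user ID to score for
    items:   array-like of item IDs to score
    """
    # Imported lazily so this function can be defined without TensorFlow installed.
    from lenskit_tf import BiasedMF

    algo = BiasedMF(features=50, epochs=5, batch_size=10000, reg=0.02)
    algo.fit(ratings)                            # returns the algorithm object
    return algo.predict_for_user(user, items)    # pandas.Series indexed by item id
```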

Fully Integrated

class lenskit_tf.IntegratedBiasMF(features=50, *, epochs=5, batch_size=10000, reg=0.02, bias_reg=0.2, rng_spec=None)

Bases: lenskit.algorithms.Predictor

Biased matrix factorization model for explicit feedback, optimizing both bias and embeddings with TensorFlow.

This is a basic TensorFlow implementation of the biased matrix factorization model for rating prediction:

\[s(i|u) = b + b_u + b_i + \vec{p}_u \cdot \vec{q}_i\]

User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:

\[\begin{split}\lambda_u = \lambda / |U| \\ \lambda_i = \lambda / |I| \\\end{split}\]

This rescaling allows the regularization term to be independent of the number of users and items. The same rescaling applies to the bias regularization.

Because the model is very simple, this algorithm works best with large batch sizes.

This implementation uses TensorFlow to fit the entire model, including user/item biases and residuals, and uses TensorFlow to do the final predictions as well. Its code is suitable as an example of how to build a Keras/TensorFlow algorithm implementation for LensKit where TF is used for the entire process.
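To illustrate what fitting biases and embeddings jointly means, here is a small NumPy sketch using full-batch gradient descent. It is not the package's Keras model: the squared-error loss, the fixed global bias, the learning rate, and the function name are all assumptions made for the example.

```python
import numpy as np

def fit_integrated_mf(users, items, ratings, n_users, n_items,
                      features=4, epochs=200, lr=0.05,
                      reg=0.02, bias_reg=0.2, seed=0):
    """Full-batch gradient descent on biases and embeddings together."""
    rng = np.random.default_rng(seed)
    b = ratings.mean()                       # global bias, held fixed here
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    P = rng.normal(scale=0.1, size=(n_users, features))
    Q = rng.normal(scale=0.1, size=(n_items, features))
    n = len(ratings)
    losses = []

    for _ in range(epochs):
        pred = b + b_u[users] + b_i[items] + np.sum(P[users] * Q[items], axis=1)
        err = pred - ratings
        losses.append(np.mean(err ** 2))     # track the unpenalized MSE

        # Gradients of mean squared error plus the rescaled L2 penalties.
        g_bu = (2 * np.bincount(users, weights=err, minlength=n_users) / n
                + 2 * (bias_reg / n_users) * b_u)
        g_bi = (2 * np.bincount(items, weights=err, minlength=n_items) / n
                + 2 * (bias_reg / n_items) * b_i)
        g_P = 2 * (reg / n_users) * P
        g_Q = 2 * (reg / n_items) * Q
        np.add.at(g_P, users, 2 * err[:, None] * Q[items] / n)
        np.add.at(g_Q, items, 2 * err[:, None] * P[users] / n)

        b_u -= lr * g_bu
        b_i -= lr * g_bi
        P -= lr * g_P
        Q -= lr * g_Q

    return b, b_u, b_i, P, Q, losses

# Tiny made-up data set: 3 users, 3 items, 5 ratings.
users = np.array([0, 0, 1, 1, 2])
items = np.array([0, 1, 0, 2, 1])
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])

b, b_u, b_i, P, Q, losses = fit_integrated_mf(users, items, ratings, 3, 3)
```

Unlike the residual-based approach, the bias vectors here receive gradient updates in the same loop as the embeddings, each with its own regularization weight.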

A variety of resources informed the design, most notably this one and Chin-chi Hsu's example code.

Parameters
  • features (int) – The number of latent features to learn.

  • epochs (int) – The number of epochs to train.

  • batch_size (int) – The Keras batch size.

  • reg (double) – The regularization term for the embedding vectors.

  • bias_reg (double) – The regularization term for the bias vectors.

  • rng_spec – The random number generator initialization.

model

The Keras model.

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series

Bayesian Personalized Ranking

class lenskit_tf.BPR(features=50, *, epochs=5, batch_size=10000, reg=0.02, neg_count=1, neg_weight=True, rng_spec=None)

Bases: lenskit.algorithms.Predictor

Bayesian Personalized Ranking with matrix factorization, optimized with TensorFlow.

This is a basic TensorFlow implementation of the BPR algorithm [BPR].

User and item embedding matrices are regularized with \(L_2\) regularization, governed by a regularization term \(\lambda\). Regularizations for the user and item embeddings are then computed as follows:

\[\begin{split}\lambda_u = \lambda / |U| \\ \lambda_i = \lambda / |I| \\\end{split}\]

This rescaling allows the regularization term to be independent of the number of users and items.

Because the model is relatively simple, optimization works best with large batch sizes.
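For orientation, the pairwise objective at the core of BPR can be written in a few lines of NumPy. This sketch shows only the standard loss term, the negative log-sigmoid of the score difference between an observed item and a sampled negative, without the regularization described above; the function name and data are hypothetical, and the package's Keras model may be structured differently.

```python
import numpy as np

def bpr_loss(P, Q, users, pos_items, neg_items):
    """Mean BPR loss -log(sigmoid(x_ui - x_uj)) over sampled triples."""
    x_pos = np.sum(P[users] * Q[pos_items], axis=1)   # score of the observed item
    x_neg = np.sum(P[users] * Q[neg_items], axis=1)   # score of the sampled negative
    diff = x_pos - x_neg
    # -log(sigmoid(d)) == log(1 + exp(-d)), computed stably with logaddexp.
    return np.mean(np.logaddexp(0.0, -diff))

P = np.array([[1.0, 0.0]])              # one hypothetical user
Q = np.array([[2.0, 0.0],               # item 0 aligns with the user
              [-2.0, 0.0]])             # item 1 does not
users = np.array([0])

# Observed item ranked above the negative: small loss.
good = bpr_loss(P, Q, users, np.array([0]), np.array([1]))
# Ranking reversed: large loss.
bad = bpr_loss(P, Q, users, np.array([1]), np.array([0]))
```

The loss depends only on the difference between the two scores, which is why BPR output is meaningful for ranking items rather than as a rating prediction.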

Parameters
  • features (int) – The number of latent features to learn.

  • epochs (int) – The number of epochs to train.

  • batch_size (int) – The Keras batch size. This is the number of positive examples to sample in each batch. If neg_count is greater than 1, the effective batch size is multiplied accordingly.

  • reg (double) – The regularization term for the embedding vectors.

  • neg_count (int) – The number of negative examples to sample for each positive one.

  • neg_weight (bool) – Whether to weight negative sampling by popularity (True) or not.

  • rng_spec – The random number generator initialization.
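The roles of neg_count and neg_weight can be sketched as follows. This is an assumption-laden illustration rather than the package's sampler: the function name is hypothetical, items are assumed to be 0-based integer indices, popularity is taken to mean the item's count among the positive examples, and the sketch does not reject items the user has already interacted with.

```python
import numpy as np

def sample_negatives(pos_items, n_items, neg_count=1, neg_weight=True, seed=0):
    """Draw neg_count candidate negative items for each positive example."""
    rng = np.random.default_rng(seed)
    if neg_weight:
        # Popularity weighting: sampling probability proportional to item count.
        counts = np.bincount(pos_items, minlength=n_items).astype(float)
        probs = counts / counts.sum()
    else:
        probs = None                      # uniform over all items
    return rng.choice(n_items, size=(len(pos_items), neg_count), p=probs)

# Five positive examples over four items; item 3 never appears.
pos_items = np.array([0, 0, 0, 1, 2])
negs = sample_negatives(pos_items, n_items=4, neg_count=2)
```

Each positive example yields neg_count training pairs, which is why the effective batch grows with neg_count. With popularity weighting, an item that never appears among the positives (item 3 here) is never drawn as a negative.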

model

The Keras model.

fit(ratings, **kwargs)

Train a model using the specified ratings (or similar) data.

Parameters
  • ratings (pandas.DataFrame) – The ratings data.

  • kwargs – Additional training data the algorithm may require. Algorithms should avoid using the same keyword arguments for different purposes, so that they can be more easily hybridized.

Returns

The algorithm object.

predict_for_user(user, items, ratings=None)

Compute predictions for a user and items.

Parameters
  • user – the user ID

  • items (array-like) – the items to predict

  • ratings (pandas.Series) – the user’s ratings (indexed by item id); if provided, they may be used to override or augment the model’s notion of a user’s preferences.

Returns

scores for the items, indexed by item id.

Return type

pandas.Series