BPR

class libreco.algorithms.BPR(task='ranking', data_info=None, loss_type='bpr', embed_size=16, norm_embed=False, n_epochs=20, lr=0.001, lr_decay=False, epsilon=1e-05, reg=None, batch_size=256, sampler='random', num_neg=1, use_tf=True, seed=42, lower_upper_bound=None, tf_sess_config=None, optimizer='adam', num_threads=1)

Bases: EmbedBase

Bayesian Personalized Ranking algorithm.

BPR is implemented in both TensorFlow and Cython.

Caution

BPR can only be used for the ranking task.
BPR only supports 'bpr' as loss_type.

Parameters:

task ({'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
loss_type ({'bpr'}) – Loss for model training.
embed_size (int, default: 16) – Vector size of embeddings.
norm_embed (bool, default: False) – Whether to l2 normalize output embeddings.
n_epochs (int, default: 20) – Number of epochs for training.
lr (float, default: 0.001) – Learning rate for training.
lr_decay (bool, default: False) – Whether to use learning rate decay.
epsilon (float, default: 1e-5) – A small constant added to the denominator to improve numerical stability in the Adam optimizer. According to the official comment, the default value of 1e-8 for epsilon is generally not a good choice, so 1e-5 is used here. Try tuning this hyperparameter if training is unstable.
reg (float or None, default: None) – Regularization parameter, must be non-negative or None.
batch_size (int, default: 256) – Batch size for training.
sampler ({'random', 'unconsumed', 'popular'}, default: 'random') –
Negative sampling strategy.
- 'random' samples items uniformly at random.
- 'unconsumed' samples items that the target user has not consumed before.
- 'popular' samples popular items as negatives with higher probability.
New in version 1.1.0.
num_neg (int, default: 1) – Number of negative samples for each positive sample, only used in ranking task.
use_tf (bool, default: True) – Whether to use the TensorFlow version (True) or the Cython version (False). The TensorFlow version is more accurate, whereas the Cython version is faster.
seed (int, default: 42) – Random seed.
lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
tf_sess_config (dict or None, default: None) – Optional TensorFlow session config, see ConfigProto options.
optimizer ({'sgd', 'momentum', 'adam'}, default: 'adam') – Optimizer used in the Cython version.
num_threads (int, default: 1) – Number of threads used in the Cython version.
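
As a rough illustration of the three sampler options above, the following pure-Python sketch draws negatives for one user. It is not LibRecommender's implementation: the function name sample_negatives, its arguments, the toy interactions, and the choice to weight 'popular' by raw interaction count are all assumptions made for this example.

```python
import random
from collections import Counter


def sample_negatives(user_consumed, n_items, item_counts,
                     sampler="random", num_neg=1, seed=42):
    """Draw num_neg negative item ids for one user (illustrative only)."""
    rng = random.Random(seed)
    items = list(range(n_items))
    if sampler == "random":
        # Uniform over all items; may occasionally pick a consumed item.
        return [rng.choice(items) for _ in range(num_neg)]
    if sampler == "unconsumed":
        # Uniform over items the user has not interacted with.
        candidates = [i for i in items if i not in user_consumed]
        return [rng.choice(candidates) for _ in range(num_neg)]
    if sampler == "popular":
        # Probability proportional to interaction count (assumed weighting).
        weights = [item_counts.get(i, 0) for i in items]
        return rng.choices(items, weights=weights, k=num_neg)
    raise ValueError(f"unknown sampler: {sampler}")


# Hypothetical (user, item) interactions.
interactions = [(0, 1), (0, 2), (1, 2), (2, 2), (2, 3)]
item_counts = Counter(item for _, item in interactions)
consumed_by_user0 = {item for user, item in interactions if user == 0}

negatives = sample_negatives(consumed_by_user0, n_items=5,
                             item_counts=item_counts,
                             sampler="unconsumed", num_neg=3)
```

With sampler="unconsumed", every drawn id lies outside consumed_by_user0; with sampler="popular", items that never appear in the interactions get zero weight in this sketch and are never drawn.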

References

Steffen Rendle et al. BPR: Bayesian Personalized Ranking from Implicit Feedback.
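
The pairwise objective from the paper above can be sketched for a single (positive, negative) pair as follows. This is a minimal illustration of the loss term only, without regularization or batching; the function name bpr_loss is hypothetical and is not part of the library.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() is never called with a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss shrinks as the positive item is scored further above the negative.
well_ranked = bpr_loss(pos_score=2.0, neg_score=0.0)
badly_ranked = bpr_loss(pos_score=0.0, neg_score=2.0)
```

The loss depends only on the score difference, which is why each positive sample needs at least one sampled negative (num_neg) during training.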

fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)

Fit BPR model on the training data.

Parameters:

train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling on the training or evaluation data.

New in version 1.1.0.
verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it to higher than 1 will print evaluation metrics during training.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluation.
metrics (list or None, default: None) – List of metrics for evaluation.
k (int, default: 10) – Cutoff used by the metrics, e.g. recall at k, ndcg at k.
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
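
To make the role of k concrete, here is a small sketch of recall at k and NDCG at k with binary relevance. The function names and the toy data are hypothetical, and the library's own metric code may differ in details such as normalization.

```python
import math


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: DCG of the top-k list over the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k])
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = [5, 3, 8, 1]  # hypothetical ranked item ids for one user
relevant = {3, 1, 7}        # hypothetical held-out items for that user

recall_2 = recall_at_k(recommended, relevant, k=2)
ndcg_2 = ndcg_at_k(recommended, relevant, k=2)
```

Only the first k entries of the ranked list are inspected, so a larger k can only keep recall the same or raise it.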

get_item_embedding(item=None, include_bias=False)

Get item embedding(s) from the model.

Parameters:

item (int or str or None, default: None) – Query item id. If it is None, all item embeddings will be returned.
include_bias (bool, default: False) – Whether to include bias term in returned embeddings.

Returns:

item_embedding – Returned item embeddings.

Return type:

numpy.ndarray

Raises:

ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.

get_user_embedding(user=None, include_bias=False)

Get user embedding(s) from the model.

Parameters:

user (int or str or None, default: None) – Query user id. If it is None, all user embeddings will be returned.
include_bias (bool, default: False) – Whether to include bias term in returned embeddings.

Returns:

user_embedding – Returned user embeddings.

Return type:

numpy.ndarray

Raises:

ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
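
One common use of the returned vectors is to score user-item pairs. The sketch below assumes the score is the inner product of a user vector and an item vector, which is typical for embedding models but is not stated in this section; plain lists stand in for numpy.ndarray, and all values and names are made up for illustration.

```python
def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical embeddings, standing in for the arrays a trained model returns.
user_vec = [0.5, -1.0, 2.0]
item_vecs = {
    "item_a": [1.0, 0.0, 1.0],
    "item_b": [-1.0, 2.0, 0.0],
}

# Score every item for this user, then rank from highest to lowest score.
scores = {name: inner_product(user_vec, vec) for name, vec in item_vecs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case item_a outranks item_b because its vector points in a direction closer to the user's.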

init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)

Initialize the k-nearest-neighbor search model.

Parameters:

approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) – Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) – Parameter in HNSW, refer to nmslib doc.

Raises:

ValueError – If sim_type is not one of ('cosine', 'inner-product').
ModuleNotFoundError – If approximate=True and nmslib is not installed.
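
The two sim_type values can rank neighbors differently. The following brute-force sketch shows what an exact search computes in each similarity space; it is not the library's code, the function names are hypothetical, and HNSW in nmslib approximates this kind of ranking rather than scanning every vector.

```python
import math


def similarity(a, b, sim_type):
    """Similarity of two vectors under 'cosine' or 'inner-product'."""
    dot = sum(x * y for x, y in zip(a, b))
    if sim_type == "inner-product":
        return dot
    if sim_type == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0
    raise ValueError("sim_type must be 'cosine' or 'inner-product'")


def exact_knn(query, vectors, sim_type, k):
    """Indices of the k vectors most similar to query (exhaustive scan)."""
    scored = [(similarity(query, vec, sim_type), idx)
              for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Hypothetical 2-d embeddings; index 1 has a large norm but a different direction.
vectors = [[1.0, 0.0], [10.0, 10.0], [0.9, 0.1]]
query = [1.0, 0.0]

cosine_neighbors = exact_knn(query, vectors, "cosine", k=2)
inner_neighbors = exact_knn(query, vectors, "inner-product", k=2)
```

Cosine similarity ignores vector length and favors direction, while inner product also rewards large norms, so the long vector at index 1 enters the top 2 only under 'inner-product'.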

classmethod load(path, model_name, data_info, **kwargs)

Load saved embed model for inference.

Parameters:

path (str) – Path of the folder where the model was saved.
model_name (str) – Name of the saved model file.
data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded embed model.

Return type:

type(cls)