SVD++

class libreco.algorithms.SVDpp(task, data_info, loss_type='cross_entropy', embed_size=16, n_epochs=20, lr=0.001, lr_decay=False, epsilon=1e-05, reg=None, batch_size=256, num_neg=1, seed=42, recent_num=30, lower_upper_bound=None, tf_sess_config=None)

Bases: EmbedBase

SVD++ algorithm.
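For orientation, the textbook SVD++ score adds to the plain matrix-factorization user vector a normalized sum of "implicit" vectors for the items the user has interacted with. The sketch below is a minimal pure-Python illustration of that formula. It is not the library's implementation, and the function and argument names (`svdpp_score`, `implicit_vecs`, and so on) are hypothetical.

```python
import math


def svdpp_score(global_mean, user_bias, item_bias, user_vec, item_vec, implicit_vecs):
    """Score one user-item pair with the textbook SVD++ formula (illustrative only)."""
    dim = len(user_vec)
    # Sum the implicit-feedback vectors of the items the user interacted with.
    implicit_sum = [0.0] * dim
    for vec in implicit_vecs:
        for d in range(dim):
            implicit_sum[d] += vec[d]
    # Normalize by the square root of the number of interacted items.
    norm = math.sqrt(len(implicit_vecs)) if implicit_vecs else 1.0
    user_repr = [user_vec[d] + implicit_sum[d] / norm for d in range(dim)]
    # Inner product between the item vector and the enriched user representation.
    dot = sum(q * p for q, p in zip(item_vec, user_repr))
    return global_mean + user_bias + item_bias + dot


# Toy values, chosen only for illustration.
score = svdpp_score(
    global_mean=3.5,
    user_bias=0.1,
    item_bias=-0.2,
    user_vec=[0.5, 0.0],
    item_vec=[1.0, 2.0],
    implicit_vecs=[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
)
print(round(score, 4))
```

The extra term is what distinguishes SVD++ from plain SVD: a user with few explicit ratings still gets a meaningful representation from the items they have touched.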

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • loss_type ({'cross_entropy', 'focal'}, default: 'cross_entropy') – Loss for model training.

  • embed_size (int, default: 16) – Vector size of embeddings.

  • n_epochs (int, default: 20) – Number of epochs for training.

  • lr (float, default: 0.001) – Learning rate for training.

  • lr_decay (bool, default: False) – Whether to use learning rate decay.

  • epsilon (float, default: 1e-5) – A small constant added to the denominator to improve numerical stability in the Adam optimizer. According to the comment in the official TensorFlow Adam documentation, the default value of 1e-8 for epsilon is generally not a good choice, so 1e-5 is used here. Try tuning this hyperparameter if training is unstable.

  • reg (float or None, default: None) – Regularization parameter, must be non-negative or None.

  • batch_size (int, default: 256) – Batch size for training.

  • num_neg (int, default: 1) – Number of negative samples for each positive sample, only used in ranking task.

  • seed (int, default: 42) – Random seed.

  • lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.

  • tf_sess_config (dict or None, default: None) – Optional TensorFlow session config, see ConfigProto options.
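To make the role of `epsilon` concrete, the following sketch implements one step of the textbook Adam update in pure Python and compares the step size for a very small gradient under two epsilon values. It is an illustration under stated assumptions: `adam_step` is a hypothetical helper, and frameworks may apply epsilon at a slightly different point in the formula.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-5):
    """One textbook Adam update for a single scalar parameter (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # epsilon sits in the denominator and bounds the step when v_hat is tiny.
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


tiny_grad = 1e-6
p_small, _, _ = adam_step(0.0, tiny_grad, 0.0, 0.0, t=1, epsilon=1e-8)
p_large, _, _ = adam_step(0.0, tiny_grad, 0.0, 0.0, t=1, epsilon=1e-5)
step_small_eps = abs(p_small)
step_large_eps = abs(p_large)
print(step_small_eps, step_large_eps)
```

With the smaller epsilon, a near-zero gradient still produces a step close to the full learning rate. The larger value damps such steps, which is one plausible reason a larger epsilon can stabilize training.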

fit(train_data, verbose=1, shuffle=True, eval_data=None, metrics=None, **kwargs)

Fit embed model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it to higher than 1 will print evaluation metrics during training.

  • shuffle (bool, default: True) – Whether to shuffle the training data.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Cutoff used by ranking metrics, e.g. recall at k, NDCG at k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

Raises:

RuntimeError – If fit() is called on a model restored with load().
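The `k` argument above is the cutoff applied to top-k ranking metrics. As a rough guide to what "recall at k" and "NDCG at k" measure, here is a minimal pure-Python sketch using common binary-relevance definitions. The library's exact definitions may differ, and the helper names are hypothetical.

```python
import math


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]   # ranked model output (toy data)
relevant = {"a", "c", "x"}           # items the user actually interacted with
recall = recall_at_k(recommended, relevant, k=3)
ndcg = ndcg_at_k(recommended, relevant, k=3)
print(round(recall, 4), round(ndcg, 4))
```

Recall ignores the order within the top k, while NDCG rewards placing relevant items nearer the top.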

get_item_embedding(item=None)

Get item embedding(s) from the model.

Parameters:

item (int or str or None) – Query item id. If it is None, all item embeddings will be returned.

Returns:

item_embedding – Returned item embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the item does not appear in the training data.

  • AssertionError – If the model has not been trained.

get_user_embedding(user=None)

Get user embedding(s) from the model.

Parameters:

user (int or str or None) – Query user id. If it is None, all user embeddings will be returned.

Returns:

user_embedding – Returned user embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the user does not appear in the training data.

  • AssertionError – If the model has not been trained.

init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)

Initialize the k-nearest-neighbor search model.

Parameters:
  • approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.

  • sim_type ({'cosine', 'inner-product'}) – Similarity space type.

  • M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.

  • ef_construction (int, default: 200) – Parameter in HNSW, refer to nmslib doc.

  • ef_search (int, default: 200) – Parameter in HNSW, refer to nmslib doc.

Raises:
  • ValueError – If sim_type is not one of ('cosine', 'inner-product').

  • ModuleNotFoundError – If approximate=True and nmslib is not installed.
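The two `sim_type` options can rank neighbors differently, because cosine similarity ignores vector length while the inner product does not. The sketch below is a brute-force (exact) search in pure Python, roughly what a non-approximate search amounts to conceptually. The helpers `similarity` and `top_k_similar` are hypothetical, and excluding the query item from its own results is an assumption of this sketch.

```python
import math


def similarity(a, b, sim_type):
    """Similarity between two embedding vectors under the given space type."""
    dot = sum(x * y for x, y in zip(a, b))
    if sim_type == "inner-product":
        return dot
    if sim_type == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0
    raise ValueError("sim_type must be 'cosine' or 'inner-product'")


def top_k_similar(query_id, embeddings, k, sim_type):
    """Exact search: score every other id against the query and keep the best k."""
    query = embeddings[query_id]
    scored = [
        (similarity(query, vec, sim_type), other)
        for other, vec in embeddings.items()
        if other != query_id  # assumption: the query itself is not returned
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


# Toy item embeddings; "i4" is long, so it wins under inner product.
item_embeddings = {
    "i1": [1.0, 0.0],
    "i2": [0.9, 0.1],
    "i3": [0.0, 5.0],
    "i4": [3.0, 3.0],
}
by_cosine = top_k_similar("i1", item_embeddings, k=2, sim_type="cosine")
by_inner = top_k_similar("i1", item_embeddings, k=2, sim_type="inner-product")
print(by_cosine, by_inner)
```

Exact search costs one similarity computation per stored vector for every query, which is the cost an approximate index such as HNSW is designed to avoid on large catalogs.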

classmethod load(path, model_name, data_info, **kwargs)

Load saved embed model for inference.

Parameters:
  • path (str) – Folder path where the saved model is stored.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded embed model.

Return type:

type(cls)

See also

save

predict(user, item, cold_start='average', inner_id=False)

Make prediction(s) on given user(s) and item(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use inner_id defined in libreco. Library users will rarely, if ever, need this.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray
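The 'average' cold-start strategy described above can be pictured as follows: an id that was not seen during training is represented by the mean of all known embeddings. This pure-Python sketch scores a pair with a plain inner product and omits biases and any other model-specific terms. The helper names are hypothetical, and it is not the library's code.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


def predict_with_average(user, item, user_emb, item_emb):
    """Inner-product score; unseen ids fall back to the average embedding."""
    if user in user_emb:
        u = user_emb[user]
    else:
        u = mean_vector(list(user_emb.values()))  # cold-start user
    if item in item_emb:
        i = item_emb[item]
    else:
        i = mean_vector(list(item_emb.values()))  # cold-start item
    return sum(a * b for a, b in zip(u, i))


# Toy embeddings for illustration.
user_emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_emb = {"i1": [2.0, 4.0]}

known = predict_with_average("u1", "i1", user_emb, item_emb)
cold = predict_with_average("new_user", "i1", user_emb, item_emb)
print(known, cold)
```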

recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use inner_id defined in libreco. Library users will rarely, if ever, need this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}
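As a conceptual illustration of `n_rec` and `filter_consumed`, the sketch below ranks candidate items by score and optionally drops those the user has already consumed. It is a pure-Python toy with a hypothetical `recommend` helper, not the library's implementation, and it does not cover `random_rec`.

```python
def recommend(scores, consumed, n_rec, filter_consumed=True):
    """Return the n_rec highest-scoring items, optionally skipping consumed ones."""
    candidates = [
        (score, item)
        for item, score in scores.items()
        if not (filter_consumed and item in consumed)
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in candidates[:n_rec]]


# Toy predicted scores for one user, and the items that user already consumed.
scores = {"i1": 0.9, "i2": 0.8, "i3": 0.4, "i4": 0.1}
consumed = {"i1"}

filtered = recommend(scores, consumed, n_rec=2, filter_consumed=True)
unfiltered = recommend(scores, consumed, n_rec=2, filter_consumed=False)
print(filtered, unfiltered)
```

Filtering is applied before truncating to `n_rec`, so the result still contains `n_rec` items whenever enough unconsumed candidates exist.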

save(path, model_name, inference_only=False, **kwargs)

Save embed model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

  • inference_only (bool, default: False) – Whether to save model only for inference. If it is True, only embeddings will be saved. Otherwise, model variables will be saved.

See also

load

search_knn_items(item, k)

Search most similar k items.

Parameters:
  • item (int or str) – Query item id.

  • k (int) – Number of similar items.

Returns:

similar items – A list of k similar items.

Return type:

list

search_knn_users(user, k)

Search most similar k users.

Parameters:
  • user (int or str) – Query user id.

  • k (int) – Number of similar users.

Returns:

similar users – A list of k similar users.

Return type:

list