Base Classes

class libreco.bases.Base(task, data_info, lower_upper_bound=None)

Bases: ABC

Base class for all recommendation models.

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • lower_upper_bound (list or tuple, default: None) – Lower and upper score bound for rating task.
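A minimal sketch of what a score bound means for a rating task: a raw prediction is clipped into the [lower, upper] interval. That the library applies the bound in exactly this way is an assumption, and `clip_score` is a hypothetical helper, not part of libreco.

```python
def clip_score(score, lower_upper_bound):
    """Clip a raw rating prediction into [lower, upper] (hypothetical helper)."""
    lower, upper = lower_upper_bound
    return min(max(score, lower), upper)


raw_scores = [0.3, 2.7, 5.9]
# With a 1-to-5 rating scale, out-of-range predictions are pulled to the edges.
bounded = [clip_score(s, (1, 5)) for s in raw_scores]
```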

abstract fit(train_data, **kwargs)

Fit model on the training data.

Parameters:

train_data (TransformedSet object) – Data object used for training.

abstract predict(user, item, **kwargs)

Predict score for given user and item.

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray

abstract recommend_user(user, n_rec, **kwargs)

Recommend a list of items for given user.

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}

abstract save(path, model_name, **kwargs)

Save model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

See also

load

abstract classmethod load(path, model_name, data_info, **kwargs)

Load saved model for inference.

Parameters:
  • path (str) – File folder path to load the model from.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded model.

Return type:

type(cls)

See also

save
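As a rough illustration of the contract above, the sketch below mirrors the documented abstract methods in a stand-in class and implements them in a toy popularity model. `RecommenderInterface` and `PopularityModel` are hypothetical names, not `libreco.bases.Base` or any libreco model; the JSON file format and the plain list of (user, item) pairs used as training data are also assumptions made for the example.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from collections import Counter


class RecommenderInterface(ABC):
    """Hypothetical stand-in mirroring the abstract methods documented for Base."""

    @abstractmethod
    def fit(self, train_data, **kwargs): ...

    @abstractmethod
    def predict(self, user, item, **kwargs): ...

    @abstractmethod
    def recommend_user(self, user, n_rec, **kwargs): ...

    @abstractmethod
    def save(self, path, model_name, **kwargs): ...

    @classmethod
    @abstractmethod
    def load(cls, path, model_name, data_info, **kwargs): ...


class PopularityModel(RecommenderInterface):
    """Toy model: an item's score is its interaction count in the training data."""

    def __init__(self):
        self.counts = Counter()

    def fit(self, train_data, **kwargs):
        # train_data is a plain list of (user, item) pairs here, not a TransformedSet.
        self.counts = Counter(item for _, item in train_data)
        return self

    def predict(self, user, item, **kwargs):
        return float(self.counts.get(item, 0))

    def recommend_user(self, user, n_rec, **kwargs):
        # Result uses the user id as key, as in the documented return value.
        return {user: [item for item, _ in self.counts.most_common(n_rec)]}

    def save(self, path, model_name, **kwargs):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, f"{model_name}.json"), "w") as f:
            json.dump(dict(self.counts), f)

    @classmethod
    def load(cls, path, model_name, data_info=None, **kwargs):
        model = cls()
        with open(os.path.join(path, f"{model_name}.json")) as f:
            model.counts = Counter(json.load(f))
        return model


pairs = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "b"), ("u3", "c")]
model = PopularityModel().fit(pairs)
with tempfile.TemporaryDirectory() as folder:
    model.save(folder, "pop_model")
    restored = PopularityModel.load(folder, "pop_model", data_info=None)
recs = restored.recommend_user("u1", n_rec=2)
```

The round trip through `save` and `load` keeps the same `path` and `model_name` pair on both sides, which is the relationship the two "See also" entries point at.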

class libreco.bases.EmbedBase(task, data_info, embed_size, lower_upper_bound=None)

Bases: Base

Base class for embed models.

Models that can generate user and item embeddings.

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • embed_size (int) – Vector size of embeddings.

  • lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.

fit(train_data, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)

Fit embed model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it higher than 1 will print evaluation metrics during training.

  • shuffle (bool, default: True) – Whether to shuffle the training data.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Parameter of metrics, e.g. recall at k, NDCG at k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

Raises:

RuntimeError – If fit() is called on a model that was loaded with load().
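To make the role of `k` concrete, here is a small sketch of recall at k and NDCG at k for a single user, using common textbook definitions with binary relevance. The exact formulas used by libreco during evaluation may differ; `recall_at_k` and `ndcg_at_k` are hypothetical helper names.

```python
import math


def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k list divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]  # ranked model output
relevant = ["a", "c", "e"]          # items the user interacted with in eval data
recall = recall_at_k(recommended, relevant, k=3)
ndcg = ndcg_at_k(recommended, relevant, k=3)
```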

predict(user, item, cold_start='average', inner_id=False)

Make prediction(s) on given user(s) and item(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray

recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to sample items for recommendation at random based on their prediction scores, rather than returning the top-scoring items.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}
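The interplay of `n_rec`, `filter_consumed` and `random_rec` can be sketched as follows. The helper `recommend` is hypothetical, and the sampling scheme (drawing without replacement with weights proportional to the scores) is an assumption about what "based on their prediction scores" means, not a description of the library's code.

```python
import random


def recommend(scores, consumed, n_rec, filter_consumed=True, random_rec=False, seed=0):
    """Pick n_rec items from a {item: score} mapping for one user."""
    candidates = {
        item: score
        for item, score in scores.items()
        if not (filter_consumed and item in consumed)
    }
    if random_rec:
        # Assumed behavior: sample without replacement, weighted by score.
        rng = random.Random(seed)
        pool = dict(candidates)
        picked = []
        while pool and len(picked) < n_rec:
            items, weights = zip(*pool.items())
            choice = rng.choices(items, weights=weights, k=1)[0]
            picked.append(choice)
            del pool[choice]
        return picked
    # Default: deterministic top-n by score.
    return sorted(candidates, key=candidates.get, reverse=True)[:n_rec]


scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
top = recommend(scores, consumed={"a"}, n_rec=2)
unfiltered = recommend(scores, consumed={"a"}, n_rec=2, filter_consumed=False)
sampled = recommend(scores, consumed={"a"}, n_rec=2, random_rec=True)
```

With filtering on, the already consumed item "a" never appears, even though it has the highest score.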

save(path, model_name, inference_only=False, **kwargs)

Save embed model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

  • inference_only (bool, default: False) – Whether to save model only for inference. If it is True, only embeddings will be saved. Otherwise, model variables will be saved.

See also

load

classmethod load(path, model_name, data_info, **kwargs)

Load saved embed model for inference.

Parameters:
  • path (str) – File folder path to load the model from.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded embed model.

Return type:

type(cls)

See also

save

get_user_embedding(user=None)

Get user embedding(s) from the model.

Parameters:

user (int or str or None) – Query user id. If it is None, all user embeddings will be returned.

Returns:

user_embedding – Returned user embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the user does not appear in the training data.

  • AssertionError – If the model has not been trained.

get_item_embedding(item=None)

Get item embedding(s) from the model.

Parameters:

item (int or str or None) – Query item id. If it is None, all item embeddings will be returned.

Returns:

item_embedding – Returned item embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the item does not appear in the training data.

  • AssertionError – If the model has not been trained.

init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)

Initialize the k-nearest-neighbor search model.

Parameters:
  • approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.

  • sim_type ({'cosine', 'inner-product'}) – Similarity space type.

  • M (int, default: 100) – Parameter in HNSW; see the nmslib documentation.

  • ef_construction (int, default: 200) – Parameter in HNSW; see the nmslib documentation.

  • ef_search (int, default: 200) – Parameter in HNSW; see the nmslib documentation.

Raises:
  • ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).

  • ModuleNotFoundError – If approximate=True and nmslib is not installed.

search_knn_users(user, k)

Search for the k most similar users.

Parameters:
  • user (int or str) – Query user id.

  • k (int) – Number of similar users.

Returns:

similar users – A list of k similar users.

Return type:

list

search_knn_items(item, k)

Search for the k most similar items.

Parameters:
  • item (int or str) – Query item id.

  • k (int) – Number of similar items.

Returns:

similar items – A list of k similar items.

Return type:

list
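For intuition about the two `sim_type` options, the sketch below runs an exact (brute-force) nearest-neighbor search over a handful of embeddings. It shows how 'cosine' and 'inner-product' can rank the same neighbors differently. The helpers `similarity` and `search_knn` are hypothetical, and excluding the query id from its own results is an assumption; the approximate HNSW path through nmslib is not shown.

```python
import math


def similarity(u, v, sim_type):
    inner = sum(a * b for a, b in zip(u, v))
    if sim_type == "inner-product":
        return inner
    if sim_type == "cosine":
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return inner / norm if norm > 0 else 0.0
    raise ValueError("sim_type must be one of ('cosine', 'inner-product')")


def search_knn(query_id, embeds, k, sim_type):
    """Exact search: score every other id against the query and keep the top k."""
    query = embeds[query_id]
    scored = [
        (similarity(query, vec, sim_type), other)
        for other, vec in embeds.items()
        if other != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


item_embeds = {
    "i1": [1.0, 0.0],
    "i2": [0.9, 0.1],   # nearly the same direction as i1
    "i3": [0.0, 1.0],
    "i4": [5.0, 5.0],   # large norm, different direction
}
by_cosine = search_knn("i1", item_embeds, k=2, sim_type="cosine")
by_inner = search_knn("i1", item_embeds, k=2, sim_type="inner-product")
```

Cosine similarity ignores vector length, so "i2" ranks first; the inner product rewards the large norm of "i4" and puts it first instead.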

class libreco.bases.TfBase(task, data_info, lower_upper_bound=None, tf_sess_config=None)

Bases: Base

Base class for TF models.

Models that rely on the TensorFlow graph for inference. Although some models, such as RNN4Rec and SVD, are trained using TensorFlow, they don’t belong to this base class since their inference only uses embeddings.

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • lower_upper_bound (tuple or None) – Lower and upper score bound for rating task.

  • tf_sess_config (dict or None) – Optional TensorFlow session config, see ConfigProto options.

fit(train_data, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)

Fit TF model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it higher than 1 will print evaluation metrics during training.

  • shuffle (bool, default: True) – Whether to shuffle the training data.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Parameter of metrics, e.g. recall at k, NDCG at k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

Raises:

RuntimeError – If fit() is called on a model that was loaded with load().

predict(user, item, feats=None, cold_start='average', inner_id=False)

Make prediction(s) on given user(s) and item(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

  • feats (dict or pandas.Series or None, default: None) – Extra features used in prediction.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray

recommend_user(user, n_rec, user_feats=None, item_data=None, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • user_feats (dict or pandas.Series or None, default: None) – Extra user features for recommendation.

  • item_data (pandas.DataFrame or None, default: None) – Extra item features for recommendation.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to sample items for recommendation at random based on their prediction scores, rather than returning the top-scoring items.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}

save(path, model_name, manual=True, inference_only=False)

Save TF model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

  • manual (bool, default: True) – Whether to save model variables using numpy.

  • inference_only (bool, default: False) – Whether to save model variables only for inference.

See also

load

classmethod load(path, model_name, data_info, manual=True)

Load saved TF model for inference.

Parameters:
  • path (str) – File folder path to load the model from.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

  • manual (bool, default: True) – Whether to load model variables using numpy. If the model was saved with manual=True, it should also be loaded with manual=True.

Returns:

model – Loaded TF model.

Return type:

type(cls)

See also

save

class libreco.bases.CfBase(task, data_info, cf_type, sim_type='cosine', k_sim=20, store_top_k=True, block_size=None, num_threads=1, min_common=1, mode='invert', seed=42, lower_upper_bound=None)

Bases: Base

Base class for CF models.

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • cf_type ({'user_cf', 'item_cf'}) – Specific CF type.

  • sim_type ({'cosine', 'pearson', 'jaccard'}, default: 'cosine') – Types for computing similarities.

  • k_sim (int, default: 20) – Number of similar items to use.

  • store_top_k (bool, default: True) – Whether to store top k similar users after training.

  • block_size (int or None, default: None) – Block size for computing the similarity matrix. A large block size makes computation faster, but may cause memory issues.

  • num_threads (int, default: 1) – Number of threads to use.

  • min_common (int, default: 1) – Minimum number of common users to consider when computing similarities.

  • mode ({'forward', 'invert'}, default: 'invert') – Whether to use a forward index or an inverted index.

  • seed (int, default: 42) – Random seed.

  • lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
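The three `sim_type` options and the `min_common` threshold can be illustrated on two items, each stored as a sparse {user: rating} mapping. These are common textbook forms of the measures. How libreco normalizes each one (for example, whether Pearson centers on the mean over shared users or over all ratings) is an assumption here, and the function names are hypothetical.

```python
import math


def _common_users(a, b, min_common):
    """Users who rated both items, or None when there are fewer than min_common."""
    shared = sorted(set(a) & set(b))
    return shared if len(shared) >= max(min_common, 1) else None


def cosine(a, b, min_common=1):
    shared = _common_users(a, b, min_common)
    if shared is None:
        return 0.0
    inner = sum(a[u] * b[u] for u in shared)
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return inner / norm if norm > 0 else 0.0


def pearson(a, b, min_common=1):
    shared = _common_users(a, b, min_common)
    if shared is None:
        return 0.0
    # Center each item's ratings on its mean over the shared users.
    mean_a = sum(a[u] for u in shared) / len(shared)
    mean_b = sum(b[u] for u in shared) / len(shared)
    cov = sum((a[u] - mean_a) * (b[u] - mean_b) for u in shared)
    var_a = sum((a[u] - mean_a) ** 2 for u in shared)
    var_b = sum((b[u] - mean_b) ** 2 for u in shared)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0


def jaccard(a, b, min_common=1):
    shared = _common_users(a, b, min_common)
    if shared is None:
        return 0.0
    # Ratings are ignored: only the overlap of the two user sets matters.
    return len(shared) / len(set(a) | set(b))


item_a = {"u1": 5.0, "u2": 3.0, "u3": 4.0}
item_b = {"u1": 4.0, "u2": 2.0, "u4": 5.0}
sims = {
    "cosine": cosine(item_a, item_b),
    "pearson": pearson(item_a, item_b),
    "jaccard": jaccard(item_a, item_b),
}
# Only two users rated both items, so a threshold of 3 discards the pair.
too_few = jaccard(item_a, item_b, min_common=3)
```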

fit(train_data, verbose=1, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)

Fit CF model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it higher than 1 will print evaluation metrics during training.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Parameter of metrics, e.g. recall at k, NDCG at k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

recommend_user(user, n_rec, cold_start='popular', inner_id=False, filter_consumed=True, random_rec=False)

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • cold_start ({'popular'}, default: 'popular') – Cold start strategy. CF models can only use the 'popular' strategy.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to sample items for recommendation at random based on their prediction scores, rather than returning the top-scoring items.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict[Union[int, str, array_like], numpy.ndarray]

save(path, model_name, **kwargs)

Save model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

See also

load

classmethod load(path, model_name, data_info, **kwargs)

Load saved model for inference.

Parameters:
  • path (str) – File folder path to load the model from.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded model.

Return type:

type(cls)

See also

save

abstract predict(user, item, **kwargs)

Predict score for given user and item.

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray

class libreco.bases.GensimBase(task, data_info, embed_size=16, norm_embed=False, window_size=5, n_epochs=5, n_threads=0, seed=42, lower_upper_bound=None)

Bases: EmbedBase

fit(train_data, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)

Fit embed model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it higher than 1 will print evaluation metrics during training.

  • shuffle (bool, default: True) – Whether to shuffle the training data.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Parameter of metrics, e.g. recall at k, NDCG at k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

Raises:

RuntimeError – If fit() is called on a model that was loaded with load().

save(path, model_name, inference_only=False, **_)

Save embed model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

  • inference_only (bool, default: False) – Whether to save model only for inference. If it is True, only embeddings will be saved. Otherwise, model variables will be saved.

See also

load

rebuild_model(path, model_name)

Assign the saved model variables to the newly initialized model.

This method is used before retraining the new model, in order to avoid training from scratch every time we get some new data.

Parameters:
  • path (str) – File folder path for the saved model variables.

  • model_name (str) – Name of the saved model file.

get_item_embedding(item=None)

Get item embedding(s) from the model.

Parameters:

item (int or str or None) – Query item id. If it is None, all item embeddings will be returned.

Returns:

item_embedding – Returned item embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the item does not appear in the training data.

  • AssertionError – If the model has not been trained.

get_user_embedding(user=None)

Get user embedding(s) from the model.

Parameters:

user (int or str or None) – Query user id. If it is None, all user embeddings will be returned.

Returns:

user_embedding – Returned user embeddings.

Return type:

numpy.ndarray

Raises:
  • ValueError – If the user does not appear in the training data.

  • AssertionError – If the model has not been trained.

init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)

Initialize the k-nearest-neighbor search model.

Parameters:
  • approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.

  • sim_type ({'cosine', 'inner-product'}) – Similarity space type.

  • M (int, default: 100) – Parameter in HNSW; see the nmslib documentation.

  • ef_construction (int, default: 200) – Parameter in HNSW; see the nmslib documentation.

  • ef_search (int, default: 200) – Parameter in HNSW; see the nmslib documentation.

Raises:
  • ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).

  • ModuleNotFoundError – If approximate=True and nmslib is not installed.

classmethod load(path, model_name, data_info, **kwargs)

Load saved embed model for inference.

Parameters:
  • path (str) – File folder path to load the model from.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains some useful information.

Returns:

model – Loaded embed model.

Return type:

type(cls)

See also

save

predict(user, item, cold_start='average', inner_id=False)

Make prediction(s) on given user(s) and item(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray

recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need to set this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to sample items for recommendation at random based on their prediction scores, rather than returning the top-scoring items.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}

search_knn_items(item, k)

Search for the k most similar items.

Parameters:
  • item (int or str) – Query item id.

  • k (int) – Number of similar items.

Returns:

similar items – A list of k similar items.

Return type:

list

search_knn_users(user, k)

Search for the k most similar users.

Parameters:
  • user (int or str) – Query user id.

  • k (int) – Number of similar users.

Returns:

similar users – A list of k similar users.

Return type:

list