Base Classes
- class libreco.bases.Base(task, data_info, lower_upper_bound=None)[source]#
Bases: ABC
Base class for all recommendation models.
- Parameters:
- abstract fit(train_data, neg_sampling, **kwargs)[source]#
Fit model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) – Whether to perform negative sampling for training or evaluating data.
- abstract recommend_user(user, n_rec, **kwargs)[source]#
Recommend a list of items for given user.
- Parameters:
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
- abstract save(path, model_name, **kwargs)[source]#
Save model for inference or retraining.
- Parameters:
See also
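The abstract interface of Base can be illustrated with a small stand-in. The sketch below is not libreco code: `SketchBase` and `MostPopular` are hypothetical names, the training data is assumed to be a plain list of (user, item) pairs rather than a TransformedSet, and the JSON file format used in `save` is invented for the example.

```python
from abc import ABC, abstractmethod
from collections import Counter
import json
import os


class SketchBase(ABC):
    """Hypothetical stand-in mirroring the three abstract methods listed above."""

    def __init__(self, task, data_info, lower_upper_bound=None):
        self.task = task
        self.data_info = data_info
        self.lower_upper_bound = lower_upper_bound

    @abstractmethod
    def fit(self, train_data, neg_sampling, **kwargs): ...

    @abstractmethod
    def recommend_user(self, user, n_rec, **kwargs): ...

    @abstractmethod
    def save(self, path, model_name, **kwargs): ...


class MostPopular(SketchBase):
    """Toy subclass: recommends the globally most frequent items."""

    def fit(self, train_data, neg_sampling, **kwargs):
        # Assumption: train_data is an iterable of (user, item) pairs.
        self.item_counts = Counter(item for _, item in train_data)

    def recommend_user(self, user, n_rec, **kwargs):
        top = [item for item, _ in self.item_counts.most_common(n_rec)]
        # The result is keyed by user id, as the Returns entry describes.
        return {user: top}

    def save(self, path, model_name, **kwargs):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, f"{model_name}.json"), "w") as f:
            json.dump(self.item_counts, f)


model = MostPopular("ranking", data_info=None)
pairs = [(1, "a"), (2, "a"), (2, "b"), (3, "c"), (3, "a"), (1, "b")]
model.fit(pairs, neg_sampling=False)
rec = model.recommend_user(1, n_rec=2)
```

Because all three methods are abstract, instantiating `SketchBase` directly raises `TypeError`; only subclasses that implement every method can be created.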
- class libreco.bases.EmbedBase(task, data_info, embed_size, lower_upper_bound=None)[source]#
Bases: Base
Base class for embed models.
Models that can generate user and item embeddings for inference. See algorithm list.
- Parameters:
task ({'rating', 'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
embed_size (int) – Vector size of embeddings.
lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
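To make the role of lower_upper_bound concrete, here is a minimal sketch of bounding rating predictions. This page only states that the tuple holds the lower and upper score bound for the rating task; that the bound is applied by clamping, and the helper name `clip_scores`, are assumptions made for illustration.

```python
def clip_scores(scores, lower_upper_bound=None):
    """Clamp predicted ratings into [lower, upper]; pass through if no bound is set."""
    if lower_upper_bound is None:
        return list(scores)
    lower, upper = lower_upper_bound
    return [min(max(s, lower), upper) for s in scores]


# Raw predictions may fall outside a 1-5 rating scale.
clipped = clip_scores([0.3, 2.7, 5.9], lower_upper_bound=(1, 5))
```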
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)[source]#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) –
Print verbosity.
verbose <= 0: Print nothing.
verbose == 1: Print progress bar and training time.
verbose > 1: Print evaluation metrics if eval_data is provided.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) –
How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If neg_sampling parameter is not bool type.
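The note on neg_sampling says that implicit data containing only positive labels needs negative samples. The following sketch shows the general idea with plain Python; it is not how libreco samples internally, and `sample_negatives`, its `num_neg` argument, and the tuple layout are all hypothetical.

```python
import random


def sample_negatives(positives, all_items, num_neg=1, seed=42):
    """Label observed pairs with 1 and add num_neg unseen items per pair with label 0."""
    rng = random.Random(seed)
    consumed = {}
    for user, item in positives:
        consumed.setdefault(user, set()).add(item)

    samples = []
    for user, item in positives:
        samples.append((user, item, 1))
        # Candidates are items this user has never interacted with.
        candidates = [i for i in all_items if i not in consumed[user]]
        for neg in rng.sample(candidates, min(num_neg, len(candidates))):
            samples.append((user, neg, 0))
    return samples


positives = [("u1", "a"), ("u1", "b"), ("u2", "c")]
samples = sample_negatives(positives, all_items=["a", "b", "c", "d"], num_neg=1)
```

Data that already carries explicit ratings or both label values would skip this step, which matches the guidance to pass False in that case.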
- predict(user, item, cold_start='average', inner_id=False)[source]#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
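As an illustration of the 'average' cold-start strategy described for predict, the sketch below substitutes the mean embedding for an unknown user or item. Scoring by a plain dot product, the dict-based embedding tables, and the function name `predict_score` are assumptions for the example, not libreco's implementation.

```python
def predict_score(user, item, user_embeds, item_embeds):
    """Dot-product score; unknown ids fall back to the mean embedding ('average')."""

    def mean_vector(table):
        vectors = list(table.values())
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    u = user_embeds[user] if user in user_embeds else mean_vector(user_embeds)
    v = item_embeds[item] if item in item_embeds else mean_vector(item_embeds)
    return sum(a * b for a, b in zip(u, v))


user_embeds = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_embeds = {"i1": [2.0, 0.0], "i2": [0.0, 4.0]}

known = predict_score("u1", "i1", user_embeds, item_embeds)
# "new_user" is not in the table, so the mean user vector [0.5, 0.5] is used.
cold = predict_score("new_user", "i2", user_embeds, item_embeds)
```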
- recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)[source]#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
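The effect of filter_consumed and the shape of the returned dictionary can be sketched without the library. In this hypothetical helper, `scores` stands in for the model's predicted scores and `consumed` for the interaction history; neither name comes from libreco.

```python
def recommend(user, n_rec, scores, consumed, filter_consumed=True):
    """Return {user: top-n items}, optionally skipping items the user already consumed."""
    seen = consumed.get(user, set()) if filter_consumed else set()
    ranked = sorted(
        (item for item in scores if item not in seen),
        key=lambda item: scores[item],
        reverse=True,
    )
    return {user: ranked[:n_rec]}


scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
consumed = {1: {"a"}}

filtered = recommend(1, 2, scores, consumed)
unfiltered = recommend(1, 2, scores, consumed, filter_consumed=False)
```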
- save(path, model_name, inference_only=False, **kwargs)[source]#
Save embed model for inference or retraining.
- Parameters:
See also
- classmethod load(path, model_name, data_info, **kwargs)[source]#
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- get_user_embedding(user=None, include_bias=False)[source]#
Get user embedding(s) from the model.
- Parameters:
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- get_item_embedding(item=None, include_bias=False)[source]#
Get item embedding(s) from the model.
- Parameters:
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)[source]#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).
ModuleNotFoundError – If approximate=True and nmslib is not installed.
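The two sim_type options can rank neighbors differently, which the brute-force sketch below demonstrates. It only illustrates exact search over a small dict of vectors; the HNSW index from nmslib used for approximate search is not shown, and `search_knn` is a hypothetical helper rather than a libreco function.

```python
import math


def search_knn(query, item_embeds, k, sim_type="cosine"):
    """Exact nearest-neighbor search by cosine or inner-product similarity."""
    if sim_type not in ("cosine", "inner-product"):
        raise ValueError(f"sim_type must be 'cosine' or 'inner-product', got {sim_type!r}")

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def score(vec):
        s = dot(query, vec)
        if sim_type == "cosine":
            # Cosine divides out both vector lengths, so magnitude is ignored.
            s /= math.sqrt(dot(query, query)) * math.sqrt(dot(vec, vec))
        return s

    ranked = sorted(item_embeds, key=lambda i: score(item_embeds[i]), reverse=True)
    return ranked[:k]


item_embeds = {"a": [1.0, 0.0], "b": [10.0, 10.0], "c": [0.0, 1.0]}
query = [1.0, 0.1]

by_cosine = search_knn(query, item_embeds, k=1, sim_type="cosine")
by_inner = search_knn(query, item_embeds, k=1, sim_type="inner-product")
```

Here item "a" points in nearly the same direction as the query and wins under cosine, while the much longer vector "b" wins under inner product.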
- class libreco.bases.TfBase(task, data_info, lower_upper_bound=None, tf_sess_config=None)[source]#
Bases: Base
Base class for TF models.
Models that rely on the TensorFlow graph for inference. Although some models, such as RNN4Rec and SVD, are trained using TensorFlow, they don’t inherit from this base class since their inference only uses embeddings.
- Parameters:
task ({'rating', 'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
lower_upper_bound (tuple or None) – Lower and upper score bound for rating task.
tf_sess_config (dict or None) – Optional TensorFlow session config, see ConfigProto options.
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)[source]#
Fit TF model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) –
Print verbosity.
verbose <= 0: Print nothing.
verbose == 1: Print progress bar and training time.
verbose > 1: Print evaluation metrics if eval_data is provided.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) –
How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If neg_sampling parameter is not bool type.
- predict(user, item, feats=None, cold_start='average', inner_id=False)[source]#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
feats (dict or pandas.Series or None, default: None) – Extra features used in prediction.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
- recommend_user(user, n_rec, user_feats=None, seq=None, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)[source]#
Recommend a list of items for given user(s).
If both user_feats and seq are None, the model will use the stored features for recommendation, and the cold_start strategy will be used for unknown users.
If either user_feats or seq is provided, the model will use them for recommendation. In this case, if the user is unknown, it will be set to padding id, which means the cold_start strategy will not be applied. This situation is common when one wants to recommend for an unknown user based on user features or behavior sequence.
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
user_feats (dict or None, default: None) – Extra user features for recommendation.
seq (list or numpy.ndarray or None, default: None) –
Extra item sequence for recommendation. If the sequence length is larger than recent_num hyperparameter specified in the model, it will be truncated. If smaller, it will be padded.
New in version 1.1.0.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
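The seq parameter is truncated or padded to the model's recent_num length. The sketch below shows one way that could work. This page does not say which end is truncated, which side is padded, or what the padding value is, so keeping the most recent items, padding on the right, and `pad_id=0` are all assumptions, as is the helper name `fit_sequence`.

```python
def fit_sequence(seq, recent_num, pad_id=0):
    """Force a sequence to exactly recent_num entries (recent_num must be positive)."""
    # Assumption: the last items are the most recent, so keep the tail.
    kept = list(seq)[-recent_num:]
    # Assumption: shorter sequences are padded on the right with pad_id.
    return kept + [pad_id] * (recent_num - len(kept))


truncated = fit_sequence([1, 2, 3, 4, 5], recent_num=3)
padded = fit_sequence([7], recent_num=3)
```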
- save(path, model_name, manual=True, inference_only=False)[source]#
Save TF model for inference or retraining.
- Parameters:
See also
- classmethod load(path, model_name, data_info, manual=True)[source]#
Load saved TF model for inference.
- Parameters:
path (str) – File folder path to save model.
model_name (str) – Name of the saved model file.
data_info (DataInfo object) – Object that contains some useful information.
manual (bool, default: True) – Whether to load model variables using numpy. If you save the model using manual, you should also load the model using manual.
- Returns:
model – Loaded TF model.
- Return type:
type(cls)
See also
- class libreco.bases.CfBase(task, data_info, cf_type, sim_type='cosine', k_sim=20, store_top_k=True, block_size=None, num_threads=1, min_common=1, mode='invert', seed=42, lower_upper_bound=None)[source]#
Bases: Base
Base class for CF models.
- Parameters:
task ({'rating', 'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
cf_type ({'user_cf', 'item_cf'}) – Specific CF type.
sim_type ({'cosine', 'pearson', 'jaccard'}, default: 'cosine') – Types for computing similarities.
k_sim (int, default: 20) – Number of similar items to use.
store_top_k (bool, default: True) – Whether to store top k similar users after training.
block_size (int or None, default: None) – Block size for computing the similarity matrix. A larger block size makes computation faster but may cause memory issues.
num_threads (int, default: 1) – Number of threads to use.
min_common (int, default: 1) – Number of minimum common users to consider when computing similarities.
mode ({'forward', 'invert'}, default: 'invert') – Whether to use forward index or invert index.
seed (int, default: 42) – Random seed.
lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
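To show how sim_type, k_sim and min_common interact, here is a small item-based sketch using cosine similarity on implicit data. It is a simplified illustration rather than libreco's implementation: the helper `top_k_similar` is hypothetical, each item is represented by the non-empty set of users who interacted with it, and for such binary data cosine reduces to the number of shared users divided by the geometric mean of the two set sizes.

```python
import math
from itertools import combinations


def top_k_similar(item_users, k_sim=20, min_common=1):
    """Keep the k_sim most similar items per item, ignoring pairs with too few common users."""
    sims = {item: [] for item in item_users}
    for a, b in combinations(item_users, 2):
        common = len(item_users[a] & item_users[b])
        if common < min_common:
            continue  # not enough shared users to consider this pair
        sim = common / math.sqrt(len(item_users[a]) * len(item_users[b]))
        sims[a].append((b, sim))
        sims[b].append((a, sim))
    return {
        item: sorted(neighbors, key=lambda pair: pair[1], reverse=True)[:k_sim]
        for item, neighbors in sims.items()
    }


item_users = {"x": {1, 2, 3}, "y": {2, 3}, "z": {3, 4}}

nearest = top_k_similar(item_users, k_sim=1)
strict = top_k_similar(item_users, k_sim=20, min_common=2)
```

With min_common=2 only the pair ("x", "y") survives, so "z" ends up with no neighbors at all.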
- fit(train_data, neg_sampling, verbose=1, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)[source]#
Fit CF model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for evaluating data.
New in version 1.1.0.
verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it to higher than 1 will print evaluation metrics during training.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
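The eval_user_num behavior, evaluating on a random subset of users, can be sketched as follows. The helper name `sample_eval_users`, the fixed seed, and the rule of returning all users when the requested number is None or too large are assumptions for illustration.

```python
import random


def sample_eval_users(eval_users, eval_user_num=None, seed=42):
    """Pick eval_user_num distinct users at random, or all users if no limit applies."""
    users = sorted(set(eval_users))
    if eval_user_num is None or eval_user_num >= len(users):
        return users
    return random.Random(seed).sample(users, eval_user_num)


eval_users = ["u3", "u1", "u2", "u1", "u4", "u5"]
picked = sample_eval_users(eval_users, eval_user_num=3)
```

Sampling a subset trades some accuracy in the reported metrics for faster evaluation on large datasets.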
- recommend_user(user, n_rec, cold_start='popular', inner_id=False, filter_consumed=True, random_rec=False)[source]#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular'}, default: 'popular') – Cold start strategy, CF models can only use ‘popular’ strategy.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
dict[Union[int, str, array_like], numpy.ndarray]
- save(path, model_name, **kwargs)[source]#
Save model for inference or retraining.
- Parameters:
See also
- classmethod load(path, model_name, data_info, **kwargs)[source]#
Load saved model for inference.
- Parameters:
- Returns:
model – Loaded model.
- Return type:
type(cls)
See also
- abstract predict(user, item, **kwargs)#
Predict score for given user and item.
- class libreco.bases.GensimBase(task, data_info, embed_size=16, norm_embed=False, window_size=5, n_epochs=5, n_threads=0, seed=42, lower_upper_bound=None)[source]#
Bases: EmbedBase
Base class for models that use Gensim for training.
Including Item2Vec and Deepwalk.
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, **kwargs)[source]#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) –
Print verbosity.
verbose <= 0: Print nothing.
verbose == 1: Print progress bar and training time.
verbose > 1: Print evaluation metrics if eval_data is provided.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) –
How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If neg_sampling parameter is not bool type.
- save(path, model_name, inference_only=False, **_)[source]#
Save embed model for inference or retraining.
- Parameters:
See also
- rebuild_model(path, model_name)[source]#
Assign the saved model variables to the newly initialized model.
This method is used before retraining the new model, in order to avoid training from scratch every time we get some new data.
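The idea behind rebuild_model, reusing trained variables instead of starting from scratch when new data arrives, can be sketched with plain dictionaries. This page does not describe how saved variables are matched to the new model, so matching by item id, the zero initialization for new items, and the helper name `rebuild_embeddings` are assumptions.

```python
def rebuild_embeddings(saved, new_item_ids, embed_size, init=0.0):
    """Copy saved vectors for known items and create fresh rows for new ones."""
    rebuilt = {}
    for item in new_item_ids:
        if item in saved:
            rebuilt[item] = list(saved[item])    # reuse the trained vector
        else:
            rebuilt[item] = [init] * embed_size  # new item: start from the initial value
    return rebuilt


saved = {"a": [0.1, 0.2], "b": [0.3, 0.4]}
# The new data introduces item "c", which has no saved vector yet.
rebuilt = rebuild_embeddings(saved, ["a", "b", "c"], embed_size=2)
```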
- get_item_embedding(item=None, include_bias=False)#
Get item embedding(s) from the model.
- Parameters:
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None, include_bias=False)#
Get user embedding(s) from the model.
- Parameters:
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) –
Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) –
Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).
ModuleNotFoundError – If approximate=True and nmslib is not installed.
- classmethod load(path, model_name, data_info, **kwargs)#
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- predict(user, item, cold_start='average', inner_id=False)#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
- recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
- search_knn_items(item, k)#
Search most similar k items.
- class libreco.bases.SageBase(task, data_info, loss_type='cross_entropy', paradigm='i2i', embed_size=16, n_epochs=20, lr=0.001, lr_decay=False, epsilon=1e-08, amsgrad=False, reg=None, batch_size=256, num_neg=1, dropout_rate=0.0, remove_edges=False, num_layers=2, num_neighbors=3, num_walks=10, sample_walk_len=5, margin=1.0, sampler='random', start_node='random', focus_start=False, seed=42, device='cuda', lower_upper_bound=None, full_inference=False)[source]#
Bases: EmbedBase
Base class for GraphSage and PinSage.
Graph neural network algorithms using neighbor sampling and node features.
See also
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) –
Print verbosity.
verbose <= 0: Print nothing.
verbose == 1: Print progress bar and training time.
verbose > 1: Print evaluation metrics if eval_data is provided.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) –
How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If neg_sampling parameter is not bool type.
- get_item_embedding(item=None, include_bias=False)#
Get item embedding(s) from the model.
- Parameters:
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None, include_bias=False)#
Get user embedding(s) from the model.
- Parameters:
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) –
Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) –
Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).
ModuleNotFoundError – If approximate=True and nmslib is not installed.
- classmethod load(path, model_name, data_info, **kwargs)#
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- predict(user, item, cold_start='average', inner_id=False)#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
- recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
- save(path, model_name, inference_only=False, **kwargs)#
Save embed model for inference or retraining.
- Parameters:
See also
- search_knn_items(item, k)#
Search most similar k items.
- class libreco.bases.DynEmbedBase(task, data_info, embed_size, norm_embed, recent_num=None, random_num=None, lower_upper_bound=None, tf_sess_config=None)[source]#
Bases: EmbedBase
Base class for dynamic embedding models.
These models can generate embeddings and make recommendations based on arbitrary user features or item sequences, so they also need to save the TF variables for inference.
New in version 1.2.0.
- convert_array_id(user, inner_id)[source]#
Convert a single user to inner user id.
If the user doesn’t exist, it will be converted to padding id. The return type should be array_like for further shape compatibility.
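The conversion described above, mapping an unknown user to the padding id and returning an array-like value, might look like the sketch below. The mapping `user2id`, the choice of the padding id, and the function name `to_inner_user_id` are hypothetical and only serve the illustration.

```python
def to_inner_user_id(user, user2id, padding_id, inner_id=False):
    """Convert one user to an inner id, falling back to padding_id for unknown users."""
    if inner_id:
        # The caller already passed an inner id; accept it only if it is a known one.
        idx = user if user in user2id.values() else padding_id
    else:
        idx = user2id.get(user, padding_id)
    # Wrapped in a list so downstream code always sees an array-like value.
    return [idx]


user2id = {"alice": 0, "bob": 1}
PADDING_ID = 2  # assumption: one past the last known inner id

known_user = to_inner_user_id("bob", user2id, PADDING_ID)
unknown_user = to_inner_user_id("carol", user2id, PADDING_ID)
```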
- recommend_user(user, n_rec, user_feats=None, seq=None, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)[source]#
Recommend a list of items for given user(s).
If both user_feats and seq are None, the model will use the precomputed embeddings for recommendation, and the cold_start strategy will be used for unknown users.
If either user_feats or seq is provided, the model will generate user embedding dynamically for recommendation. In this case, if the user is unknown, it will be set to padding id, which means the cold_start strategy will not be applied. This situation is common when one wants to recommend for an unknown user based on user features or behavior sequence.
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
user_feats (dict or None, default: None) –
Extra user features for recommendation.
New in version 1.2.0.
seq (list or numpy.ndarray or None, default: None) –
Extra item sequence for recommendation. If the sequence length is larger than recent_num hyperparameter specified in the model, it will be truncated. If smaller, it will be padded.
New in version 1.1.0.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation based on their prediction scores.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
- dyn_user_embedding(user, user_feats=None, seq=None, include_bias=False, inner_id=False)[source]#
Generate user embedding based on given user features or item sequence.
New in version 1.2.0.
- Parameters:
user_feats (dict or None, default: None) – Extra user features for recommendation.
seq (list or numpy.ndarray or None, default: None) – Extra item sequence for recommendation. If the sequence length is larger than recent_num hyperparameter specified in the model, it will be truncated. If smaller, it will be padded.
include_bias (bool, default: False) – Whether to include bias term in returned embeddings. Note some models such as SVD, BPR etc., use bias term in model inference.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
user_embedding – Generated dynamic user embeddings.
- Return type:
- Raises:
ValueError – If user is not a single user.
ValueError – If seq is provided but the model doesn’t support sequence recommendation.
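As a rough picture of generating a user embedding from an item sequence, the sketch below averages the embeddings of the items in seq and mirrors the two ValueError conditions listed above. Real models compute the embedding through their own network; mean pooling, the `supports_seq` flag, and the function name `sketch_dyn_user_embedding` are assumptions made for the example.

```python
def sketch_dyn_user_embedding(user, seq, item_embeds, supports_seq=True):
    """Build a user vector as the mean of the embeddings of items in seq."""
    if isinstance(user, (list, tuple, set)):
        raise ValueError("user must be a single user")
    if seq is not None and not supports_seq:
        raise ValueError("this model does not support sequence recommendation")
    # Items missing from the table are skipped in this sketch.
    vectors = [item_embeds[i] for i in (seq or []) if i in item_embeds]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


item_embeds = {"i1": [1.0, 3.0], "i2": [3.0, 5.0]}
# The user id may be unknown; the embedding depends only on the sequence here.
embedding = sketch_dyn_user_embedding("new_user", ["i1", "i2"], item_embeds)
```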
- save(path, model_name, inference_only=False, **_)[source]#
Save embed model for inference or retraining.
- Parameters:
See also
- classmethod load(path, model_name, data_info, **kwargs)[source]#
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) –
Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) –
Print verbosity.
verbose <= 0: Print nothing.
verbose == 1: Print progress bar and training time.
verbose > 1: Print evaluation metrics if eval_data is provided.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) –
How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If neg_sampling parameter is not bool type.
- get_item_embedding(item=None, include_bias=False)#
Get item embedding(s) from the model.
- Parameters:
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None, include_bias=False)#
Get user embedding(s) from the model.
- Parameters:
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) –
Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) –
Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).
ModuleNotFoundError – If approximate=True and nmslib is not installed.
- predict(user, item, cold_start='average', inner_id=False)#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') –
Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
- search_knn_items(item, k)#
Search most similar k items.