ALS
- class libreco.algorithms.ALS(task, data_info, embed_size=16, n_epochs=10, reg=None, alpha=10, use_cg=True, n_threads=1, seed=42, lower_upper_bound=None)
Bases: EmbedBase
Alternating Least Squares algorithm.
Training can be accelerated by enabling conjugate gradient optimization (use_cg=True) and by raising n_threads.
- Parameters:
task ({'rating', 'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
embed_size (int, default: 16) – Vector size of embeddings.
n_epochs (int, default: 10) – Number of epochs for training.
reg (float or None, default: None) – Regularization parameter, must be non-negative or None.
alpha (int, default: 10) – Scaling factor that increases the confidence level of observed interactions; only applied to the ranking task.
use_cg (bool, default: True) – Whether to use conjugate gradient optimization. See reference [3].
n_threads (int, default: 1) – Number of threads to use.
seed (int, default: 42) – Random seed.
lower_upper_bound (tuple or None, default: None) – Lower and upper score bounds for the rating task.
References
[1] Haoming Li et al. Matrix Completion via Alternating Least Square(ALS).
[2] Yifan Hu et al. Collaborative Filtering for Implicit Feedback Datasets.
[3] Gábor Takács et al. Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.
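The roles of alpha, reg and use_cg can be illustrated with a small pure-Python sketch of one user update in implicit-feedback ALS. It assumes the confidence weighting c = 1 + alpha * r from reference [2] and solves the regularized normal equations with conjugate gradient in the spirit of reference [3]. The helper names (confidence, user_system, conjugate_gradient) are hypothetical and are not part of libreco; whether the library's internals match this formulation exactly is not confirmed here.

```python
def confidence(r, alpha=10):
    # Assumed weighting from Hu et al.: c = 1 + alpha * r.
    return 1.0 + alpha * r


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def user_system(item_factors, ratings, reg, alpha):
    """Build A = Y^T C Y + reg * I and b = Y^T C p for a single user."""
    k = len(item_factors[0])
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for y, r in zip(item_factors, ratings):
        c = confidence(r, alpha)
        pref = 1.0 if r > 0 else 0.0  # binary preference
        for i in range(k):
            b[i] += c * pref * y[i]
            for j in range(k):
                A[i][j] += c * y[i] * y[j]
    return A, b


def conjugate_gradient(A, b, n_steps=None, tol=1e-10):
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    n = len(b)
    steps = n if n_steps is None else n_steps
    x = [0.0] * n
    r = list(b)  # residual b - A x, which equals b when x = 0
    p = list(r)
    rs_old = dot(r, r)
    for _ in range(steps):
        if rs_old < tol:
            break
        Ap = matvec(A, p)
        step = rs_old / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Three items with 2-dimensional factors; the user interacted with items 0 and 2.
A, b = user_system([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1, 0, 2], reg=0.1, alpha=10)
user_vector = conjugate_gradient(A, b)
```

A few conjugate gradient steps per user avoid forming an explicit matrix inverse, which is the usual motivation for an option such as use_cg.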
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, **kwargs)
Fit ALS model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) – Whether to perform negative sampling for evaluating data. New in version 1.1.0.
verbose (int, default: 1) – Print verbosity. If eval_data is provided, a value higher than 1 prints evaluation metrics during training.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Cutoff used by top-k metrics, e.g. recall at k, NDCG at k.
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
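To make the k parameter concrete, the sketch below computes recall at k and NDCG at k for a single user with binary relevance. These are common textbook definitions and the function names are hypothetical; the exact formulas libreco uses for its metrics may differ.

```python
import math


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: discounted gain of the hits over the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# One user's ranked recommendations and held-out relevant items.
ranked_items = [3, 1, 7, 5]
held_out = {1, 5, 9}
user_recall = recall_at_k(ranked_items, held_out, k=4)
user_ndcg = ndcg_at_k(ranked_items, held_out, k=4)
```

A larger k can only keep or raise recall, while NDCG also rewards placing hits near the top of the list.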
- get_item_embedding(item=None, include_bias=False)
Get item embedding(s) from the model.
- Parameters:
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None, include_bias=False)
Get user embedding(s) from the model.
- Parameters:
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)
Initialize the k-nearest-neighbor search model.
- Parameters:
approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) – Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) – Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of ('cosine', 'inner-product').
ModuleNotFoundError – If approximate=True and nmslib is not installed.
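The two sim_type options can rank neighbors differently, because the inner product is sensitive to vector length while cosine similarity is not. The following brute-force sketch illustrates exact (non-approximate) search over a handful of embeddings. The function top_k_similar is hypothetical, and excluding the query item from its own neighbor list is an assumption of this sketch rather than documented library behavior.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    return inner_product(u, v) / norm if norm > 0 else 0.0


def top_k_similar(query_idx, embeddings, k, sim_type="cosine"):
    """Exact search: score every other embedding against the query and keep the best k."""
    if sim_type not in ("cosine", "inner-product"):
        raise ValueError("sim_type must be 'cosine' or 'inner-product'")
    score = cosine if sim_type == "cosine" else inner_product
    query = embeddings[query_idx]
    scored = [
        (score(query, emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query_idx  # assumption: the query itself is not returned
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


item_embeddings = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [10.0, 10.0]]
by_cosine = top_k_similar(0, item_embeddings, k=2, sim_type="cosine")
by_inner = top_k_similar(0, item_embeddings, k=2, sim_type="inner-product")
```

Here the long vector at index 3 ranks first under the inner product but second under cosine similarity. An approximate index such as HNSW trades a little accuracy for much faster lookups on large catalogs.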
- classmethod load(path, model_name, data_info, **kwargs)
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
- predict(user, item, cold_start='average', inner_id=False)
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need this.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
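The 'average' cold-start strategy described above can be sketched as follows: an unseen user or item is represented by the mean of all known embeddings on that side, and the score is the dot product. The names mean_vector and predict_score are hypothetical, and the sketch ignores any bias terms or score bounds the library may apply.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict_score(user, item, user_embeds, item_embeds):
    """Dot-product score, falling back to the average embedding for unknown ids."""
    user_vec = user_embeds.get(user)
    if user_vec is None:  # cold-start user
        user_vec = mean_vector(list(user_embeds.values()))
    item_vec = item_embeds.get(item)
    if item_vec is None:  # cold-start item
        item_vec = mean_vector(list(item_embeds.values()))
    return sum(u * v for u, v in zip(user_vec, item_vec))


user_embeds = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
item_embeds = {"x": [2.0, 2.0], "y": [4.0, 0.0]}
known_pair = predict_score("a", "x", user_embeds, item_embeds)
cold_user = predict_score("new_user", "x", user_embeds, item_embeds)
```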
- recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use the inner_id defined in libreco. Library users typically never need this.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation at random, weighted by their prediction scores, instead of always returning the top-scored items.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
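The effect of n_rec and filter_consumed can be sketched with plain dictionaries: score every candidate item, optionally drop what the user has already consumed, and keep the n_rec highest-scoring items. The function recommend_top_n and its inputs are hypothetical and only mirror the documented return shape (user ids as keys, recommended items as values).

```python
def recommend_top_n(user_scores, consumed, n_rec, filter_consumed=True):
    """Return {user: [item, ...]} holding the n_rec best-scored items per user."""
    result = {}
    for user, scores in user_scores.items():
        seen = consumed.get(user, set())
        candidates = [
            item for item in scores
            if not (filter_consumed and item in seen)
        ]
        # Highest score first; ties are broken by item id for determinism.
        candidates.sort(key=lambda item: (-scores[item], str(item)))
        result[user] = candidates[:n_rec]
    return result


user_scores = {"u1": {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}}
consumed = {"u1": {"a"}}
fresh_only = recommend_top_n(user_scores, consumed, n_rec=2)
including_seen = recommend_top_n(user_scores, consumed, n_rec=2, filter_consumed=False)
```

With filtering enabled, item "a" is skipped even though it has the highest score, which is usually what is wanted when the goal is discovery rather than re-consumption.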
- search_knn_items(item, k)
Search for the k most similar items.