LightGCN#
- class libreco.algorithms.LightGCN(task, data_info, loss_type='bpr', embed_size=16, n_epochs=20, lr=0.001, lr_decay=False, epsilon=1e-08, amsgrad=False, reg=None, batch_size=256, num_neg=1, dropout_rate=0.0, n_layers=3, margin=1.0, sampler='random', seed=42, device='cuda', lower_upper_bound=None, with_training=True)[source]#
Bases: EmbedBase
LightGCN algorithm.
Caution
LightGCN can only be used in the ranking task.
- Parameters:
task ({'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
loss_type ({'cross_entropy', 'focal', 'bpr', 'max_margin'}, default: 'bpr') – Loss for model training.
embed_size (int, default: 16) – Vector size of embeddings.
n_epochs (int, default: 20) – Number of epochs for training.
lr (float, default: 0.001) – Learning rate for training.
lr_decay (bool, default: False) – Whether to use learning rate decay.
epsilon (float, default: 1e-8) – A small constant added to the denominator to improve numerical stability in Adam optimizer.
amsgrad (bool, default: False) – Whether to use the AMSGrad variant from the paper On the Convergence of Adam and Beyond.
reg (float or None, default: None) – Regularization parameter, must be non-negative or None.
batch_size (int, default: 256) – Batch size for training.
num_neg (int, default: 1) – Number of negative samples for each positive sample.
dropout_rate (float, default: 0.0) – Probability of a node being dropped. 0.0 means dropout is not used.
n_layers (int, default: 3) – Number of GCN layers.
margin (float, default: 1.0) – Margin used in max_margin loss.
sampler ({'random', 'unconsumed', 'popular'}, default: 'random') – Negative sampling strategy.
- 'random' means random sampling.
- 'unconsumed' samples items that the target user did not consume before.
- 'popular' has a higher probability to sample popular items as negative samples.
seed (int, default: 42) – Random seed.
device ({'cpu', 'cuda'}, default: 'cuda') – Refer to torch.device.
Changed in version 1.0.0: Accept str type 'cpu' or 'cuda', instead of torch.device(...).
lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
References
Xiangnan He et al. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation.
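The propagation rule from the paper above can be sketched in a few lines of NumPy. This is an illustrative toy example of LightGCN's layer-wise propagation (no feature transformation, no non-linearity, layer-averaged output), not LibRecommender's actual implementation; the interaction matrix and embedding values are made up.

```python
import numpy as np

# Toy interaction graph: 2 users, 3 items (assumed data for illustration).
# R[u, i] = 1 if user u interacted with item i.
R = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)

n_users, n_items = R.shape
n_layers = 3
rng = np.random.default_rng(42)

# Bipartite adjacency matrix A = [[0, R], [R^T, 0]].
A = np.zeros((n_users + n_items, n_users + n_items))
A[:n_users, n_users:] = R
A[n_users:, :n_users] = R.T

# Symmetric normalization: A_hat = D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# LightGCN propagation: E^{(l+1)} = A_hat @ E^{(l)}, with no weight
# matrices or activations; the final embedding averages all layers.
E0 = rng.normal(size=(n_users + n_items, 16))
layers = [E0]
for _ in range(n_layers):
    layers.append(A_hat @ layers[-1])
E_final = np.mean(layers, axis=0)

user_embeds, item_embeds = E_final[:n_users], E_final[n_users:]
print(user_embeds.shape, item_embeds.shape)  # (2, 16) (3, 16)
```

Dropping the per-layer transformations is what makes the model "light": the only trained parameters are the layer-0 embeddings of size `embed_size`.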
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) – Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., the task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it to higher than 1 will print evaluation metrics during training.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall@k, NDCG@k.
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) – How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If the neg_sampling parameter is not bool type.
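The default 'bpr' loss used during fitting optimizes pairwise ranking: an observed (positive) item should score higher than a sampled negative item. A minimal NumPy sketch of the loss, not the library's internal code:

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking loss: mean of -log(sigmoid(pos - neg))."""
    diff = np.asarray(pos_scores) - np.asarray(neg_scores)
    return float(np.mean(-np.log(1.0 / (1.0 + np.exp(-diff)))))

# When positive items score well above negatives, the loss is near zero.
low = bpr_loss([5.0, 4.0], [0.0, -1.0])
# When the ranking is inverted, the loss is large.
high = bpr_loss([0.0, -1.0], [5.0, 4.0])
print(low < high)  # True
```

This is also why `neg_sampling=True` is required for implicit data: without sampled negatives there are no (positive, negative) pairs to rank.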
- get_item_embedding(item=None)#
Get item embedding(s) from the model.
- Parameters:
item (int or str or None) – Query item id. If it is None, all item embeddings will be returned.
- Returns:
item_embedding – Returned item embeddings.
- Return type:
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None)#
Get user embedding(s) from the model.
- Parameters:
user (int or str or None) – Query user id. If it is None, all user embeddings will be returned.
- Returns:
user_embedding – Returned user embeddings.
- Return type:
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of ('cosine', 'inner-product').
ModuleNotFoundError – If approximate=True and nmslib is not installed.
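For intuition, the approximate HNSW search trades a little recall for speed over the exact brute-force counterpart sketched below. This is an illustrative stand-in with made-up embeddings, not the nmslib implementation:

```python
import numpy as np

def knn_exact(query, embeddings, k, sim_type="cosine"):
    """Exact k-nearest search over embeddings in the given similarity space."""
    if sim_type == "cosine":
        q = query / np.linalg.norm(query)
        emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = emb @ q
    elif sim_type == "inner-product":
        sims = embeddings @ query
    else:
        raise ValueError("sim_type must be one of ('cosine', 'inner-product')")
    # Indices of the k largest similarities, best first.
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
items = rng.normal(size=(100, 16))
top = knn_exact(items[7], items, k=5)
print(top[0])  # 7: under cosine similarity an item is most similar to itself
```

Exact search scans every embedding, so it scales linearly with the item count; HNSW's `M`, `ef_construction`, and `ef_search` parameters control the graph index that avoids this full scan.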
- classmethod load(path, model_name, data_info, **kwargs)#
Load saved embed model for inference.
- Parameters:
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- predict(user, item, cold_start='average', inner_id=False)#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
- 'popular' will sample from popular items.
- 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users, inner_id may never be used.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
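For an embedding model, the predicted score of a user-item pair is essentially the inner product of their final embeddings. A toy sketch with assumed embedding values, not the library's code:

```python
import numpy as np

# Assumed final embeddings from a trained model (toy values).
user_embeds = np.array([[0.5, 1.0], [1.0, -0.5]])
item_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def predict(users, items):
    """Score each user-item pair with an inner product of embeddings."""
    return np.sum(user_embeds[users] * item_embeds[items], axis=1)

scores = predict([0, 1], [2, 0])
print(scores)  # [1.5, 1.0]
```

The 'average' cold-start strategy corresponds to replacing an unknown user's or item's row with the column-wise mean of the relevant embedding matrix before taking this product.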
- rebuild_model(path, model_name)#
Assign the saved model variables to the newly initialized model.
This method is used before retraining the new model, in order to avoid training from scratch every time we get some new data.
- recommend_user(user, n_rec, seq=None, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
seq (list or numpy.ndarray) –
Extra item sequence for recommendation. If the sequence length is larger than the recent_num hyperparameter specified in the model, it will be truncated. If it is smaller, it will be padded.
New in version 1.1.0.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
- 'popular' will sample from popular items.
- 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use inner_id defined in libreco. For library users, inner_id may never be used.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to choose items for recommendation randomly, sampled according to their prediction scores, instead of always taking the highest-scored items.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
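The filter_consumed behaviour can be pictured as masking a user's known items before taking the top-n. A toy sketch of the idea, not the library's code:

```python
import numpy as np

def recommend(scores, consumed, n_rec, filter_consumed=True):
    """Return the n_rec highest-scored item ids, optionally
    excluding items the user has already consumed."""
    scores = scores.astype(float).copy()
    if filter_consumed:
        scores[list(consumed)] = -np.inf  # mask known items
    return np.argsort(-scores)[:n_rec]

scores = np.array([0.9, 0.1, 0.8, 0.7])
print(recommend(scores, consumed={0}, n_rec=2))                        # [2 3]
print(recommend(scores, consumed={0}, n_rec=2, filter_consumed=False)) # [0 2]
```

With random_rec=True the deterministic top-n step would instead sample items with probabilities derived from these scores.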
- save(path, model_name, inference_only=False, **kwargs)#
Save embed model for inference or retraining.
- Parameters:
See also
- search_knn_items(item, k)#
Search the k most similar items.