GraphSageDGL#
- class libreco.algorithms.GraphSageDGL(*args, **kwargs)[source]#
Bases: SageBase
GraphSageDGL algorithm.
Note
This algorithm is implemented in DGL.
Caution
GraphSageDGL can only be used in the ranking task.
New in version 0.12.0.
- Parameters:
task ({'ranking'}) – Recommendation task. See Task.
data_info (DataInfo object) – Object that contains useful information for training and inference.
loss_type ({'cross_entropy', 'focal', 'bpr', 'max_margin'}, default: 'cross_entropy') – Loss for model training.
paradigm ({'u2i', 'i2i'}, default: 'i2i') – Choice of features in the model. 'u2i' will combine user features and item features. 'i2i' will only use item features; this is the setting in the original paper.
aggregator_type ({'mean', 'gcn', 'pool', 'lstm'}, default: 'mean') – Aggregator type to use in GraphSage. Refer to SAGEConv in DGL.
embed_size (int, default: 16) – Vector size of embeddings.
n_epochs (int, default: 10) – Number of epochs for training.
lr (float, default: 0.001) – Learning rate for training.
lr_decay (bool, default: False) – Whether to use learning rate decay.
epsilon (float, default: 1e-8) – A small constant added to the denominator to improve numerical stability in Adam optimizer.
amsgrad (bool, default: False) – Whether to use the AMSGrad variant from the paper On the Convergence of Adam and Beyond.
reg (float or None, default: None) – Regularization parameter, must be non-negative or None.
batch_size (int, default: 256) – Batch size for training.
num_neg (int, default: 1) – Number of negative samples for each positive sample.
dropout_rate (float, default: 0.0) – Probability of a node being dropped. 0.0 means dropout is not used.
remove_edges (bool, default: False) – Whether to remove edges between target node and its positive pair nodes when target node’s sampled neighbor nodes contain positive pair nodes. This only applies in ‘i2i’ paradigm.
num_layers (int, default: 2) – Number of GCN layers.
num_neighbors (int, default: 3) – Number of sampled neighbors in each layer.
num_walks (int, default: 10) – Number of random walks to sample positive item pairs. This only applies in ‘i2i’ paradigm.
sample_walk_len (int, default: 5) – Length of each random walk to sample positive item pairs.
margin (float, default: 1.0) – Margin used in max_margin loss.
sampler ({'random', 'unconsumed', 'popular', 'out-batch'}, default: 'random') – Negative sampling strategy. The 'u2i' paradigm can use 'random', 'unconsumed', 'popular', and the 'i2i' paradigm can use 'random', 'out-batch', 'popular'. 'random' means random sampling. 'unconsumed' samples items that the target user did not consume before; it can't be used in 'i2i' since that paradigm has no users. 'popular' has a higher probability of sampling popular items as negative samples. 'out-batch' samples items that didn't appear in the batch; it can only be used in the 'i2i' paradigm.
start_node ({'random', 'unpopular'}, default: 'random') – Strategy for choosing start nodes in random walks. 'unpopular' will place a higher probability on unpopular items, which may increase diversity but hurt metrics. This only applies in the 'i2i' paradigm.
focus_start (bool, default: False) – Whether to keep the start nodes in random walk sampling. The purpose of the parameters start_node and focus_start is oversampling unpopular items. If you set start_node='unpopular' and focus_start=True, unpopular items will be kept in positive samples, which may increase diversity.
full_inference (bool, default: False) – Whether to get item embeddings by aggregating over all neighbor embeddings.
seed (int, default: 42) – Random seed.
device ({'cpu', 'cuda'}, default: 'cuda') – Refer to torch.device.
Changed in version 1.0.0: Accepts str type 'cpu' or 'cuda' instead of torch.device(...).
lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.
See also
References
William L. Hamilton et al. Inductive Representation Learning on Large Graphs.
- fit(train_data, neg_sampling, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None, num_workers=0)#
Fit embed model on the training data.
- Parameters:
train_data (TransformedSet object) – Data object used for training.
neg_sampling (bool) – Whether to perform negative sampling for training or evaluating data.
New in version 1.1.0.
Note
Negative sampling is needed if your data is implicit (i.e., task is ranking) and ONLY contains positive labels. Otherwise, it should be False.
verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting it to higher than 1 will print evaluation metrics during training.
shuffle (bool, default: True) – Whether to shuffle the training data.
eval_data (TransformedSet object, default: None) – Data object used for evaluating.
metrics (list or None, default: None) – List of metrics for evaluating.
k (int, default: 10) – Parameter of metrics, e.g. recall at k, ndcg at k.
eval_batch_size (int, default: 8192) – Batch size for evaluating.
eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.
num_workers (int, default: 0) – How many subprocesses to use for training data loading. 0 means that the data will be loaded in the main process, which is slower than multiprocessing.
New in version 1.1.0.
Caution
Using multiprocessing (num_workers > 0) may consume more memory than single processing. See Multi-process data loading.
- Raises:
RuntimeError – If fit() is called from a loaded model (load()).
AssertionError – If the neg_sampling parameter is not bool type.
- get_item_embedding(item=None, include_bias=False)#
Get item embedding(s) from the model.
- Parameters:
item (int or str or None, default: None) – A single item id, or None to get all the item embeddings.
include_bias (bool, default: False) – Whether to include the bias term in the embedding.
- Returns:
item_embedding – Returned item embeddings.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the item does not appear in the training data.
AssertionError – If the model has not been trained.
- get_user_embedding(user=None, include_bias=False)#
Get user embedding(s) from the model.
- Parameters:
user (int or str or None, default: None) – A single user id, or None to get all the user embeddings.
include_bias (bool, default: False) – Whether to include the bias term in the embedding.
- Returns:
user_embedding – Returned user embeddings.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the user does not appear in the training data.
AssertionError – If the model has not been trained.
- init_knn(approximate, sim_type, M=100, ef_construction=200, ef_search=200)#
Initialize k-nearest-search model.
- Parameters:
approximate (bool) – Whether to use approximate nearest neighbor search. If it is True, nmslib must be installed. The HNSW method in nmslib is used.
sim_type ({'cosine', 'inner-product'}) – Similarity space type.
M (int, default: 100) – Parameter in HNSW, refer to nmslib doc.
ef_construction (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
ef_search (int, default: 200) –
Parameter in HNSW, refer to nmslib doc.
- Raises:
ValueError – If sim_type is not one of (‘cosine’, ‘inner-product’).
ModuleNotFoundError – If approximate=True and nmslib is not installed.
- classmethod load(path, model_name, data_info, **kwargs)#
Load saved embed model for inference.
- Parameters:
path (str) – File folder path of the saved model.
model_name (str) – Name of the saved model file.
data_info (DataInfo object) – Object that contains useful information for training and inference.
- Returns:
model – Loaded embed model.
- Return type:
type(cls)
See also
- predict(user, item, cold_start='average', inner_id=False)#
Make prediction(s) on given user(s) and item(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids.
item (int or str or array_like) – Item id or batch of item ids.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use the inner ids defined in libreco. For library users, inner_id may never be needed.
- Returns:
prediction – Predicted scores for each user-item pair.
- Return type:
list or numpy.ndarray
- rebuild_model(path, model_name)#
Assign the saved model variables to the newly initialized model.
This method is used before retraining a new model, in order to avoid training from scratch every time some new data arrives.
- recommend_user(user, n_rec, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)#
Recommend a list of items for given user(s).
- Parameters:
user (int or str or array_like) – User id or batch of user ids to recommend.
n_rec (int) – Number of recommendations to return.
cold_start ({'popular', 'average'}, default: 'average') – Cold start strategy.
'popular' will sample from popular items.
'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.
inner_id (bool, default: False) – Whether to use the inner ids defined in libreco. For library users, inner_id may never be needed.
filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.
random_rec (bool, default: False) – Whether to sample recommended items randomly, weighted by their prediction scores, instead of always taking the top-n.
- Returns:
recommendation – Recommendation result with user ids as keys and array_like recommended items as values.
- Return type:
dict of {int or str : numpy.ndarray}
- save(path, model_name, inference_only=False, **kwargs)#
Save embed model for inference or retraining.
- Parameters:
path (str) – File folder path to save the model.
model_name (str) – Name of the saved model file.
inference_only (bool, default: False) – Whether to save the model only for inference. If True, only variables essential for inference will be saved.
See also
- search_knn_items(item, k)#
Search the k most similar items.