DeepFM#

class libreco.algorithms.DeepFM(task, data_info, loss_type='cross_entropy', embed_size=16, n_epochs=20, lr=0.001, lr_decay=False, epsilon=1e-05, reg=None, batch_size=256, num_neg=1, use_bn=True, dropout_rate=None, hidden_units=(128, 64, 32), multi_sparse_combiner='sqrtn', seed=42, lower_upper_bound=None, tf_sess_config=None)[source]#

Bases: TfBase

DeepFM algorithm.

Parameters:
  • task ({'rating', 'ranking'}) – Recommendation task. See Task.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • loss_type ({'cross_entropy', 'focal'}, default: 'cross_entropy') – Loss for model training.

  • embed_size (int, default: 16) – Vector size of embeddings.

  • n_epochs (int, default: 20) – Number of epochs for training.

  • lr (float, default: 0.001) – Learning rate for training.

  • lr_decay (bool, default: False) – Whether to use learning rate decay.

  • epsilon (float, default: 1e-5) – A small constant added to the denominator to improve numerical stability in the Adam optimizer. According to the official comment, the default value of 1e-8 for epsilon is generally not a good choice, so 1e-5 is used here. Users can try tuning this hyperparameter if training is unstable.

  • reg (float or None, default: None) – Regularization parameter, must be non-negative or None.

  • batch_size (int, default: 256) – Batch size for training.

  • num_neg (int, default: 1) – Number of negative samples for each positive sample, only used in ranking task.

  • use_bn (bool, default: True) – Whether to use batch normalization.

  • dropout_rate (float or None, default: None) – Probability of an element being zeroed. If it is None, dropout is not used.

  • hidden_units (int, list of int or tuple of (int,), default: (128, 64, 32)) –

    Number of layers and the size of each layer in the MLP.

    Changed in version 1.0.0: Accept type of int, list or tuple, instead of str.

  • multi_sparse_combiner ({'normal', 'mean', 'sum', 'sqrtn'}, default: 'sqrtn') – Options for combining multi_sparse features.

  • seed (int, default: 42) – Random seed.

  • lower_upper_bound (tuple or None, default: None) – Lower and upper score bound for rating task.

  • tf_sess_config (dict or None, default: None) – Optional TensorFlow session config, see ConfigProto options.
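
A minimal usage sketch: build a toy feature dataset and construct a DeepFM model. The toy columns (sex, age, genre), the "model_path"-style names used in later examples, and the DatasetFeat.build_trainset call follow the usual libreco data pipeline and are assumptions of this example, not part of the DeepFM API documented above.

    import pandas as pd
    from libreco.data import DatasetFeat
    from libreco.algorithms import DeepFM

    # Toy interactions; libreco expects "user", "item" and "label" columns,
    # plus any user/item feature columns declared below.
    train = pd.DataFrame({
        "user": [1, 1, 2, 2, 3, 3],
        "item": [10, 11, 10, 12, 11, 12],
        "label": [1, 0, 1, 1, 0, 1],
        "sex": ["F", "F", "M", "M", "M", "M"],    # user sparse feature
        "age": [23, 23, 31, 31, 45, 45],          # user dense feature
        "genre": ["a", "b", "a", "c", "b", "c"],  # item sparse feature
    })

    sparse_col = ["sex", "genre"]
    dense_col = ["age"]
    user_col = ["sex", "age"]
    item_col = ["genre"]

    train_data, data_info = DatasetFeat.build_trainset(
        train, user_col, item_col, sparse_col, dense_col
    )

    model = DeepFM(
        task="ranking",
        data_info=data_info,
        loss_type="cross_entropy",
        embed_size=16,
        n_epochs=2,
        lr=1e-3,
        batch_size=256,
        hidden_units=(128, 64, 32),
    )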

fit(train_data, verbose=1, shuffle=True, eval_data=None, metrics=None, k=10, eval_batch_size=8192, eval_user_num=None)#

Fit TF model on the training data.

Parameters:
  • train_data (TransformedSet object) – Data object used for training.

  • verbose (int, default: 1) – Print verbosity. If eval_data is provided, setting verbose higher than 1 will print evaluation metrics during training.

  • shuffle (bool, default: True) – Whether to shuffle the training data.

  • eval_data (TransformedSet object, default: None) – Data object used for evaluating.

  • metrics (list or None, default: None) – List of metrics for evaluating.

  • k (int, default: 10) – Parameter of the evaluation metrics, e.g. recall@k, ndcg@k.

  • eval_batch_size (int, default: 8192) – Batch size for evaluating.

  • eval_user_num (int or None, default: None) – Number of users for evaluating. Setting it to a positive number will sample users randomly from eval data.

Raises:

RuntimeError – If fit() is called on a model loaded with load().
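
A hedged training sketch, continuing from the constructor example above (model and train_data are assumed from there). An eval_data object would be a TransformedSet built from held-out interactions, e.g. with DatasetFeat.build_evalset, which is part of the broader libreco data pipeline rather than this page.

    model.fit(
        train_data,
        verbose=2,       # values above 1 print evaluation metrics per epoch
        shuffle=True,
        eval_data=None,  # pass a TransformedSet of held-out data to enable evaluation
        metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
        k=10,            # metrics such as recall and ndcg are computed at top-10
    )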

classmethod load(path, model_name, data_info, manual=True)#

Load saved TF model for inference.

Parameters:
  • path (str) – File folder path where the model was saved.

  • model_name (str) – Name of the saved model file.

  • data_info (DataInfo object) – Object that contains useful information for training and inference.

  • manual (bool, default: True) – Whether to load the model variables using numpy. If the model was saved with manual=True, it should also be loaded with manual=True.

Returns:

model – Loaded TF model.

Return type:

type(cls)

See also

save
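
A hedged inference sketch: restore a previously saved model together with its DataInfo. The folder "model_path", the name "deepfm_model", and the DataInfo.load call mirror the common libreco save/load workflow and are assumptions of this example.

    from libreco.data import DataInfo
    from libreco.algorithms import DeepFM

    data_info = DataInfo.load("model_path", model_name="deepfm_model")
    model = DeepFM.load(
        path="model_path",
        model_name="deepfm_model",
        data_info=data_info,
        manual=True,  # must match the `manual` flag used when saving
    )
    print(model.predict(user=1, item=10))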

predict(user, item, feats=None, cold_start='average', inner_id=False)#

Make prediction(s) on given user(s) and item(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids.

  • item (int or str or array_like) – Item id or batch of item ids.

  • feats (dict or pandas.Series or None, default: None) – Extra features used in prediction.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner ids defined in libreco. Most library users will never need this.

Returns:

prediction – Predicted scores for each user-item pair.

Return type:

float or numpy.ndarray
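
A short prediction sketch, assuming a trained or loaded model from the examples above; the feature names passed via feats come from the toy dataset and are illustrative.

    score = model.predict(user=1, item=10)                     # single pair -> float
    scores = model.predict(user=[1, 2, 3], item=[10, 11, 12])  # batch -> numpy.ndarray

    # Extra features supplied at prediction time; keys must be feature
    # columns known to the model (here, from the toy dataset above).
    score_f = model.predict(user=1, item=10, feats={"sex": "F", "age": 23})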

rebuild_model(path, model_name, full_assign=True)#

Assign the saved model variables to the newly initialized model.

This method is used before retraining a new model, in order to avoid training from scratch every time new data arrives.

Parameters:
  • path (str) – File folder path for the saved model variables.

  • model_name (str) – Name of the saved model file.

  • full_assign (bool, default: True) – Whether to also restore the variables of Adam optimizer.
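
A hedged retraining sketch: build a fresh model, restore the saved variables into it, then continue training on new data. How the updated data_info and new_train_data are produced depends on libreco's incremental data-processing step and is elided here; the path and model name are placeholders.

    from libreco.algorithms import DeepFM

    # `data_info` and `new_train_data` are assumed to come from libreco's
    # incremental data-processing step (elided in this sketch).
    model = DeepFM(task="ranking", data_info=data_info, n_epochs=2)
    model.rebuild_model(path="model_path", model_name="deepfm_model", full_assign=True)
    model.fit(new_train_data, verbose=2)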

recommend_user(user, n_rec, user_feats=None, item_data=None, cold_start='average', inner_id=False, filter_consumed=True, random_rec=False)#

Recommend a list of items for given user(s).

Parameters:
  • user (int or str or array_like) – User id or batch of user ids to recommend.

  • n_rec (int) – Number of recommendations to return.

  • user_feats (dict or pandas.Series or None, default: None) – Extra user features for recommendation.

  • item_data (pandas.DataFrame or None, default: None) – Extra item features for recommendation.

  • cold_start ({'popular', 'average'}, default: 'average') –

    Cold start strategy.

    • 'popular' will sample from popular items.

    • 'average' will use the average of all the user/item embeddings as the representation of the cold-start user/item.

  • inner_id (bool, default: False) – Whether to use the inner ids defined in libreco. Most library users will never need this.

  • filter_consumed (bool, default: True) – Whether to filter out items that a user has previously consumed.

  • random_rec (bool, default: False) – Whether to sample recommended items randomly, weighted by their prediction scores, instead of always returning the highest-scoring items.

Returns:

recommendation – Recommendation result with user ids as keys and array_like recommended items as values.

Return type:

dict of {Union[int, str, array_like] : numpy.ndarray}
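
A recommendation sketch, assuming a trained or loaded model from the examples above: top-7 items for one user and for a small batch of users. The returned dict is keyed by the requested user ids.

    recs = model.recommend_user(user=1, n_rec=7)
    print(recs[1])  # numpy array of 7 recommended item ids

    batch_recs = model.recommend_user(
        user=[1, 2, 3],
        n_rec=7,
        filter_consumed=True,  # drop items each user has already consumed
        cold_start="average",  # unseen users fall back to the average embedding
    )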

save(path, model_name, manual=True, inference_only=False)#

Save TF model for inference or retraining.

Parameters:
  • path (str) – File folder path to save model.

  • model_name (str) – Name of the saved model file.

  • manual (bool, default: True) – Whether to save model variables using numpy.

  • inference_only (bool, default: False) – Whether to save model variables only for inference.

See also

load
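
A hedged save sketch: persist the DataInfo alongside the model so both can be restored later with DataInfo.load and DeepFM.load. The folder and model name are placeholders, and DataInfo.save belongs to the broader libreco API rather than this page.

    import os

    os.makedirs("model_path", exist_ok=True)
    data_info.save(path="model_path", model_name="deepfm_model")
    model.save(
        path="model_path",
        model_name="deepfm_model",
        manual=True,          # load() must then also use manual=True
        inference_only=True,  # drop optimizer state if this copy won't be retrained
    )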