DataInfo

class libreco.data.DataInfo(col_name_mapping=None, interaction_data=None, user_sparse_unique=None, user_dense_unique=None, item_sparse_unique=None, item_dense_unique=None, user_consumed=None, item_consumed=None, user_unique_vals=None, item_unique_vals=None, sparse_unique_vals=None, sparse_offset=None, sparse_oov=None, multi_sparse_unique_vals=None, multi_sparse_combine_info=None, seed=42)

Object for storing and updating information of indices and features.

Parameters:
  • col_name_mapping (dict of {str : dict of {str : int}} or None, default: None) – Column name to index mapping, in the format {column_family_name: {column_name: index}}. If no such family exists, the default format is {column_family_name: {[]: []}}.

  • interaction_data (pandas.DataFrame or None, default: None) – Data containing user, item and label columns.

  • user_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all users in train data.

  • user_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all users in train data.

  • item_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all items in train data.

  • item_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all items in train data.

  • user_consumed (dict of {int : list} or None, default: None) – Items consumed by each user.

  • item_consumed (dict of {int : list} or None, default: None) – Users who consumed each item.

  • user_unique_vals (numpy.ndarray or None, default: None) – All the unique users in train data.

  • item_unique_vals (numpy.ndarray or None, default: None) – All the unique items in train data.

  • sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All sparse features’ unique values.

  • sparse_offset (numpy.ndarray or None, default: None) – Offset for each sparse feature in all sparse values. Often used in the embedding layer.

  • sparse_oov (numpy.ndarray or None, default: None) – Out-of-vocabulary position for each sparse feature. Often used in cold-start.

  • multi_sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All multi-sparse features’ unique values.

  • multi_sparse_combine_info (MultiSparseInfo or None, default: None) – Multi-sparse field information.

  • seed (int, default: 42) – Random seed.
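The relationship between sparse_unique_vals, sparse_offset and sparse_oov can be illustrated with a small standalone sketch. It does not call the library. The feature names and vocabularies are made up, and the layout (one extra out-of-vocabulary slot at the end of each feature's block) is an assumption about one common way to share a single embedding table, not a statement of how DataInfo computes these arrays.

```python
from itertools import accumulate

# Hypothetical vocabularies for three sparse features (illustrative only).
sparse_unique_vals = {
    "sex": ["F", "M"],
    "occupation": ["artist", "doctor", "engineer"],
    "city": ["Berlin", "Lima", "Osaka", "Paris"],
}

# Assumption: each feature reserves one extra slot at the end for OOV values.
sizes = [len(vals) + 1 for vals in sparse_unique_vals.values()]

# Offset of a feature = number of slots taken by all features before it.
sparse_offset = [0] + list(accumulate(sizes))[:-1]

# OOV slot of a feature = last slot of that feature's block.
sparse_oov = [off + size - 1 for off, size in zip(sparse_offset, sizes)]


def global_index(feature_pos, local_index):
    """Map a per-feature index to a row of the shared embedding table."""
    return sparse_offset[feature_pos] + local_index


print(sparse_offset)       # [0, 3, 7]
print(sparse_oov)          # [2, 6, 11]
print(global_index(1, 2))  # 5, i.e. "engineer" in the shared table
```

With this layout, an unseen value of a feature can be routed to that feature's OOV slot instead of raising a lookup error, which is why the two arrays are useful together in cold-start scenarios.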

Variables:
  • col_name_mapping (dict of {str : dict of {str : int}} or None) – See Parameters

  • user_consumed (dict of {int : list}) – Each user’s consumed items in train data.

  • item_consumed (dict of {int : list}) – Each item’s consumed users in train data.
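As a rough illustration of the shape of user_consumed and item_consumed, the sketch below builds both dictionaries from a list of (user, item) pairs. The interaction pairs are invented, and the code only mirrors the documented {int : list} structure; it is not the library's own construction routine.

```python
from collections import defaultdict

# Hypothetical (user inner id, item inner id) interactions from train data.
interactions = [(0, 10), (0, 11), (1, 10), (2, 12), (1, 12)]

user_consumed = defaultdict(list)  # user id -> items that user consumed
item_consumed = defaultdict(list)  # item id -> users that consumed the item

for user, item in interactions:
    user_consumed[user].append(item)
    item_consumed[item].append(user)

print(dict(user_consumed))  # {0: [10, 11], 1: [10, 12], 2: [12]}
print(dict(item_consumed))  # {10: [0, 1], 11: [0], 12: [2, 1]}
```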

See also

MultiSparseInfo

property global_mean

Mean value of all labels in rating task.

property min_max_rating

Min and max value of all labels in rating task.
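The two rating statistics above are simple aggregates over the label column. The following sketch computes them for a made-up list of labels and shows one plausible use, clamping a predicted rating to the observed range. The helper clip_rating is hypothetical and not part of the library.

```python
from statistics import mean

# Hypothetical rating labels from the train data.
labels = [5.0, 3.0, 4.0, 1.0, 2.0]

global_mean = mean(labels)                  # mean of all labels
min_max_rating = (min(labels), max(labels))  # (lowest, highest) label


def clip_rating(pred, bounds):
    """Clamp a predicted rating into the (min, max) range seen in training."""
    low, high = bounds
    return max(low, min(high, pred))


print(global_mean)                        # 3.0
print(min_max_rating)                     # (1.0, 5.0)
print(clip_rating(5.7, min_max_rating))   # 5.0
```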

property sparse_col

Sparse column name to index mapping.

property dense_col

Dense column name to index mapping.

property user_sparse_col

User sparse column name to index mapping.

property user_dense_col

User dense column name to index mapping.

property item_sparse_col

Item sparse column name to index mapping.

property item_dense_col

Item dense column name to index mapping.

property user_col

All the user column names, including sparse and dense.

property item_col

All the item column names, including sparse and dense.
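The column properties above can be read as views over the nested col_name_mapping dictionary. The sketch below uses a hand-written mapping to show the idea. The family names are assumed to mirror the property names, the column names and indices are invented, and column_names is a hypothetical helper rather than a library function.

```python
# Hypothetical mapping in the documented format:
# {column_family_name: {column_name: index}}
col_name_mapping = {
    "sparse_col": {"sex": 0, "occupation": 1, "genre": 2},
    "dense_col": {"age": 0},
    "user_sparse_col": {"sex": 0, "occupation": 1},
    "user_dense_col": {"age": 0},
    "item_sparse_col": {"genre": 2},
    "item_dense_col": {},
}


def column_names(mapping, *families):
    """Collect column names from the given families, in insertion order."""
    names = []
    for family in families:
        names.extend(mapping.get(family, {}))
    return names


user_col = column_names(col_name_mapping, "user_sparse_col", "user_dense_col")
item_col = column_names(col_name_mapping, "item_sparse_col", "item_dense_col")

print(user_col)  # ['sex', 'occupation', 'age']
print(item_col)  # ['genre']
```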

property n_users

Number of users in train data.

property n_items

Number of items in train data.

property user2id

User original id to inner id mapping.

property item2id

Item original id to inner id mapping.

property id2user

User inner id to original id mapping.

property id2item

Item inner id to original id mapping.
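The id mappings come in forward and inverse pairs. A minimal sketch of such a pair, assuming inner ids are simply positions in the array of unique values (an assumption made for illustration, with invented user ids):

```python
# Hypothetical original user ids, in the order they are stored.
user_unique_vals = ["u42", "u7", "u99"]

# Original id -> inner id (assumed here to be the position in the array).
user2id = {user: idx for idx, user in enumerate(user_unique_vals)}

# Inner id -> original id is the inverse mapping.
id2user = {idx: user for user, idx in user2id.items()}

print(user2id)  # {'u42': 0, 'u7': 1, 'u99': 2}
print(id2user)  # {0: 'u42', 1: 'u7', 2: 'u99'}

# Round trip: converting to an inner id and back recovers the original id.
assert all(id2user[user2id[u]] == u for u in user_unique_vals)
```

The same pattern applies to items: models work on the compact inner ids, and the inverse mapping translates recommendations back to the original ids.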

property data_size

Train data size.

__repr__()

Output train data information: "n_users, n_items, data density".
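For reference, data density is conventionally the share of observed user-item pairs among all possible pairs. The sketch below formats the three quantities under that assumption. The function describe and the exact output format are illustrative and may differ from what the library prints.

```python
def describe(n_users, n_items, data_size):
    """Format user count, item count and density (as a percentage)."""
    # Assumed definition: observed interactions / all possible user-item pairs.
    density = 100 * data_size / (n_users * n_items)
    return f"n_users: {n_users}, n_items: {n_items}, data density: {density:.4f} %"


print(describe(4, 5, 6))
# n_users: 4, n_items: 5, data density: 30.0000 %
```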

assign_user_features(user_data)

Assign user features to this data_info object from user_data.

Parameters:

user_data (pandas.DataFrame) – Data containing new user features.

assign_item_features(item_data)

Assign item features to this data_info object from item_data.

Parameters:

item_data (pandas.DataFrame) – Data containing new item features.

property popular_items

A number of popular items in train data, often used in cold-start.
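One straightforward way to obtain popular items is to count how often each item appears in the train interactions and keep the most frequent ones. The sketch below does this with invented item ids. The function top_k_items and the frequency-based definition of popularity are assumptions for illustration, not the library's implementation.

```python
from collections import Counter


def top_k_items(item_ids, k):
    """Return the k most frequently interacted items, most popular first."""
    return [item for item, _ in Counter(item_ids).most_common(k)]


# Hypothetical item column of the train interactions.
train_items = ["a", "b", "a", "c", "b", "a"]

print(top_k_items(train_items, 2))  # ['a', 'b']
```

Such a list gives a reasonable fallback recommendation for users who have no history in the train data.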

save(path, model_name)

Save the DataInfo object.

Parameters:
  • path (str) – File folder path to save DataInfo.

  • model_name (str) – Name of the saved file.

classmethod load(path, model_name)

Load saved DataInfo.

Parameters:
  • path (str) – File folder path from which to load the saved DataInfo.

  • model_name (str) – Name of the saved file.

class libreco.data.MultiSparseInfo(field_offset: Iterable[int], field_len: Iterable[int], feat_oov: ndarray, pad_val: Dict[str, Any])

Dataclass for storing multi-sparse feature information.

A group of multi-sparse features is considered a “field”, e.g., (“genre1”, “genre2”, “genre3”) form a “genre” field, and features belonging to the same field share the same oov.

Variables:
  • field_offset (list of int) – All multi-sparse fields’ offset in all expanded sparse features.

  • field_len (list of int) – All multi-sparse fields’ sizes.

  • feat_oov (numpy.ndarray) – All multi-sparse fields’ oov.

  • pad_val (dict of {str : Any}) – Padding value in multi-sparse columns.
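To make the “field” idea concrete, the sketch below defines a stand-in dataclass with the same four attributes and fills it for the “genre” example. MultiSparseInfoSketch is not the library class. The column layout, the OOV index and the padding value are invented, and reading field_offset as the position of a field's first column among the expanded sparse columns is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class MultiSparseInfoSketch:
    """Stand-in mirroring the documented attributes; not the library class."""
    field_offset: List[int]
    field_len: List[int]
    feat_oov: List[int]
    pad_val: Dict[str, Any]


# Hypothetical expanded sparse columns; the last three form the "genre" field.
sparse_cols = ["sex", "occupation", "genre1", "genre2", "genre3"]
multi_sparse_fields = [["genre1", "genre2", "genre3"]]

info = MultiSparseInfoSketch(
    # Assumed meaning: position of each field's first column in sparse_cols.
    field_offset=[sparse_cols.index(field[0]) for field in multi_sparse_fields],
    # Number of columns that make up each field.
    field_len=[len(field) for field in multi_sparse_fields],
    # Hypothetical OOV index shared by all columns of the "genre" field.
    feat_oov=[4],
    # Hypothetical padding value marking an empty genre slot.
    pad_val={"genre1": "missing"},
)

print(info.field_offset, info.field_len)  # [2] [3]

# Recover the columns of the first field from its offset and length.
start, length = info.field_offset[0], info.field_len[0]
print(sparse_cols[start:start + length])  # ['genre1', 'genre2', 'genre3']
```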