DataInfo
- class libreco.data.DataInfo(col_name_mapping=None, interaction_data=None, user_sparse_unique=None, user_dense_unique=None, item_sparse_unique=None, item_dense_unique=None, user_indices=None, item_indices=None, user_unique_vals=None, item_unique_vals=None, sparse_unique_vals=None, sparse_offset=None, sparse_oov=None, multi_sparse_unique_vals=None, multi_sparse_combine_info=None)
Object for storing and updating indices and feature information.
- Parameters:
col_name_mapping (dict of {str : dict of {str : int}} or None, default: None) – Column name to index mapping, which has the format: {column_family_name: {column_name: index}}. If no such family exists, the default format would be: {column_family_name: {[]: []}}.
interaction_data (pandas.DataFrame or None, default: None) – Data containing the user, item and label columns.
user_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all users in train data.
user_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all users in train data.
item_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all items in train data.
item_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all items in train data.
user_indices (numpy.ndarray or None, default: None) – Mapped inner user indices from train data.
item_indices (numpy.ndarray or None, default: None) – Mapped inner item indices from train data.
user_unique_vals (numpy.ndarray or None, default: None) – All the unique users in train data.
item_unique_vals (numpy.ndarray or None, default: None) – All the unique items in train data.
sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All sparse features’ unique values.
sparse_offset (numpy.ndarray or None, default: None) – Offset of each sparse feature within all sparse values. Often used in the embedding layer.
sparse_oov (numpy.ndarray or None, default: None) – Out-of-vocabulary index for each sparse feature. Often used in cold-start.
multi_sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All multi-sparse features’ unique values.
multi_sparse_combine_info (MultiSparseInfo or None, default: None) – Multi-sparse field information.
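The relationship between sparse_unique_vals, sparse_offset and sparse_oov can be pictured with a small pure-Python sketch. It assumes a layout in which every sparse feature owns a contiguous block of indices in one shared embedding table, with one extra slot at the end of each block reserved for out-of-vocabulary values. The helpers `build_sparse_layout` and `encode`, and the example columns, are hypothetical; LibRecommender's own index computation may differ.

```python
from typing import Dict, List, Tuple


def build_sparse_layout(
    sparse_unique_vals: Dict[str, List[str]],
) -> Tuple[List[int], List[int], Dict[str, Dict[str, int]]]:
    """Hypothetical layout: each feature owns len(uniques) + 1 consecutive slots."""
    offsets: List[int] = []
    oovs: List[int] = []
    value_maps: Dict[str, Dict[str, int]] = {}
    start = 0
    for col, uniques in sparse_unique_vals.items():
        offsets.append(start)
        # Known values occupy start, start + 1, ..., start + len(uniques) - 1.
        value_maps[col] = {v: start + i for i, v in enumerate(uniques)}
        # The slot right after the known values is this feature's OOV index.
        oovs.append(start + len(uniques))
        start += len(uniques) + 1
    return offsets, oovs, value_maps


def encode(col: str, value: str, value_maps, oov_by_col) -> int:
    # Values never seen in train data (e.g. from a cold-start user) map to OOV.
    return value_maps[col].get(value, oov_by_col[col])


sparse_unique_vals = {"sex": ["F", "M"], "occupation": ["artist", "doctor", "lawyer"]}
offsets, oovs, value_maps = build_sparse_layout(sparse_unique_vals)
oov_by_col = dict(zip(sparse_unique_vals, oovs))

print(offsets)  # [0, 3]
print(oovs)     # [2, 6]
print(encode("occupation", "doctor", value_maps, oov_by_col))  # 4
print(encode("occupation", "farmer", value_maps, oov_by_col))  # 6, the OOV slot
```

Under this assumed layout, a single embedding table with `oovs[-1] + 1` rows (7 here) covers every sparse feature, which is one reason per-feature offsets are handy in an embedding layer.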
- Variables:
col_name_mapping (dict of {str : dict of {str : int}} or None) – See Parameters.
user_consumed (dict of {int : list}) – Each user's consumed items in train data.
item_consumed (dict of {int : list}) – Users who consumed each item in train data.
popular_items (list) – A list of popular items in train data. Often used in cold-start.
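A minimal sketch of how user_consumed, item_consumed and popular_items could be derived from index-mapped interaction data. It uses a plain list of (user, item) pairs in place of a pandas DataFrame; ranking popularity by interaction count and the `n_popular` cut-off are assumptions for illustration, not documented LibRecommender behavior.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical interactions as (inner user id, inner item id) pairs, in train order.
interactions: List[Tuple[int, int]] = [
    (0, 10), (0, 11), (1, 10), (2, 10), (2, 12), (1, 11),
]

user_consumed: Dict[int, List[int]] = defaultdict(list)
item_consumed: Dict[int, List[int]] = defaultdict(list)
for u, i in interactions:
    user_consumed[u].append(i)  # items each user interacted with
    item_consumed[i].append(u)  # users who interacted with each item

# Assumed definition: the n_popular most frequently consumed items.
n_popular = 2
item_counts = Counter(i for _, i in interactions)
popular_items = [item for item, _ in item_counts.most_common(n_popular)]

print(dict(user_consumed))  # {0: [10, 11], 1: [10, 11], 2: [10, 12]}
print(dict(item_consumed))  # {10: [0, 1, 2], 11: [0, 1], 12: [2]}
print(popular_items)        # [10, 11]
```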
- property global_mean
Mean value of all labels in a rating task.
- property min_max_rating
Minimum and maximum values of all labels in a rating task.
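For a rating task, both properties summarize the label column of the train data. A minimal sketch, assuming the labels are available as a plain list of floats (the values are made up):

```python
from statistics import fmean

# Hypothetical rating labels from the train data.
labels = [4.0, 3.0, 5.0, 1.0, 2.0]

global_mean = fmean(labels)                     # arithmetic mean of all labels
min_max_rating = (min(labels), max(labels))     # lowest and highest label

print(global_mean)     # 3.0
print(min_max_rating)  # (1.0, 5.0)
```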
- property sparse_col
Sparse column name to index mapping.
- property dense_col
Dense column name to index mapping.
- property user_sparse_col
User sparse column name to index mapping.
- property user_dense_col
User dense column name to index mapping.
- property item_sparse_col
Item sparse column name to index mapping.
- property item_dense_col
Item dense column name to index mapping.
- property user_col
All the user column names, including sparse and dense.
- property item_col
All the item column names, including sparse and dense.
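These column properties are all views of col_name_mapping. The sketch below assumes the mapping's family keys mirror the property names (`sparse_col`, `user_sparse_col`, and so on), that indices in the user/item families point into the combined sparse or dense columns, and that `user_col`/`item_col` concatenate sparse names before dense ones. The column names and the `names_and_indices` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping in the {column_family_name: {column_name: index}} format.
col_name_mapping: Dict[str, Dict[str, int]] = {
    "sparse_col": {"sex": 0, "occupation": 1, "genre": 2},
    "dense_col": {"age": 0, "price": 1},
    "user_sparse_col": {"sex": 0, "occupation": 1},
    "user_dense_col": {"age": 0},
    # Assumed: "genre" keeps index 2, its position among all sparse columns.
    "item_sparse_col": {"genre": 2},
    "item_dense_col": {"price": 1},
}


def names_and_indices(family: str) -> Tuple[List[str], List[int]]:
    # Hypothetical helper: split one column family into parallel name/index lists.
    mapping = col_name_mapping.get(family, {})
    return list(mapping.keys()), list(mapping.values())


user_col = names_and_indices("user_sparse_col")[0] + names_and_indices("user_dense_col")[0]
item_col = names_and_indices("item_sparse_col")[0] + names_and_indices("item_dense_col")[0]

print(names_and_indices("item_sparse_col"))  # (['genre'], [2])
print(user_col)  # ['sex', 'occupation', 'age']
print(item_col)  # ['genre', 'price']
```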
- property n_users
Number of users in train data.
- property n_items
Number of items in train data.
- property user2id
User original id to inner id mapping.
- property item2id
Item original id to inner id mapping.
- property id2user
User inner id to original id mapping.
- property id2item
Item inner id to original id mapping.
- property data_size
Train data size.
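The id properties are a pair of inverse dictionaries over the unique ids. A minimal sketch, assuming inner ids are assigned as 0..n-1 in sorted order of the original ids and that data_size counts interactions; the ids are made up and LibRecommender's exact assignment order is an assumption here.

```python
from typing import Dict, List

# Hypothetical original ids from the train data, one entry per interaction.
train_users: List[str] = ["u42", "u7", "u42", "u13"]
train_items: List[str] = ["i3", "i1", "i1", "i2"]

# String ids sort lexicographically, so "u13" < "u42" < "u7".
user_unique_vals = sorted(set(train_users))
item_unique_vals = sorted(set(train_items))

user2id: Dict[str, int] = {u: i for i, u in enumerate(user_unique_vals)}
item2id: Dict[str, int] = {it: i for i, it in enumerate(item_unique_vals)}
id2user: Dict[int, str] = {i: u for u, i in user2id.items()}
id2item: Dict[int, str] = {i: it for it, i in item2id.items()}

# Inner indices for every interaction, analogous to user_indices / item_indices.
user_indices = [user2id[u] for u in train_users]
item_indices = [item2id[it] for it in train_items]

n_users, n_items, data_size = len(user2id), len(item2id), len(train_users)
print(user2id)                      # {'u13': 0, 'u42': 1, 'u7': 2}
print(user_indices)                 # [1, 2, 1, 0]
print(n_users, n_items, data_size)  # 3 3 4
```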
- assign_user_features(user_data)
Assign user features to this data_info object from user_data.
- Parameters:
user_data (pandas.DataFrame) – Data containing new user features.
- assign_item_features(item_data)
Assign item features to this data_info object from item_data.
- Parameters:
item_data (pandas.DataFrame) – Data containing new item features.
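Conceptually, assigning features means re-encoding the new rows and overwriting the stored features of users the object already knows. The sketch below illustrates that idea with plain dicts and lists; skipping unknown users, falling back to an OOV index for unseen values, and every name in it (`assign_user_features_sketch`, `value_maps`, `oov`) are assumptions, not LibRecommender's implementation.

```python
from typing import Dict, List

# Hypothetical train-time state for two known users.
user2id: Dict[str, int] = {"u7": 0, "u13": 1}
user_sparse_col: List[str] = ["sex", "occupation"]
value_maps: Dict[str, Dict[str, int]] = {
    "sex": {"F": 0, "M": 1},
    "occupation": {"artist": 3, "doctor": 4},
}
oov: Dict[str, int] = {"sex": 2, "occupation": 5}
# Row k holds the encoded sparse features of the user with inner id k.
user_sparse_unique: List[List[int]] = [[0, 3], [1, 4]]


def assign_user_features_sketch(user_data: List[Dict[str, str]]) -> None:
    """Hypothetical: overwrite the feature rows of known users in place."""
    for row in user_data:
        inner_id = user2id.get(row["user"])
        if inner_id is None:
            continue  # assumed: users absent from train data are skipped
        user_sparse_unique[inner_id] = [
            value_maps[col].get(row[col], oov[col]) for col in user_sparse_col
        ]


# Rows shaped like a DataFrame with a "user" column plus feature columns.
new_rows = [
    {"user": "u13", "sex": "F", "occupation": "farmer"},  # unseen occupation -> OOV
    {"user": "u99", "sex": "M", "occupation": "artist"},  # unknown user -> skipped
]
assign_user_features_sketch(new_rows)
print(user_sparse_unique)  # [[0, 3], [0, 5]]
```

On an actual DataInfo instance the call is `data_info.assign_user_features(user_data=df)` with `df` a pandas.DataFrame, and `assign_item_features(item_data=df)` is the item-side counterpart.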
- class libreco.data.MultiSparseInfo(field_offset: Iterable[int], field_len: Iterable[int], feat_oov: ndarray)
Dataclass object for storing multi-sparse feature information. A group of multi-sparse features is considered a “field”, e.g. [“genre1”, “genre2”, “genre3”] form a field “genre”. This object therefore stores each field's offset, length and OOV index, since features belonging to the same field share one OOV index.
- Variables:
field_offset (list of int) – All multi-sparse fields’ offsets in all expanded sparse features.
field_len (list of int) – Number of features in each multi-sparse field.
feat_oov (numpy.ndarray) – OOV index of each multi-sparse field.
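To make the field terminology concrete, the sketch below computes a MultiSparseInfo-like record for a “genre” field and a “tag” field. It assumes that multi-sparse columns are expanded into ordinary sparse columns, that a field's offset is the position of its first column among all expanded sparse columns, and that each field's shared OOV index sits right after the field's vocabulary block in the embedding table. `MultiSparseInfoSketch`, `field_vocab` and `first_block_start` are hypothetical, and plain lists stand in for the numpy array.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MultiSparseInfoSketch:
    # Hypothetical stand-in with the same three fields as MultiSparseInfo.
    field_offset: List[int]
    field_len: List[int]
    feat_oov: List[int]


# Expanded sparse columns: one ordinary sparse column plus two multi-sparse fields.
sparse_cols = ["sex", "genre1", "genre2", "genre3", "tag1", "tag2"]
fields: Dict[str, List[str]] = {"genre": ["genre1", "genre2", "genre3"], "tag": ["tag1", "tag2"]}
# Assumed: all columns of a field share one vocabulary block of embedding rows.
field_vocab: Dict[str, List[str]] = {"genre": ["action", "comedy", "drama"], "tag": ["classic", "new"]}
first_block_start = 3  # hypothetical: rows 0..2 belong to "sex" (2 values + OOV)

offsets: List[int] = []
lengths: List[int] = []
oovs: List[int] = []
start = first_block_start
for name, cols in fields.items():
    offsets.append(sparse_cols.index(cols[0]))   # position of the field's first column
    lengths.append(len(cols))                    # number of columns in the field
    oovs.append(start + len(field_vocab[name]))  # one OOV slot shared by the whole field
    start += len(field_vocab[name]) + 1

info = MultiSparseInfoSketch(offsets, lengths, oovs)
print(info)
# MultiSparseInfoSketch(field_offset=[1, 4], field_len=[3, 2], feat_oov=[6, 9])
```

Because genre1, genre2 and genre3 draw from the same vocabulary, an unseen genre in any of them maps to the single slot 6 rather than to three separate OOV slots.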