DataInfo#
- class libreco.data.DataInfo(col_name_mapping=None, interaction_data=None, user_sparse_unique=None, user_dense_unique=None, item_sparse_unique=None, item_dense_unique=None, user_consumed=None, item_consumed=None, user_unique_vals=None, item_unique_vals=None, sparse_unique_vals=None, sparse_offset=None, sparse_oov=None, multi_sparse_unique_vals=None, multi_sparse_combine_info=None, seed=42)[source]#
Object for storing and updating information of indices and features.
- Parameters:
col_name_mapping (dict of {dict : int} or None, default: None) – Column name to index mapping, which has the format: {column_family_name: {column_name: index}}. If no such family, the default format would be: {column_family_name: {[]: []}}
interaction_data (pandas.DataFrame or None, default: None) – Data contains user, item and label columns.
user_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all users in train data.
user_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all users in train data.
item_sparse_unique (numpy.ndarray or None, default: None) – Unique sparse features for all items in train data.
item_dense_unique (numpy.ndarray or None, default: None) – Unique dense features for all items in train data.
user_consumed (dict of {int : list} or None, default: None) – All items consumed by each user.
item_consumed (dict of {int : list} or None, default: None) – All users who consumed each item.
user_unique_vals (numpy.ndarray or None, default: None) – All the unique users in train data.
item_unique_vals (numpy.ndarray or None, default: None) – All the unique items in train data.
sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All sparse features’ unique values.
sparse_offset (numpy.ndarray or None, default: None) – Offset for each sparse feature in all sparse values. Often used in the embedding layer.
sparse_oov (numpy.ndarray or None, default: None) – Out-of-vocabulary place for each sparse feature. Often used in cold-start.
multi_sparse_unique_vals (dict of {str : numpy.ndarray} or None, default: None) – All multi-sparse features’ unique values.
multi_sparse_combine_info (MultiSparseInfo or None, default: None) – Multi-sparse field information.
seed (int, default: 42) – Random seed.
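The relationship between sparse_unique_vals, sparse_offset and sparse_oov can be sketched in plain Python. This is only an illustration of the general idea, not the library's implementation: the feature names and values are made up, and the layout (one extra out-of-vocabulary slot placed at the end of each feature's range) is an assumption.

```python
from itertools import accumulate

# Hypothetical unique values per sparse feature (stands in for sparse_unique_vals).
sparse_unique_vals = {
    "sex": ["F", "M"],
    "occupation": ["artist", "doctor", "engineer"],
}

# Assumption: each feature reserves one extra slot for out-of-vocabulary values.
sizes = [len(vals) + 1 for vals in sparse_unique_vals.values()]

# Offset of a feature = number of slots used by all features before it.
sparse_offset = [0] + list(accumulate(sizes))[:-1]

# Assumption: the OOV slot is the last slot in each feature's range.
sparse_oov = [off + size - 1 for off, size in zip(sparse_offset, sizes)]


def global_index(feature, value):
    """Map (feature, value) to a row of one shared embedding table."""
    pos = list(sparse_unique_vals).index(feature)
    vals = sparse_unique_vals[feature]
    if value in vals:
        return sparse_offset[pos] + vals.index(value)
    # Unseen value (e.g. a cold-start user): fall back to the OOV slot.
    return sparse_oov[pos]
```

With offsets like these, every sparse feature can share a single embedding table, and a value never seen in training still maps to a valid row.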
- property global_mean#
Mean value of all labels in rating task.
- property min_max_rating#
Min and max value of all labels in rating task.
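As a rough illustration of what these two properties describe, the sketch below computes both values from a made-up list of labels. The helper clip_to_range is hypothetical and shows one common use of the rating range; whether the library clips predictions this way is an assumption.

```python
# Hypothetical rating labels from a train set.
labels = [5.0, 3.0, 4.0, 1.0, 2.0]

# Mean of all labels, as described for global_mean.
global_mean = sum(labels) / len(labels)

# Smallest and largest label, as described for min_max_rating.
min_max_rating = (min(labels), max(labels))


def clip_to_range(pred):
    """Keep a predicted rating inside the observed label range (illustrative)."""
    low, high = min_max_rating
    return max(low, min(high, pred))
```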
- property sparse_col#
Sparse column name to index mapping.
- property dense_col#
Dense column name to index mapping.
- property user_sparse_col#
User sparse column name to index mapping.
- property user_dense_col#
User dense column name to index mapping.
- property item_sparse_col#
Item sparse column name to index mapping.
- property item_dense_col#
Item dense column name to index mapping.
- property user_col#
All the user column names, including sparse and dense.
- property item_col#
All the item column names, including sparse and dense.
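The column properties above all read from col_name_mapping. The sketch below shows one way such a mapping might look and how the combined user and item column lists could be derived from it. The family names are assumed to mirror the property names, the column names and indices are made up, and an empty dict stands in for a missing family.

```python
# Hypothetical mapping in the {column_family_name: {column_name: index}} format.
# Assumption: family names match the property names documented above.
col_name_mapping = {
    "sparse_col": {"sex": 0, "occupation": 1, "genre": 2},
    "dense_col": {"age": 0},
    "user_sparse_col": {"sex": 0, "occupation": 1},
    "user_dense_col": {"age": 0},
    "item_sparse_col": {"genre": 2},
    "item_dense_col": {},
}


def family(name):
    """Return the {column_name: index} dict of one family, or {} if absent."""
    return col_name_mapping.get(name, {})


# All user columns: sparse names first, then dense names.
user_col = list(family("user_sparse_col")) + list(family("user_dense_col"))

# All item columns, built the same way.
item_col = list(family("item_sparse_col")) + list(family("item_dense_col"))
```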
- property n_users#
Number of users in train data.
- property n_items#
Number of items in train data.
- property user2id#
User original id to inner id mapping.
- property item2id#
Item original id to inner id mapping.
- property id2user#
User inner id to original id mapping.
- property id2item#
Item inner id to original id mapping.
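The four id mappings come in forward and reverse pairs. A minimal sketch of one such pair, assuming inner ids are consecutive integers starting at 0 and assigned in sorted order (the library may assign them differently); the raw ids are made up:

```python
# Hypothetical original user ids as they appear in the interaction data.
raw_users = ["u42", "u7", "u42", "u13"]

# Assumption: inner ids are 0..n-1, assigned in sorted order of the original ids.
unique_users = sorted(set(raw_users))

# Original id -> inner id, in the spirit of user2id.
user2id = {user: idx for idx, user in enumerate(unique_users)}

# Inner id -> original id, in the spirit of id2user.
id2user = {idx: user for user, idx in user2id.items()}

# Number of distinct users, in the spirit of n_users.
n_users = len(user2id)
```

The same construction applies to items. The reverse mapping is what translates a model's inner-id output back to the ids used in the original data.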
- property data_size#
Train data size.
- assign_user_features(user_data)[source]#
Assign user features to this data_info object from user_data.
- Parameters:
user_data (pandas.DataFrame) – Data contains new user features.
- assign_item_features(item_data)[source]#
Assign item features to this data_info object from item_data.
- Parameters:
item_data (pandas.DataFrame) – Data contains new item features.
- property popular_items#
A number of popular items in train data, which are often used in cold-start.
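One plausible way to obtain such a list is to count interactions per item and keep the most frequent ones. The sketch below assumes popularity means interaction count; the item ids, the cutoff k and the helper name recommend_for_unknown_user are all hypothetical, and the library's own criterion may differ.

```python
from collections import Counter

# Hypothetical item ids from the interaction data, one entry per interaction.
interacted_items = [10, 20, 10, 30, 10, 20]

# Assumption: popularity is the number of interactions per item.
k = 2
popular_items = [item for item, _ in Counter(interacted_items).most_common(k)]


def recommend_for_unknown_user(n_rec):
    """Cold-start fallback: serve popular items when nothing is known about the user."""
    return popular_items[:n_rec]
```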
- class libreco.data.MultiSparseInfo(field_offset: Iterable[int], field_len: Iterable[int], feat_oov: ndarray, pad_val: Dict[str, Any])[source]#
Dataclass for storing multi-sparse feature information.
A group of multi-sparse features is considered a “field”, e.g., (“genre1”, “genre2”, “genre3”) form a “genre” field, and features belonging to the same field share the same oov.
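The idea of a field can be sketched without the library: the columns of one field draw from a single shared vocabulary and fall back to a single shared OOV index. The rows and the padding value "missing" are made up, and mapping padding to the OOV index is an assumption made for brevity, not a description of how MultiSparseInfo handles pad_val.

```python
# Hypothetical rows with three genre columns that together form one "genre" field.
rows = [
    {"genre1": "Action", "genre2": "Comedy", "genre3": "missing"},
    {"genre1": "Drama", "genre2": "missing", "genre3": "missing"},
]
field = ("genre1", "genre2", "genre3")
pad_val = "missing"  # assumed placeholder for an empty slot

# One vocabulary for the whole field, built across all of its columns.
vocab = sorted({row[col] for row in rows for col in field} - {pad_val})
index = {value: i for i, value in enumerate(vocab)}

# One OOV index shared by every column in the field.
field_oov = len(vocab)


def encode(row):
    """Encode every column of the field with the shared vocabulary and OOV."""
    return [index.get(row[col], field_oov) for col in field]
```

Because all three columns use the same index, "Comedy" maps to the same id whether it appears in genre1 or genre2.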