LibRecommender#


LibRecommender is an easy-to-use recommender system focused on the end-to-end recommendation process. It contains a training (libreco) and a serving (libserving) module that let users quickly train and deploy different kinds of recommendation models.

The main features are:

  • Implements a number of popular recommendation algorithms such as FM, DIN, and LightGCN. See the full algorithm list.

  • A hybrid recommender system, which allows users to use either collaborative-filtering or content-based features. New features can be added on the fly.

  • Low memory usage: automatically converts categorical and multi-value categorical features to sparse representations.

  • Supports training for both explicit and implicit datasets, as well as negative sampling on implicit data.

  • Provides an end-to-end workflow, i.e. data handling/preprocessing -> model training -> evaluation -> save/load -> serving.

  • Supports cold-start prediction and recommendation.

  • Supports dynamic feature and sequence recommendation.

  • Provides unified and friendly API for all algorithms.

  • Easy to retrain model with new users/items from new data.
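To make the negative-sampling feature above concrete: for implicit data, each observed (user, item) interaction is paired with randomly drawn items the user has not interacted with, which serve as negative examples during training. A minimal, library-independent sketch of that idea (the function below is illustrative, not part of the LibRecommender API):

```python
import random

def sample_negatives(interactions, n_items, num_neg=1, seed=42):
    """Pair each positive (user, item) with `num_neg` random unseen items."""
    rng = random.Random(seed)
    seen = {}
    for user, item in interactions:
        seen.setdefault(user, set()).add(item)
    samples = []
    for user, item in interactions:
        samples.append((user, item, 1))      # observed interaction -> positive label
        for _ in range(num_neg):
            neg = rng.randrange(n_items)
            while neg in seen[user]:         # resample until the item is unseen
                neg = rng.randrange(n_items)
            samples.append((user, neg, 0))   # unseen item -> negative label
    return samples

samples = sample_negatives([(0, 1), (0, 2), (1, 0)], n_items=5, num_neg=1)
```

In the examples below, passing `neg_sampling=True` to `fit` and `evaluate` applies this kind of sampling to the train, eval, and test data automatically.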

Quick Start#

The two tabs below demonstrate the process of training, evaluating, predicting, recommending, and cold-start handling.

  1. Pure example (collaborative filtering), which uses the LightGCN model.

  2. Feat example (using features), which uses the YouTubeRanking model.

    from libreco.algorithms import LightGCN
    from libreco.data import DatasetPure, random_split
    from libreco.evaluation import evaluate

    # `data` is a pandas DataFrame with (user, item, label) columns, loaded beforehand
    train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])

    train_data, data_info = DatasetPure.build_trainset(train_data)
    eval_data = DatasetPure.build_evalset(eval_data)
    test_data = DatasetPure.build_testset(test_data)
    print(data_info)  # n_users: 5894, n_items: 3253, data density: 0.4172 %

    lightgcn = LightGCN(
        task="ranking",
        data_info=data_info,
        loss_type="bpr",
        embed_size=16,
        n_epochs=3,
        lr=1e-3,
        batch_size=2048,
        num_neg=1,
        device="cuda",
    )
    # monitor metrics on eval_data during training
    lightgcn.fit(
        train_data,
        neg_sampling=True,  # sample negative items for train and eval data
        verbose=2,
        eval_data=eval_data,
        metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
    )

    # do final evaluation on test data
    print(
        "evaluate_result: ",
        evaluate(
            model=lightgcn,
            data=test_data,
            neg_sampling=True,  # sample negative items for test data
            metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
        ),
    )
    # predict preference of user 2211 to item 110
    print("prediction: ", lightgcn.predict(user=2211, item=110))
    # recommend 7 items for user 2211
    print("recommendation: ", lightgcn.recommend_user(user=2211, n_rec=7))

    # cold-start prediction
    print(
        "cold prediction: ",
        lightgcn.predict(user="ccc", item="not item", cold_start="average"),
    )
    # cold-start recommendation
    print(
        "cold recommendation: ",
        lightgcn.recommend_user(user="are we good?", n_rec=7, cold_start="popular"),
    )
    from libreco.algorithms import YouTubeRanking
    from libreco.data import DatasetFeat, split_by_ratio_chrono

    # `data` is a pandas DataFrame with user, item, label and feature columns
    train_data, test_data = split_by_ratio_chrono(data, test_size=0.2)

    # specify complete columns information
    sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
    dense_col = ["age"]
    user_col = ["sex", "age", "occupation"]
    item_col = ["genre1", "genre2", "genre3"]

    train_data, data_info = DatasetFeat.build_trainset(
        train_data, user_col, item_col, sparse_col, dense_col
    )
    test_data = DatasetFeat.build_testset(test_data)
    print(data_info)  # n_users: 5953, n_items: 3209, data density: 0.4213 %

    ytb_ranking = YouTubeRanking(
        task="ranking",
        data_info=data_info,
        embed_size=16,
        n_epochs=3,
        lr=1e-4,
        batch_size=512,
        use_bn=True,
        hidden_units=(128, 64, 32),
    )
    ytb_ranking.fit(
        train_data,
        neg_sampling=True,  # sample negative items for train and eval data
        verbose=2,
        shuffle=True,
        eval_data=test_data,
        metrics=["loss", "roc_auc", "precision", "recall", "map", "ndcg"],
    )

    # predict preference of user 2211 to item 110
    print("prediction: ", ytb_ranking.predict(user=2211, item=110))
    # recommend 7 items for user 2211
    print("recommendation: ", ytb_ranking.recommend_user(user=2211, n_rec=7))

    # cold-start prediction
    print(
        "cold prediction: ",
        ytb_ranking.predict(user="ccc", item="not item", cold_start="average"),
    )
    # cold-start recommendation
    print(
        "cold recommendation: ",
        ytb_ranking.recommend_user(user="are we good?", n_rec=7, cold_start="popular"),
    )
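The cold-start calls above hand unknown users and items to a fallback strategy instead of raising an error: `"average"` scores them with averaged embeddings, while `"popular"` recommends globally popular items. The popularity fallback can be sketched independently of the library (the helper below is illustrative, not LibRecommender's implementation):

```python
from collections import Counter

def recommend(user, n_rec, user_history, cold_start="popular"):
    """Return top-n unseen items for a known user, or a popularity fallback."""
    # count interactions per item across all users
    popularity = Counter(item for items in user_history.values() for item in items)
    if user not in user_history:  # cold-start: user never seen during training
        if cold_start == "popular":
            return [item for item, _ in popularity.most_common(n_rec)]
        raise ValueError(f"unknown user {user!r}")
    seen = set(user_history[user])
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:n_rec]

history = {1: [10, 11], 2: [10], 3: [10, 12]}
print(recommend("new user", 2, history))  # popularity fallback for an unknown user
print(recommend(2, 2, history))           # known user: popular items not yet seen
```

In LibRecommender itself the same behavior is selected purely through the `cold_start` argument of `predict` and `recommend_user`, as shown in the tabs above.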

Indices and tables#