Python Serving Guide#
Introduction#
This guide mainly describes how to serve a trained model using the libserving module
in LibRecommender. Prior to LibRecommender version 0.10.0
, Flask
was used to construct the serving web server. However, to take full advantage of the asynchronous
feature and modern async/await
syntax in Python, we’ve decided to switch to Sanic framework.
Unlike flask, a sanic server can run in production directly, and the typical command is like this:
$ sanic server:app --host=127.0.0.1 --port=8000 --dev --access-logs -v --workers 1 # develop mode
$ sanic server:app --no-access-logs --workers 10 # production mode
Refer to Running Sanic for more details.
Rust
Beyond Python, one can also use Rust to serve a model. See Rust Serving Guide.
From model serving’s perspective, currently there are three kinds of models in LibRecommender:
knn-based model
embed-based model
tensorflow-based model.
As for models trained with PyTorch, they all belong to embed-based model.
The following is the main serving workflow:
Serialize the trained model to disk.
Load model and save to redis.
Run the sanic server.
Make http request to the server and gain recommendation.
Here we choose NOT to save the trained model directly to redis, since:
1) Even you save the model to redis in the first place, you’ll end up with saving to disk anyway :)
2) We try to keep the requirements of the main libreco
module as minimal as possible.
So during serving, one should start redis server first:
$ redis-server
Error
Note that sometimes using redis in model serving can be error-prone:
For example, you served a DeepFM
model at first, and later on you decided to use
another pure
model, say NCF
. Since DeepFM
is a feat
model,
some feature information may have been saved into redis. If you forget to remove
these feature information before using NCF
, the server may mistakenly load it
and eventually causing an error.
Attention
In this guide we assume the following codes are all executed in LibRecommender/libserving
folder,
so one needs to clone the repository first:
$ git clone https://github.com/massquantity/LibRecommender.git
$ cd LibRecommender/libserving
Note about Dependencies#
The serving related dependencies are listed in main README.
redis-py introduced async support since 4.2.0.
Pydantic has introduced breaking changes in V2. Consider upgrading to
pydantic >= 2.0
if you encounter validation errors.According to the official doc, faiss can’t be installed from pip directly. But someone has built wheel packages, refer to faiss-wheels. So now the pip option is available for faiss.
We use TensorFlow Serving to serve tensorflow-based models, and typically it is installed through Docker. However, The latest TensorFlow Serving might not work in some cases, and we have encountered similar error described in this issue on TensorFlow Serving 2.9 and 2.10. So for now the workable version is 2.8.2, and one should pull docker image like this:
$ sudo docker pull tensorflow/serving:2.8.2
Saving Format#
In libserving
, the primary data serialization format is JSON.
Aside from JSON, models built upon TensorFlow are saved using its own
tf.saved_model API.
The SavedModel
format provides a language-neutral format to save machine-learning models.
KNN-based Model#
KNN-based models refer to the classic UserCF
and ItemCF
algorithms, which leverage
a similarity matrix to find similar users/items for recommendation.
Due to the large number of users/items, it is often impractical to store the whole
similarity matrix, so here we may only save the most similar k neighbors for each user/item.
Below is an example usage which saves 10 neighbors per item using ItermCF.
One should also specify model-saving path
:
>>> from libreco.algorithms import ItemCF
>>> from libreco.data import DatasetPure
>>> from libserving.serialization import knn2redis, save_knn
>>> train_data, data_info = DatasetPure.build_trainset(...)
>>> model = ItemCF(...)
>>> model.fit(...) # train model
>>> path = "knn_model" # specify model saving directory
>>> save_knn(path, model, k=10) # save model in json format
>>> knn2redis(path, host="localhost", port=6379, db=0) # load json from path and save model to redis
$ sanic sanic_serving.knn_deploy:app --dev --access-logs -v --workers 1 # run sanic server
# make requests
$ python request.py --user 1 --n_rec 10 --algo knn
$ curl -d '{"user": 1, "n_rec": 10}' -X POST http://127.0.0.1:8000/knn/recommend
# {'Recommend result for user 1': ['480', '589', '2571', '260', '2028', '1198', '1387', '1214', '1291', '1197']}
Embed-based Model#
Embed-based models perform similarity searching on embeddings to make recommendation,
so we only need to save a bunch of embeddings. This kind of model includes
SVD
, SVD++
, ALS
, BPR
, YouTubeRetrieval
, Item2Vec
, DeepWalk
,
RNN4Rec
, Caser
, WaveNet
, NGCF
, LightGCN
, GraphSage
, PinSage
,
TwoTower
.
In practice, to speed up serving, some ANN(Approximate Nearest Neighbors) libraries
are often used to find similar embeddings. Here in libserving
, we use
faiss to do such thing.
Below is an example usage which uses ALS
. One should also specify model-saving path
:
>>> from libreco.algorithms import ALS
>>> from libreco.data import DatasetPure
>>> from libserving.serialization import embed2redis, save_embed
>>> train_data, data_info = DatasetPure.build_trainset(...)
>>> model = ALS(...)
>>> model.fit(...) # train model
>>> path = "embed_model" # specify model saving directory
>>> save_embed(path, model) # save model in json format
>>> embed2redis(path, host="localhost", port=6379, db=0) # load json from path and save model to redis
The following code will train faiss index on model’s item embeddings and save to disk as file name
faiss_index.bin
. The saved index will be loaded in sanic server.
>>> from libserving.serialization import save_faiss_index
>>> save_faiss_index(path, model)
$ sanic sanic_serving.embed_deploy:app --dev --access-logs -v --workers 1 # run sanic server
# make requests
$ python request.py --user 1 --n_rec 10 --algo embed
$ curl -d '{"user": 1, "n_rec": 10}' -X POST http://127.0.0.1:8000/embed/recommend
# {'Recommend result for user 1': ['593', '1270', '318', '2858', '1196', '2571', '1617', '260', '1200', '457']}
TensorFlow-based Model#
As stated above, tensorflow-based model will typically be saved in SavedModel
format.
These model mainly contains neural networks, including NCF
, WideDeep
, FM
,
DeepFM
, YouTubeRanking
, AutoInt
, DIN
.
We assume TensorFlow Serving has already been installed through Docker. After successfully starting the docker container, we can post request to the serving model inside the sanic server and get the recommendation.
Below is an example usage which uses DIN
. One should also specify model-saving path
:
>>> from libreco.algorithms import DIN
>>> from libreco.data import DatasetFeat
>>> from libserving.serialization import save_tf, tf2redis
>>> train_data, data_info = DatasetFeat.build_trainset(...)
>>> model = DIN(...)
>>> model.fit(...) # train model
>>> path = "tf_model" # specify model saving directory
>>> save_tf(path, model, version=1) # save model in json format
>>> tf2redis(path, host="localhost", port=6379, db=0) # load json from path and save model to redis
The directory of SavedModel
format for a DIN
model has the following structure and note
that 1 is the version number:
din/
1/
variables/
variables.data-?????-of-?????
variables.index
saved_model.pb
We can inspect the saved DIN
model by using SavedModel CLI
described in
official doc.
By default, it is bundled with TensorFlow. The following command will output:
$ saved_model_cli show --dir tf_model/din/1 --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['predict']:
The given SavedModel SignatureDef contains the following input(s):
inputs['dense_values'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: Placeholder_6:0
inputs['item_indices'] tensor_info:
dtype: DT_INT32
shape: (-1)
name: Placeholder_1:0
inputs['sparse_indices'] tensor_info:
dtype: DT_INT32
shape: (-1, 5)
name: Placeholder_5:0
inputs['user_indices'] tensor_info:
dtype: DT_INT32
shape: (-1)
name: Placeholder:0
inputs['user_interacted_len'] tensor_info:
dtype: DT_FLOAT
shape: (-1)
name: Placeholder_3:0
inputs['user_interacted_seq'] tensor_info:
dtype: DT_INT32
shape: (-1, 10)
name: Placeholder_2:0
The given SavedModel SignatureDef contains the following output(s):
outputs['logits'] tensor_info:
dtype: DT_FLOAT
shape: (-1)
name: Reshape_4:0
Method name is: tensorflow/serving/predict
The above result shows this DIN
model needs 6 inputs, i.e. user_indices
, item_indices
,
sparse_indices
, dense_values
, user_interacted_seq
, user_interacted_len
.
But this only applies to DIN
and other models may have different inputs.
For
NCF
model, onlyuser_indices
anditem_indices
are needed since it’s a collaborative-filtering algorithm.For
WideDeep
,FM
,DeepFM
,AutoInt
, since they don’t use behavior sequence information, 4 inputs are needed:user_indices
,item_indices
,sparse_indices
,dense_values
.Finally,
YouTubeRanking
has same inputs asDIN
. They both use behavior sequence information.
However, these are just general cases. Suppose your data doesn’t have any sparse feature,
then it would be a mistake to feed the sparse_indices
input, so these matters should
be taken into account. This is exactly where a library fits in, and LibRecommender can
dynamically handle these different feature situations. So as a library user, all you
need to do is specifying the correct model path.
Using SavedModel CLI
, we can even pass some inputs to the model and get outputs
(note the inputs num should match the model requirement):
$ inputs="user_indices=np.int32([2,3]);item_indices=np.int32([2,3]);sparse_indices=np.int32([[1,1,1,1,1],[1,1,1,1,1]]);dense_values=np.float32([[1],[2]]);user_interacted_len=np.float32([2,3]);user_interacted_seq=np.int32([[1,2,3,4,5,6,7,8,9,10],[1,2,3,4,5,6,7,8,9,10]])"
$ saved_model_cli run --dir tf_model/din/1 --tag_set serve --signature_def predict --input_exprs $inputs
Result for output key logits:
[-0.51893234 -0.569685 ]
Now let’s start TensorFlow Serving service through docker. Note that the MODEL_NAME
should be
lowercase of the model class name. For instance, DIN
-> din
, YouTubeRanking
-> youtuberanking
, WideDeep
-> widedeep
.
$ MODEL_NAME=din
$ MODEL_PATH=tf_model
$ sudo docker run --rm -t -p 8501:8501 --mount type=bind,source=$(pwd),target=$(pwd) -e MODEL_BASE_PATH=$(pwd)/${MODEL_PATH} -e MODEL_NAME=${MODEL_NAME} tensorflow/serving:2.8.2
Get model status from TensorFlow Serving service using RESTful API:
$ curl http://localhost:8501/v1/models/din
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": ""
}
}
]
}
Make predictions for two samples from TensorFlow Serving service:
$ curl -d '{"signature_name": "predict", "inputs": {"user_indices": [1, 2], "item_indices": [3208, 2], "sparse_indices": [[1, 19, 32, 59, 71], [1, 19, 32, 59, 71]], "dense_values": [22.0, 56.0], "user_interacted_seq": [[996, 1764, 2083, 520, 2759, 334, 304, 1110, 2013, 1415],[996, 1764, 2083, 520, 2759, 334, 304, 1110, 2013, 1415]], "user_interacted_len": [3, 10]}}' -X POST http://localhost:8501/v1/models/din:predict
{
"outputs": [
-0.65978992,
-0.759211063
]
}
Now we can start the corresponding sanic server. According to the official doc, the input tensors can use either row format or column format. In tf_deploy.py we use column format since it’s more compact.
$ sanic sanic_serving.tf_deploy:app --dev --access-logs -v --workers 1 # run sanic server
# make requests
$ python request.py --user 1 --n_rec 10 --algo tf
$ curl -d '{"user": 1, "n_rec": 10}' -X POST http://127.0.0.1:8000/tf/recommend
# {'Recommend result for user 1': ['1196', '480', '260', '2028', '1198', '1214', '780', '1387', '1291', '1197']}