OpenVINO™ Model Server is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The server provides an inference service via a gRPC (Remote Procedure Call) endpoint or a REST API, making it easy to deploy new algorithms and AI experiments using the same architecture as TensorFlow Serving for any model trained in a framework supported by OpenVINO.
The server is implemented as a Python service using the gRPC interface library or the Falcon REST API framework, with data serialization and deserialization handled by TensorFlow and OpenVINO™ as the inference execution provider. Model repositories may reside on a locally accessible file system (e.g. NFS), Google Cloud Storage (GCS), Amazon S3 or MinIO.
Review the Architecture concept document for more details.
A few key features:
Start using OpenVINO Model Server in 5 Minutes or less:
# Download the latest Model Server image
docker pull openvino/ubuntu18_model_server:latest

# Download model into a separate directory
curl --create-dirs https://download.01.org/opencv/2020/openvinotoolkit/2020.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP32/face-detection-retail-0004.xml https://download.01.org/opencv/2020/openvinotoolkit/2020.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP32/face-detection-retail-0004.bin -o model/face-detection-retail-0004.xml -o model/face-detection-retail-0004.bin

# Start the container serving gRPC on port 9000
docker run -d -v $(pwd)/model:/models/face-detection/1 -e LOG_LEVEL=DEBUG -p 9000:9000 openvino/ubuntu18_model_server /ie-serving-py/start_server.sh ie_serving model --model_path /models/face-detection --model_name face-detection --port 9000 --shape auto

# Download the example client script
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/master/example_client/client_utils.py -o client_utils.py https://raw.githubusercontent.com/openvinotoolkit/model_server/master/example_client/face_detection.py -o face_detection.py https://raw.githubusercontent.com/openvinotoolkit/model_server/master/example_client/client_requirements.txt -o client_requirements.txt

# Download an image to be analyzed
curl --create-dirs https://raw.githubusercontent.com/openvinotoolkit/model_server/master/example_client/images/people/people1.jpeg -o images/people1.jpeg

# Install client dependencies
pip install -r client_requirements.txt

# Create a folder for results
mkdir results

# Run inference and store results in the newly created folder
python face_detection.py --batch_size 1 --width 600 --height 400 --input_images_dir images --output_dir results
A more detailed description of the steps above can be found here.
More complete guides to using Model Server in various scenarios can be found here:
Using FPGA (TBD)
OpenVINO™ Model Server gRPC API is documented in the protocol buffer files in tensorflow_serving_api. Note: Only the Predict, GetModelMetadata and GetModelStatus function calls are currently implemented. These are the most generic function calls and should address most usage scenarios.
predict function spec has two message definitions: PredictRequest and PredictResponse.
get_model_metadata function spec has three message definitions: SignatureDefMap, GetModelMetadataRequest, GetModelMetadataResponse. A GetModelMetadata call accepts model spec information as input and returns Signature Definition content in a format similar to TensorFlow Serving.
get_model_status function spec can be used to report all exposed versions, including their state in their lifecycle.
Refer to the example client code to learn how to use this API and submit the requests using the gRPC interface.
Using the gRPC interface is recommended for optimal performance due to its faster implementation of input data deserialization. gRPC achieves lower latency, especially with larger input messages like images.
OpenVINO™ Model Server RESTful API follows the TensorFlow Serving REST API documentation.
Both the row and column request formats are implemented. Note: Just like with gRPC, only the Predict, GetModelMetadata and GetModelStatus function calls are currently implemented.
Only the numerical data types are supported.
Review the example clients below to learn how to connect and run inference requests.
REST API is recommended when the primary goal is to reduce the number of client-side Python dependencies and to simplify application code.
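As a rough illustration of the row and column request formats, the sketch below builds both JSON bodies and a predict URL following the TensorFlow Serving REST API conventions. The host, the port 8000, the model name, the helper names and the sample values are assumptions made for illustration only.

```python
import json


def predict_url(host, port, model_name, version=None):
    """Build a TensorFlow Serving style predict URL (version is optional)."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    return f"http://{host}:{port}{path}:predict"


def row_request(instances):
    """Row format: a list of instances under the "instances" key."""
    return json.dumps({"instances": instances})


def column_request(inputs):
    """Column format: whole input tensors under the "inputs" key."""
    return json.dumps({"inputs": inputs})


if __name__ == "__main__":
    # Hypothetical endpoint and data; only numerical values are sent.
    print(predict_url("localhost", 8000, "face-detection", version=1))
    print(row_request([[0.1, 0.2], [0.3, 0.4]]))
    print(column_request([[0.1, 0.2], [0.3, 0.4]]))
```

The resulting body can be sent with any HTTP client as a POST request, which is what keeps the client free of gRPC and TensorFlow dependencies.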
OpenVINO™ model server accepts 3 logging levels:
The default setting is INFO, which can be altered by setting the environment variable LOG_LEVEL.
The captured logs are displayed on the model server console. When using Docker containers or Kubernetes, the logs can be examined using the docker logs or kubectl logs commands, respectively.
It is also possible to save the logs to a local file system by setting the environment variable LOG_PATH to an absolute path pointing to a log file.
Please see the example below for usage details.
docker run --name ie-serving --rm -d -v /models/:/opt/ml:ro -p 9001:9001 --env LOG_LEVEL=DEBUG --env LOG_PATH=/var/log/ie_serving.log \
  ie-serving-py:latest /ie-serving-py/start_server.sh ie_serving config --config_path /opt/ml/config.json --port 9001

docker logs ie-serving
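To make the role of the two variables concrete, here is a minimal sketch of how a Python service might honor LOG_LEVEL and LOG_PATH. It is not the model server's actual code: the function name, the logger name and the fallback to INFO for unknown values are assumptions.

```python
import logging
import os
import tempfile


def configure_logging(env=None):
    """Configure a logger from LOG_LEVEL and LOG_PATH (hypothetical helper)."""
    env = os.environ if env is None else env
    level = getattr(logging, env.get("LOG_LEVEL", "INFO").upper(), None)
    if not isinstance(level, int):
        level = logging.INFO  # assumed fallback for unrecognized values

    logger = logging.getLogger("model_server_sketch")
    for handler in list(logger.handlers):  # reset on repeated calls
        logger.removeHandler(handler)
        handler.close()
    logger.setLevel(level)
    logger.addHandler(logging.StreamHandler())  # console output

    log_path = env.get("LOG_PATH")
    if log_path:
        # Additionally write the same records to the configured file.
        logger.addHandler(logging.FileHandler(log_path))
    return logger


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ie_serving.log")
        log = configure_logging({"LOG_LEVEL": "DEBUG", "LOG_PATH": path})
        log.debug("visible on the console and in the log file")
        for handler in list(log.handlers):
            log.removeHandler(handler)
            handler.close()
```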
OpenVINO™ Model Server loads all defined model versions according to the configured version policy. A model version is represented by a numerical directory in a model path, containing OpenVINO model files with .bin and .xml extensions.
Below are examples of incorrect structure:
models/
├── model1
│   ├── 1
│   │   ├── ir_model.bin
│   │   └── ir_model.xml
│   └── 2
│       ├── somefile.bin
│       └── anotherfile.txt
└── model2
    ├── ir_model.bin
    ├── ir_model.xml
    └── mapping_config.json
In the above scenario, the server will detect only version 1 of model1. Directory 2 does not contain valid OpenVINO model files, so it won't be detected as a valid model version.
For model2, there are correct files, but they are not in a numerical directory. The server will not detect any version in model2.
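The detection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule (a numerical directory containing both a .bin and an .xml file), not the server's implementation; the function names are hypothetical.

```python
import os
import tempfile


def detect_versions(model_path):
    """Return sorted version numbers whose directory holds .bin and .xml files."""
    versions = []
    for entry in os.listdir(model_path):
        version_dir = os.path.join(model_path, entry)
        # A version must be a directory with a purely numerical name.
        if not (entry.isdecimal() and os.path.isdir(version_dir)):
            continue
        extensions = {os.path.splitext(name)[1] for name in os.listdir(version_dir)}
        if {".bin", ".xml"} <= extensions:
            versions.append(int(entry))
    return sorted(versions)


def build_example_tree(root):
    """Recreate the incorrect layout from the example as empty files."""
    files = [
        "model1/1/ir_model.bin",
        "model1/1/ir_model.xml",
        "model1/2/somefile.bin",
        "model1/2/anotherfile.txt",
        "model2/ir_model.bin",
        "model2/ir_model.xml",
        "model2/mapping_config.json",
    ]
    for relative in files:
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example_tree(root)
        print(detect_versions(os.path.join(root, "model1")))  # [1]
        print(detect_versions(os.path.join(root, "model2")))  # []
```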
When a new model version is detected, the server loads the model files and starts serving the new model version. This operation might fail for the following reasons:
In all of these situations, the root cause is reported in the server logs or in the response to a GetModelStatus call.
A model version that is detected but not loaded will not be served and will report the status LOADING with the error message: Error occurred while loading version.
When the model files become accessible or are fixed, the server will try to load them again on the next version update.
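A client can watch for versions stuck outside the AVAILABLE state by inspecting a model status response. The sketch below assumes the JSON layout used by the TensorFlow Serving REST API for model status (a model_version_status list with version, state and status fields); the helper name and the sample payload are illustrative and were not captured from a real server.

```python
import json


def unavailable_versions(status_json):
    """Return (version, state, error_message) for versions not AVAILABLE."""
    payload = json.loads(status_json)
    problems = []
    for item in payload.get("model_version_status", []):
        if item.get("state") != "AVAILABLE":
            message = item.get("status", {}).get("error_message", "")
            problems.append((item.get("version"), item.get("state"), message))
    return problems


if __name__ == "__main__":
    # Illustrative payload: version 1 is served, version 2 failed to load.
    sample = json.dumps({
        "model_version_status": [
            {"version": "1", "state": "AVAILABLE",
             "status": {"error_code": "OK", "error_message": ""}},
            {"version": "2", "state": "LOADING",
             "status": {"error_code": "UNKNOWN",
                        "error_message": "Error occurred while loading version"}},
        ]
    })
    for version, state, message in unavailable_versions(sample):
        print(f"version {version}: {state} ({message})")
```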
At startup, the server enables the gRPC and REST API endpoints only after all configured models and detected model versions are loaded successfully (in the AVAILABLE state).
The server will fail to start if it cannot list the content of the configured model paths.
Even when the model server starts successfully and all the models are imported, request handling can still fail for several reasons. Information about the failure reason is passed to the gRPC client in the response. It is also logged on the model server in DEBUG mode.
The possible issues could be:
RAM consumption depends on the size and number of the models configured for serving. It should be measured experimentally; as a rough estimate, each model consumes an amount of RAM equal to the size of its weights file (.bin file). Every version of a model creates a separate inference engine object, so it is recommended to mount only the desired model versions.
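The rule of thumb above can be turned into a quick estimate by summing the sizes of all .bin files that would be mounted. This is a rough sketch under that assumption only; the function name is hypothetical and the result is no substitute for measuring actual memory use.

```python
import os
import tempfile


def estimate_ram_bytes(models_root):
    """Sum the sizes of all .bin weight files found under models_root."""
    total = 0
    for directory, _subdirs, files in os.walk(models_root):
        for name in files:
            if name.endswith(".bin"):
                total += os.path.getsize(os.path.join(directory, name))
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Two versions of one model, each with its own weights file.
        for version, size in (("1", 10), ("2", 5)):
            version_dir = os.path.join(root, "model1", version)
            os.makedirs(version_dir)
            with open(os.path.join(version_dir, "ir_model.bin"), "wb") as weights:
                weights.write(b"\0" * size)
        print(estimate_ram_bytes(root))  # 15
```

Because every mounted version counts toward the total, removing unused version directories lowers the estimate accordingly.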
OpenVINO™ model server consumes all available CPU resources unless they are restricted by the operating system, Docker or Kubernetes capabilities.
It is possible to track the usage of the models, including processing time, while DEBUG mode is enabled. With this setting, the model server logs record information about all incoming requests. You can parse the logs to analyze the volume of requests, processing statistics and the most used models.
Model server employs a configurable serialization function.
The default implementation starting from version 2020.1 is _prepare_output_with_make_tensor_proto. It employs the TensorFlow function make_tensor_proto. For most models it returns a TensorProto response with the inference results serialized to a string via NumPy's tostring call. This method achieves low latency, especially for models with large outputs.
Prior to version 2020.1, serialization used the function _prepare_output_as_AppendArrayToTensorProto. In contrast to make_tensor_proto, it returns the inference results as a TensorProto object containing a list of numerical elements.
In both cases, the results can be deserialized on the client side with make_ndarray.
If you're using TensorFlow's make_ndarray to read output in your client application, then the transition between those methods is transparent.
Add the environment variable SERIALIZATION_FUNCTION=_prepare_output_as_AppendArrayToTensorProto to enforce the usage of the legacy serialization method.
All contributed code must be compatible with the Apache 2 license.
All changes need to pass style, unit and functional tests.
All new features need to be covered by tests.
Docker image with OpenVINO Model Server can be built with several options:
make docker_build_bin dldt_package_url=<url> - using Intel Distribution of OpenVINO binary package (ubuntu base image)
make docker_build_apt_ubuntu - using OpenVINO apt packages with ubuntu base image
make docker_build_ov_base - using public image of OpenVINO runtime base image
make docker_build_clearlinux - using clearlinux base image with DLDT package
Note: Images based on ubuntu include OpenVINO 2020.1.
In the clearlinux-based image, it is 2019.3, to be upgraded soon.
Running the tests requires Python 3.6 and Docker installed (testing scripts are validated on Ubuntu 18.04).
Account running the tests must be in a
make style to run linter tests
make unit to execute unit tests (it requires OpenVINO installation followed by
Alternatively unit tests can be executed in a container by running the script
make test to execute full set of functional tests (it requires building the docker image in advance).
Submit a GitHub issue to ask a question, request a feature or report a bug.
* Other names and brands may be claimed as the property of others.