basetenlabs/python_backend
forked from triton-inference-server/python_backend
Description: Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.
License: BSD-3-Clause
Stars: 0
Forks: 0
Open issues: 0
Created: 2024-01-09T17:51:19Z
Pushed: 2024-01-11T13:59:19Z
Default branch: main
Fork: yes
Parent repository: triton-inference-server/python_backend
Archived: no
README:
Python Backend
The Triton backend for Python. The goal of the Python backend is to let you serve models written in Python with Triton Inference Server without having to write any C++ code.
User Documentation
- [Python Backend](#python-backend)
- [User Documentation](#user-documentation)
- [Quick Start](#quick-start)
- [Building from Source](#building-from-source)
- [Usage](#usage)
- [auto_complete_config](#auto_complete_config)
- [initialize](#initialize)
- [execute](#execute)
- [Default Mode](#default-mode)
- [Error Handling](#error-handling)
- [Request Cancellation Handling](#request-cancellation-handling)
- [Decoupled mode](#decoupled-mode)
- [Use Cases](#use-cases)
- [Known Issues](#known-issues)
- [Request Rescheduling](#request-rescheduling)
- [finalize](#finalize)
- [Model Config File](#model-config-file)
- [Inference Request Parameters](#inference-request-parameters)
- [Managing Python Runtime and Libraries](#managing-python-runtime-and-libraries)
- [Building Custom Python Backend Stub](#building-custom-python-backend-stub)
- [Creating Custom Execution Environments](#creating-custom-execution-environments)
- [Important Notes](#important-notes)
- [Error Handling](#error-handling-1)
- [Managing Shared Memory](#managing-shared-memory)
- [Multiple Model Instance Support](#multiple-model-instance-support)
- [Running Multiple Instances of Triton Server](#running-multiple-instances-of-triton-server)
- [Business Logic Scripting](#business-logic-scripting)
- [Using BLS with Decoupled Models](#using-bls-with-decoupled-models)
- [Model Loading API](#model-loading-api)
- [Using BLS with Stateful Models](#using-bls-with-stateful-models)
- [Limitation](#limitation)
- [Interoperability and GPU Support](#interoperability-and-gpu-support)
- [pb_utils.Tensor.to_dlpack() -> PyCapsule](#pb_utilstensorto_dlpack---pycapsule)
- [pb_utils.Tensor.from_dlpack() -> Tensor](#pb_utilstensorfrom_dlpack---tensor)
- [pb_utils.Tensor.is_cpu() -> bool](#pb_utilstensoris_cpu---bool)
- [Input Tensor Device Placement](#input-tensor-device-placement)
- [Frameworks](#frameworks)
- [PyTorch](#pytorch)
- [PyTorch Determinism](#pytorch-determinism)
- [TensorFlow](#tensorflow)
- [TensorFlow Determinism](#tensorflow-determinism)
- [Custom Metrics](#custom-metrics)
- [Examples](#examples)
- [AddSub in NumPy](#addsub-in-numpy)
- [AddSubNet in PyTorch](#addsubnet-in-pytorch)
- [AddSub in JAX](#addsub-in-jax)
- [Business Logic Scripting](#business-logic-scripting-1)
- [Preprocessing](#preprocessing)
- [Decoupled Models](#decoupled-models)
- [Model Instance Kind](#model-instance-kind)
- [Auto-complete config](#auto-complete-config)
- [Custom Metrics](#custom-metrics-1)
- [Running with Inferentia](#running-with-inferentia)
- [Logging](#logging)
- [Reporting problems, asking questions](#reporting-problems-asking-questions)
Quick Start
1. Run the Triton Inference Server container.
docker run --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
Replace <xx.yy> with the Triton version (e.g. 21.05).
2. Inside the container, clone the Python backend repository.
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
3. Install the example model.
cd python_backend
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
4. Start the Triton server.
tritonserver --model-repository `pwd`/models
5. On the host machine, start the client container.
docker run -ti --net host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk /bin/bash
6. In the client container, clone the Python backend repository.
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
7. Run the example client.
python3 python_backend/examples/add_sub/client.py
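Step 3 relies on Triton's model repository layout: each model gets its own directory holding config.pbtxt, and model.py sits inside a numeric version subdirectory (here 1). As a minimal sketch, the mkdir/cp commands from that step can be expressed in Python as follows; install_model is a hypothetical helper for illustration, not part of the Python backend.

```python
import shutil
from pathlib import Path


def install_model(src_dir, repo_dir, model_name, version="1"):
    """Hypothetical helper: copy model.py and config.pbtxt into a model repository.

    Resulting layout:
        <repo_dir>/<model_name>/config.pbtxt
        <repo_dir>/<model_name>/<version>/model.py
    """
    src = Path(src_dir)
    model_dir = Path(repo_dir) / model_name
    version_dir = model_dir / str(version)
    # Equivalent of `mkdir -p models/add_sub/1/`.
    version_dir.mkdir(parents=True, exist_ok=True)
    # Equivalent of the two `cp` commands: the script goes into the version
    # directory, the configuration stays at the model directory level.
    shutil.copyfile(src / "model.py", version_dir / "model.py")
    shutil.copyfile(src / "config.pbtxt", model_dir / "config.pbtxt")
    return model_dir
```

Run from the python_backend directory, install_model("examples/add_sub", "models", "add_sub") would produce the same tree as step 3.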
Building from Source
1. Requirements
- cmake >= 3.17
- numpy
- rapidjson-dev
- libarchive-dev
- zlib1g-dev
pip3 install numpy
On Ubuntu or Debian you can use the command below to install rapidjson, libarchive, and zlib:
sudo apt-get install rapidjson-dev libarchive-dev zlib1g-dev
2. Build the Python backend. Replace <GIT_BRANCH_NAME> with the GitHub branch that you want to compile. For release branches it should be r<xx.yy> (e.g. r21.06).
mkdir build
cd build
cmake -DTRITON_ENABLE_GPU=ON \
      -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME> \
      -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME> \
      -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME> \
      -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install
The following required Triton repositories will be pulled and used in the build. If the CMake variables below are not specified, the "main" branch of those repositories will be used. <GIT_BRANCH_NAME> should be the same as the Python backend repository branch that you are trying to compile.
- triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME>
- triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME>
- triton-inference-server/core: -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME>
Set -DCMAKE_INSTALL_PREFIX to the location where the Triton Server is installed. In the released containers, this location is /opt/tritonserver.
3. Copy the example model and configuration
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
4. Start the Triton Server
/opt/tritonserver/bin/tritonserver --model-repository=`pwd`/models
5. Use the client app to perform inference
python3 examples/add_sub/client.py
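Because all three -DTRITON_*_REPO_TAG values in step 2 should name the same branch as the Python backend checkout, generating the cmake invocation avoids mismatched tags. The sketch below assumes the release naming described in step 2 (r followed by the Triton version); cmake_args and release_branch are hypothetical helpers, not part of the build system.

```python
def release_branch(version):
    """Hypothetical helper: map a Triton release such as '21.06' to 'r21.06'."""
    major, minor = version.split(".")
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"expected a version like 21.06, got {version!r}")
    return f"r{version}"


def cmake_args(branch, install_prefix, enable_gpu=True):
    """Hypothetical helper: build the step 2 cmake command as an argument list,
    pinning backend, common and core to the same branch."""
    args = ["cmake", f"-DTRITON_ENABLE_GPU={'ON' if enable_gpu else 'OFF'}"]
    for repo in ("BACKEND", "COMMON", "CORE"):
        args.append(f"-DTRITON_{repo}_REPO_TAG={branch}")
    args.append(f"-DCMAKE_INSTALL_PREFIX:PATH={install_prefix}")
    # The source directory is the parent of the build directory.
    args.append("..")
    return args
```

With subprocess imported, subprocess.run(cmake_args(release_branch("21.06"), "/opt/tritonserver"), cwd="build", check=True) would run the configure part of step 2 from the repository root, installing into the location used by the released containers.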
Usage
To use the Python backend, create a Python file with a structure similar to the one below:
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        """`auto_complete_config` is called only once when loading the model
        assuming the server was not started with
        `--disable-auto-complete-config`. Implementing this…
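For orientation, here is a minimal sketch of a complete model.py following that structure, assuming an identity model with a single input INPUT0 and a single output OUTPUT0 (hypothetical names that would have to match config.pbtxt). In default mode, execute returns one response per request, in the same order. The except branch is a hypothetical stand-in for triton_python_backend_utils, which only exists inside Triton, so the sketch can be exercised outside the server; it is not part of the backend's API.

```python
import json

try:
    # Provided by Triton inside the Python backend stub; not pip-installable.
    import triton_python_backend_utils as pb_utils
except ImportError:
    import types

    # Hypothetical stand-in mimicking only the calls used below.
    class _Tensor:
        def __init__(self, name, data):
            self._name, self._data = name, data

        def name(self):
            return self._name

        def as_numpy(self):
            return self._data

    class _InferenceResponse:
        def __init__(self, output_tensors=None, error=None):
            self._outputs = list(output_tensors or [])
            self._error = error

        def output_tensors(self):
            return self._outputs

    def _get_input_tensor_by_name(request, name):
        # Stub requests are plain dicts mapping input name -> _Tensor.
        return request.get(name)

    pb_utils = types.SimpleNamespace(
        Tensor=_Tensor,
        InferenceResponse=_InferenceResponse,
        get_input_tensor_by_name=_get_input_tensor_by_name,
    )


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Returning the config unchanged keeps what config.pbtxt provides.
        return auto_complete_model_config

    def initialize(self, args):
        # args["model_config"] holds the model configuration as a JSON string.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT0"/"OUTPUT0" are assumed names; they must match config.pbtxt.
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        # One response per request, same length and order as `requests`.
        return responses

    def finalize(self):
        # Release anything acquired in initialize (nothing to do here).
        pass
```

An identity model keeps the sketch free of any numeric library; a real model would typically compute on the arrays returned by as_numpy() before wrapping the result in pb_utils.Tensor.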