Fork · Baseten · published Jan 9, 2024

basetenlabs/python_backend

forked from triton-inference-server/python_backend

Description: Triton backend that enables pre-process, post-processing and other logic to be implemented in Python.

License: BSD-3-Clause

Stars: 0

Forks: 0

Open issues: 0

Created: 2024-01-09T17:51:19Z

Pushed: 2024-01-11T13:59:19Z

Default branch: main

Fork: yes

Parent repository: triton-inference-server/python_backend

Archived: no

README:

Python Backend

The Triton backend for Python. The goal of the Python backend is to let you serve models written in Python with Triton Inference Server without having to write any C++ code.

User Documentation

  • [Python Backend](#python-backend)
  • [User Documentation](#user-documentation)
  • [Quick Start](#quick-start)
  • [Building from Source](#building-from-source)
  • [Usage](#usage)
  • [auto_complete_config](#auto_complete_config)
  • [initialize](#initialize)
  • [execute](#execute)
  • [Default Mode](#default-mode)
  • [Error Handling](#error-handling)
  • [Request Cancellation Handling](#request-cancellation-handling)
  • [Decoupled mode](#decoupled-mode)
  • [Use Cases](#use-cases)
  • [Known Issues](#known-issues)
  • [Request Rescheduling](#request-rescheduling)
  • [finalize](#finalize)
  • [Model Config File](#model-config-file)
  • [Inference Request Parameters](#inference-request-parameters)
  • [Managing Python Runtime and Libraries](#managing-python-runtime-and-libraries)
  • [Building Custom Python Backend Stub](#building-custom-python-backend-stub)
  • [Creating Custom Execution Environments](#creating-custom-execution-environments)
  • [Important Notes](#important-notes)
  • [Error Handling](#error-handling-1)
  • [Managing Shared Memory](#managing-shared-memory)
  • [Multiple Model Instance Support](#multiple-model-instance-support)
  • [Running Multiple Instances of Triton Server](#running-multiple-instances-of-triton-server)
  • [Business Logic Scripting](#business-logic-scripting)
  • [Using BLS with Decoupled Models](#using-bls-with-decoupled-models)
  • [Model Loading API](#model-loading-api)
  • [Using BLS with Stateful Models](#using-bls-with-stateful-models)
  • [Limitation](#limitation)
  • [Interoperability and GPU Support](#interoperability-and-gpu-support)
  • [pb_utils.Tensor.to_dlpack() -> PyCapsule](#pb_utilstensorto_dlpack---pycapsule)
  • [pb_utils.Tensor.from_dlpack() -> Tensor](#pb_utilstensorfrom_dlpack---tensor)
  • [pb_utils.Tensor.is_cpu() -> bool](#pb_utilstensoris_cpu---bool)
  • [Input Tensor Device Placement](#input-tensor-device-placement)
  • [Frameworks](#frameworks)
  • [PyTorch](#pytorch)
  • [PyTorch Determinism](#pytorch-determinism)
  • [TensorFlow](#tensorflow)
  • [TensorFlow Determinism](#tensorflow-determinism)
  • [Custom Metrics](#custom-metrics)
  • [Examples](#examples)
  • [AddSub in NumPy](#addsub-in-numpy)
  • [AddSubNet in PyTorch](#addsubnet-in-pytorch)
  • [AddSub in JAX](#addsub-in-jax)
  • [Business Logic Scripting](#business-logic-scripting-1)
  • [Preprocessing](#preprocessing)
  • [Decoupled Models](#decoupled-models)
  • [Model Instance Kind](#model-instance-kind)
  • [Auto-complete config](#auto-complete-config)
  • [Custom Metrics](#custom-metrics-1)
  • [Running with Inferentia](#running-with-inferentia)
  • [Logging](#logging)
  • [Reporting problems, asking questions](#reporting-problems-asking-questions)

Quick Start

1. Run the Triton Inference Server container.

docker run --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3

Replace <xx.yy> with the Triton version (e.g. 21.05).

2. Inside the container, clone the Python backend repository.

git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>

3. Install the example model.

cd python_backend
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt

4. Start the Triton server.

tritonserver --model-repository `pwd`/models

5. On the host machine, start the client container.

docker run -ti --net host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk /bin/bash

6. In the client container, clone the Python backend repository.

git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>

7. Run the example client.

python3 python_backend/examples/add_sub/client.py
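Step 3 relies on Triton's model-repository layout: each model gets its own directory holding `config.pbtxt`, and `model.py` sits in a numbered version subdirectory (`1/`). The sketch below reproduces those `mkdir -p` / `cp` commands with the Python standard library. The helper name `install_model` and its parameters are hypothetical, introduced only for illustration.

```python
import shutil
from pathlib import Path


def install_model(src_dir: Path, repo: Path, name: str, version: int = 1) -> Path:
    """Hypothetical helper mirroring the Quick Start `mkdir -p` / `cp` steps.

    Produces:
        <repo>/<name>/config.pbtxt
        <repo>/<name>/<version>/model.py
    """
    version_dir = repo / name / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    # model.py goes into the version directory ...
    shutil.copy(src_dir / "model.py", version_dir / "model.py")
    # ... while config.pbtxt lives next to it, at the model's top level.
    shutil.copy(src_dir / "config.pbtxt", repo / name / "config.pbtxt")
    return repo / name
```

Run from the root of a `python_backend` checkout, `install_model(Path("examples/add_sub"), Path("models"), "add_sub")` would produce the same tree as step 3.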

Building from Source

1. Requirements

  • cmake >= 3.17
  • numpy
  • rapidjson-dev
  • libarchive-dev
  • zlib1g-dev

pip3 install numpy

On Ubuntu or Debian, you can use the command below to install rapidjson, libarchive, and zlib:

sudo apt-get install rapidjson-dev libarchive-dev zlib1g-dev

2. Build the Python backend. Replace <GIT_BRANCH_NAME> with the GitHub branch that you want to compile. For release branches it should be r<xx.yy> (e.g. r21.06).

mkdir build
cd build
cmake -DTRITON_ENABLE_GPU=ON -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME> -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME> -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME> -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install

The following required Triton repositories will be pulled and used in the build. If the CMake variables below are not specified, the "main" branch of those repositories will be used. <GIT_BRANCH_NAME> should be the same as the Python backend repository branch that you are trying to compile.

  • triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=<GIT_BRANCH_NAME>
  • triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=<GIT_BRANCH_NAME>
  • triton-inference-server/core: -DTRITON_CORE_REPO_TAG=<GIT_BRANCH_NAME>

Set -DCMAKE_INSTALL_PREFIX to the location where the Triton Server is installed. In the released containers, this location is /opt/tritonserver.

3. Copy the example model and configuration.

mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt

4. Start the Triton Server

/opt/tritonserver/bin/tritonserver --model-repository=`pwd`/models

5. Use the client app to perform inference

python3 examples/add_sub/client.py
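Step 2 passes the same branch to three `*_REPO_TAG` variables, and leaving them out makes the build fall back to each repository's "main" branch. The following is a minimal sketch, assuming you want to script that invocation so the three tags cannot drift apart. The function name `cmake_args` and the `gpu` switch are hypothetical; the sketch only builds the argument list and does not run CMake.

```python
import shlex
from typing import List, Optional

# CMake variables naming the Triton repositories pulled during the build.
REPO_TAG_VARS = (
    "TRITON_BACKEND_REPO_TAG",  # triton-inference-server/backend
    "TRITON_COMMON_REPO_TAG",   # triton-inference-server/common
    "TRITON_CORE_REPO_TAG",     # triton-inference-server/core
)


def cmake_args(branch: Optional[str], install_prefix: str, gpu: bool = True) -> List[str]:
    """Build the cmake argument list from step 2 (hypothetical helper).

    When `branch` is None the tag variables are omitted, so each
    repository falls back to its "main" branch.
    """
    args = ["cmake", f"-DTRITON_ENABLE_GPU={'ON' if gpu else 'OFF'}"]
    if branch is not None:
        # One branch for all three repos, matching the python_backend branch.
        args += [f"-D{var}={branch}" for var in REPO_TAG_VARS]
    args += [f"-DCMAKE_INSTALL_PREFIX:PATH={install_prefix}", ".."]
    return args


# Example: a release-branch build installed into the container's Triton location.
print(shlex.join(cmake_args("r21.06", "/opt/tritonserver")))
```

Keeping the branch in a single parameter reflects the requirement above that all three tags match the Python backend branch being compiled.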

Usage

To use the Python backend, create a Python file with a structure similar to the one below:

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        """`auto_complete_config` is called only once when loading the model
        assuming the server was not started with
        `--disable-auto-complete-config`. Implementing this…
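To show where the other methods listed in the contents (`initialize`, `execute`, `finalize`) sit alongside `auto_complete_config`, here is a minimal sketch of a complete model file. It assumes a `config.pbtxt` declaring inputs `INPUT0`/`INPUT1` and outputs `OUTPUT0`/`OUTPUT1`; these names are chosen for illustration in the style of the add_sub example. The `except ImportError` branch is a hypothetical stand-in for `triton_python_backend_utils`, added only so the class can be exercised outside Triton. It is not part of the real backend, and inside a Triton container the real module is imported instead.

```python
import json

import numpy as np

try:
    # Available only inside the Triton Python backend.
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Hypothetical local stand-in (NOT the real module) modelling just the
    # calls used below, so execute() can be tried without a Triton server.
    class pb_utils:  # noqa: N801 - mimics the module name
        class Tensor:
            def __init__(self, name, array):
                self._name, self._array = name, np.asarray(array)

            def as_numpy(self):
                return self._array

        class InferenceResponse:
            def __init__(self, output_tensors):
                self._outputs = output_tensors

            def output_tensors(self):
                return self._outputs

        @staticmethod
        def get_input_tensor_by_name(request, name):
            # In this stand-in a "request" is just a dict of name -> Tensor.
            return request[name]


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Returning the config unchanged keeps what config.pbtxt declares.
        return auto_complete_model_config

    def initialize(self, args):
        # Called once at load time; args["model_config"] is a JSON string.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Default mode: return one response per request, in the same order.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            out0 = pb_utils.Tensor("OUTPUT0", in0 + in1)
            out1 = pb_utils.Tensor("OUTPUT1", in0 - in1)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0, out1]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded; release resources here.
        pass
```

The stand-in deliberately covers only these few calls; anything beyond them (BLS, decoupled responses, DLPack) needs the real module inside Triton.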
