scaleway/ai-pulse-nvidia-trt-llm
Description: Sources and datasets to deploy NVIDIA TRT-LLM on the Scaleway ecosystem
Language: Python
License: Apache-2.0
Stars: 1
Forks: 1
Open issues: 0
Created: 2023-11-14T14:36:24Z
Pushed: 2023-12-29T16:41:17Z
Default branch: main
Fork: no
Archived: no
README: 
Efficient deployment and inference of GPU-accelerated LLMs
Introduction
NVIDIA TensorRT-LLM, which will be part of NVIDIA AI Enterprise, is open-source software that delivers state-of-the-art performance for LLM serving on NVIDIA GPUs. It consists of the TensorRT deep learning compiler and includes optimized kernels, pre- and post-processing steps, and multi-GPU/multi-node communication primitives. This repository contains the sources NVIDIA used to introduce TensorRT-LLM at the first edition of Scaleway AI Pulse.
Guide Presentation
The workshop introduces TensorRT-LLM features and capabilities and walks through the steps needed to build and run your model in TensorRT-LLM, both on a single GPU and across multiple GPUs. We also use Triton Inference Server and the TensorRT-LLM Backend to deploy the engines generated by TensorRT-LLM.
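To give a feel for the final deployment step, here is a minimal sketch of a client that queries an engine once it is served by Triton Inference Server. It is not taken from the workshop material. It assumes Triton's HTTP `generate` endpoint is reachable on `localhost:8000`, that the model is exposed under the name `ensemble`, and that it accepts `text_input` and `max_tokens` and returns `text_output`, as in the TensorRT-LLM Backend's example model repository. Adjust these names to match your own deployment.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens=64,
                           base_url="http://localhost:8000",
                           model="ensemble"):
    """Build (but do not send) a POST request for Triton's generate endpoint.

    Assumptions: the base URL, the model name "ensemble", and the
    input field names follow the TensorRT-LLM Backend example setup.
    """
    url = f"{base_url}/v2/models/{model}/generate"
    payload = {
        "text_input": prompt,      # prompt text (assumed field name)
        "max_tokens": max_tokens,  # cap on generated tokens (assumed field name)
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text.

    Requires a running Triton server; "text_output" is the assumed
    name of the response field.
    """
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text_output"]
```

With a server running, `generate("What is TensorRT-LLM?", max_tokens=32)` would return the completion as a string. Building the request separately from sending it makes it easy to inspect the URL and payload before any network call.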
Getting Started
Let's start by setting up the Scaleway prerequisites and the complete environment. Go to [Setup](./docs/01-setup.md).