Repo: Scaleway, published Nov 14, 2023

scaleway/ai-pulse-nvidia-trt-llm

Description: Sources and datasets to deploy NVIDIA TRT-LLM on the Scaleway ecosystem

Language: Python

License: Apache-2.0

Stars: 1

Forks: 1

Open issues: 0

Created: 2023-11-14T14:36:24Z

Pushed: 2023-12-29T16:41:17Z

Default branch: main

Fork: no

Archived: no

README: ![ai pulse banner](./docs/images/common/ai-pulse-banner.jpeg)

Efficient deployment and inference of GPU-accelerated LLMs

Introduction

NVIDIA TensorRT-LLM, which will be part of NVIDIA AI Enterprise, is open-source software that delivers state-of-the-art performance for LLM serving on NVIDIA GPUs. It consists of the TensorRT deep learning compiler and includes optimized kernels, pre- and post-processing steps, and multi-GPU/multi-node communication primitives. This repository contains the sources NVIDIA used to introduce TensorRT-LLM at the first edition of Scaleway AI Pulse.

Guide Presentation

The workshop introduces TensorRT-LLM features and capabilities and walks through the steps needed to build and run your model in TensorRT-LLM, both on a single GPU and across multiple GPUs. We also use Triton Inference Server and the TensorRT-LLM backend to deploy the engines generated by TensorRT-LLM.
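To give an idea of what the deployed result looks like from the client side, here is a minimal sketch of a request to a Triton Inference Server that serves a TensorRT-LLM engine. It relies on assumptions that are not taken from this repository: the server listens on `localhost:8000`, the model is exposed under the name `ensemble`, and the request and response use the `text_input`, `max_tokens`, and `text_output` fields found in the TensorRT-LLM backend's published examples. The helper names `build_generate_request` and `generate` are hypothetical. Adjust all of these to match your own deployment.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens=64,
                           base_url="http://localhost:8000",
                           model="ensemble"):
    """Build (but do not send) an HTTP POST for Triton's generate endpoint.

    Assumed, not taken from this repository: the base URL, the model name
    and the JSON field names.
    """
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumed response field name: "text_output".
        return json.loads(response.read())["text_output"]
```

With a server running, `generate("What is TensorRT-LLM?", max_tokens=32)` would return the completion as a string. Keeping request construction separate from the network call makes the payload easy to inspect before anything is deployed.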

Getting Started

Let's start by setting up the Scaleway prerequisites and the complete environment. Go to [Setup](./docs/01-setup.md).