What does this fork signal mean?

Baseten maintains basetenlabs/TensorRT-Model-Optimizer, a fork of NVIDIA/Model-Optimizer. A fork signal like this points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo basetenlabs/TensorRT-Model-Optimizer · parent NVIDIA/Model-Optimizer · fork with 1 star, trivial traction. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Baseten Fork: basetenlabs/TensorRT-Model-Optimizer

Captured source

GitHub: github.com/basetenlabs/TensorRT-Model-Optimizer

basetenlabs/TensorRT-Model-Optimizer repository metadata

published Mar 11, 2025 · seen 5d · captured 11h · http 200 · method plain

basetenlabs/TensorRT-Model-Optimizer

Description: A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 4

Created: 2025-03-11T16:30:08Z

Pushed: 2025-12-15T22:23:42Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/Model-Optimizer

Archived: no
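
The metadata fields above map onto what GitHub's REST API returns for a repository (`GET /repos/{owner}/{repo}`). The sketch below is only an illustration of how such a payload could be condensed into a one-line fork summary. The function name `summarize_fork` and the sample dictionary are hypothetical, and nothing here reflects how onlylabs actually computes its signals or notability score.

```python
from datetime import datetime

# Timestamp format used by the created/pushed fields shown above.
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summarize_fork(repo: dict) -> str:
    """Condense a GitHub repository payload into a one-line fork summary.

    Hypothetical helper for illustration; the keys follow the GitHub REST
    API repository object (full_name, stargazers_count, forks_count, ...).
    """
    created = datetime.strptime(repo["created_at"], GITHUB_TIME_FORMAT)
    pushed = datetime.strptime(repo["pushed_at"], GITHUB_TIME_FORMAT)
    # Whole days between fork creation and the most recent push.
    active_days = (pushed - created).days
    parent = repo.get("parent", {}).get("full_name", "unknown")
    return (
        f"{repo['full_name']} (fork of {parent}): "
        f"{repo['stargazers_count']} star(s), {repo['forks_count']} fork(s), "
        f"last push {active_days} days after creation"
    )


# Sample payload built by hand from the metadata listed above.
sample = {
    "full_name": "basetenlabs/TensorRT-Model-Optimizer",
    "fork": True,
    "parent": {"full_name": "NVIDIA/Model-Optimizer"},
    "stargazers_count": 1,
    "forks_count": 0,
    "created_at": "2025-03-11T16:30:08Z",
    "pushed_at": "2025-12-15T22:23:42Z",
}

summary = summarize_fork(sample)
print(summary)
```

The gap between creation and the latest push is the part a star count does not capture: a fork with trivial traction can still be under active maintenance.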

README:

NVIDIA Model Optimizer (referred to as Model Optimizer, or ModelOpt) is a library comprising state-of-the-art model optimization [techniques](#techniques) including quantization, distillation, pruning, speculative decoding and sparsity to accelerate models.

[Input] Model Optimizer currently supports inputs of a Hugging Face, PyTorch or ONNX model.

[Optimize] Model Optimizer provides Python APIs that let users compose the above model optimization techniques and export an optimized quantized checkpoint. Model Optimizer is also integrated with NVIDIA NeMo, Megatron-LM and Hugging Face Accelerate for inference optimization techniques that require training.

[Export for deployment] Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized checkpoint generated from Model Optimizer is ready for deployment in downstream inference frameworks like SGLang, TensorRT-LLM, TensorRT, or vLLM.
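
To make the "quantized checkpoint" idea concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization, one of the simplest forms of post-training quantization. It does not use Model Optimizer's API and is not how the library implements FP8 or NVFP4. The function names `quantize_int8` and `dequantize` and the sample weights are assumptions made for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative only).

    Returns the integer codes and the scale that maps codes back to floats.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights standing in for one small tensor.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Storing the integer codes plus one scale per tensor is what shrinks the checkpoint. Lower-precision formats trade a larger rounding error for a smaller footprint and faster inference.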

Latest News

[2025/12/11] BLOG: Top 5 AI Model Optimization Techniques for Faster, Smarter Inference
[2025/12/08] NVIDIA TensorRT Model Optimizer is now officially rebranded as NVIDIA Model Optimizer.
[2025/10/07] BLOG: Pruning and Distilling LLMs Using NVIDIA Model Optimizer
[2025/09/17] BLOG: An Introduction to Speculative Decoding for Reducing Latency in AI Inference
[2025/09/11] BLOG: How Quantization Aware Training Enables Low-Precision Accuracy Recovery
[2025/08/29] BLOG: Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training
[2025/08/01] BLOG: Optimizing LLMs for Performance and Accuracy with Post-Training Quantization
[2025/06/24] BLOG: Introducing NVFP4 for Efficient and Accurate Low-Precision Inference
[2025/05/14] NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUs
[2025/04/21] Adobe optimized deployment using Model-Optimizer + TensorRT leading to a 60% reduction in diffusion latency, a 40% reduction in total cost of ownership
[2025/04/05] NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverick. Check out how to quantize Llama4 for deployment acceleration [here](./examples/llm_ptq/README.md#llama-4)
[2025/03/18] World's Fastest DeepSeek-R1 Inference with Blackwell FP4 & Increasing Image Generation Efficiency on Blackwell
[2025/02/25] Model Optimizer quantized NVFP4 models available on Hugging Face for download: DeepSeek-R1-FP4, Llama-3.3-70B-Instruct-FP4, Llama-3.1-405B-Instruct-FP4
[2025/01/28] Model Optimizer has added support for NVFP4. Check out an example of NVFP4 PTQ [here](./examples/llm_ptq/README.md#model-quantization-and-trt-llm-conversion).
[2025/01/28] Model Optimizer is now open source!

Previous News

[2024/10/23] Model Optimizer quantized FP8 Llama-3.1 Instruct models available on Hugging Face for download: 8B, 70B, 405B.
[2024/09/10] Post-Training Quantization of LLMs with NVIDIA NeMo and Model Optimizer.
[2024/08/28] Boosting Llama 3.1 405B Performance up to 44% with Model Optimizer on NVIDIA H200 GPUs
[2024/08/28] Up to 1.9X Higher Llama 3.1 Performance with Medusa
[2024/08/15] New features in recent releases: [Cache Diffusion](./examples/diffusers/cache_diffusion), QLoRA workflow with NVIDIA NeMo, and more. Check out [our…

Excerpt shown — open the source for the full document.

Notability

notability 1.0/10

Fork with 1 star, trivial traction