What does this repo signal mean?

Fireworks AI published fw-ai/benchmark (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo fw-ai/benchmark · language Python · New benchmark repo with moderate traction (105 stars).. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Fireworks AI Repo: fw-ai/benchmark

Captured source

source ↗

GitHub/github.com/fw-ai/benchmark

fw-ai/benchmark repository metadata

Source ↗

published Nov 3, 2023seen Jun 5captured Jun 11http 200method plain

fw-ai/benchmark

Description: Benchmark suite for LLMs from Fireworks.ai

Language: Python

License: Apache-2.0

Stars: 105

Forks: 39

Open issues: 21

Created: 2023-11-03T18:00:04Z

Pushed: 2026-06-06T19:04:33Z

Default branch: main

Fork: no

Archived: no

README:

Benchmark / Load-testing Suite by Fireworks.ai

LLM benchmarking

The load test is designed to simulate continuous production load and minimize effect of model generation behavior:

variation in generation parameters
continuous request stream with varying distribution and load levels
force generation of exact number of output tokens (for most providers)
specified load test duration

Supported providers and API flavors:

OpenAI API compatible endpoints:
Fireworks.ai public or private deployments
VLLM
Anyscale Endpoints
OpenAI
Text Generation Inference (TGI) / HuggingFace Endpoints
Together.ai
NVidia Triton server:
Legacy HTTP endpoints (no streaming)
LLM-focused endpoints (with or without streaming)

Supported API types:

Chat completions (/v1/chat/completions)
Text completions (/v1/completions)
Embeddings (/v1/embeddings)
Rerank (/v1/rerank)

Captured metrics:

Overall latency
Number of generated tokens
Sustained requests throughput (QPS)
Time to first token (TTFT) for streaming
Per token latency for streaming

Metrics summary can be exported to CSV. This way multiple configuration can be scripted over. CSV file can be imported to Google Sheets/Excel or Jupyter for further analysis.

Local Setup

The fastest way to get started is with uv:

bash scripts/setup.sh

This will install uv (if needed), create a .venv with Python 3.11, and install all dependencies.

Then activate the environment:

source .venv/bin/activate

Usage

See [llm_bench](llm_bench) folder for detailed usage.

See [llm_bench/benchmark_suite.ipynb](llm_bench/benchmark_suite.ipynb) for a detailed example of how to use the load test script and run different types of benchmark suites.

Notability

notability 6.0/10

New benchmark repo with moderate traction (105 stars).