groq/openbench v0.5.3
Repository: groq/openbench
Tag: v0.5.3
Published: 2025-12-09T00:50:23Z
Prerelease: no
Release notes:
0.5.3 (2025-12-08)
Features
- add --max-tasks option for concurrent task execution in eval command (#279) (241e653)
- add bbq benchmark (#255) (46f4744)
- add ChartQAPro (#289) (677f7c7)
- add configurable HuggingFace Hub config naming (#261) (8abe2ae)
- add DocVQA benchmark (#297) (0dd0edf)
- add fuzzy match suggestion for misspelled evals (#303) (625a7b3)
- add ifbench benchmark (#326) (bd730c2)
- add math EvalGroup (#263) (e0f4a9b)
- add MathVista benchmark (#298) (5c50a8f)
- add MMLU-Redux benchmark from lighteval (#321) (d22a587)
- add MMVet V2 benchmark (#296) (66689de)
- add OCRBench V2 benchmark (#295) (71f3589)
- add optional extras for simpleqa and toxicity (#266) (2450ddf)
- add sealqa benchmark (#283) (06b39e4)
- add SMT 2024 benchmarks (#239) (5d9b475)
- add tau bench, pass^k metric (#294) (2bb1242)
- agentdojo: port agentdojo benchmark (#223) (1cf174c)
- cli: added export command to export specific logs to hf (#265) (62e8d8c)
- cvebench: added auto prepare env set up for cvebench (#259) (db238a3)
- deepresearch-bench: add deepresearch bench (#288) (d2b4622)
- docs: docs for unsupported providers (#312) (3a3d4b8)
- docs: search capability benchmarks feature page (#287) (9dd27c1)
- evals: add GSM8K benchmark with shared grade school math scorer (#322) (4559a67)
- evals: add QA benchmarks and shared scorer (#323) (0ea3733)
- factscore: added support for factscore (#258) (13aafd7)
- gpt_oss: add GPT-OSS AIME benchmark, make --epochs optional and stop default 1 from being forced down (#284) (815f51b)
- groq: implement configurable timeout for GroqAPI client (#271) (be492b6)
- groq: streaming support (#313) (c1a20be)
- m2s: added support for single turn conversion of 3 multi turn jailbreak datasets (mhj, safeMT, cosafe) (#222) (6b8f2b1)
- PolygloToxicityPrompts: add multilingual toxicity evaluation (#262) (46de7ee)
- provider: add helicone support (#275) (de6ab04)
- provider: add SiliconFlow provider support (#269) (ce14070)
- providers: add W&B…
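
The pass^k metric added with tau bench (#294) can be illustrated with a short sketch. It assumes the usual tau-bench definition: for a task with n trials of which c succeeded, the chance that k trials drawn without replacement all succeed is estimated as C(c, k) / C(n, k), then averaged over tasks. The function name and arguments below are hypothetical, and openbench's actual implementation is not shown in these notes and may differ.

```python
from math import comb


def pass_hat_k(successes_per_task, trials_per_task, k):
    """Estimate pass^k as the mean over tasks of C(c, k) / C(n, k).

    successes_per_task: number of successful trials (c) for each task.
    trials_per_task:    number of trials (n) run for every task.
    k:                  number of trials that must all succeed.
    """
    if not successes_per_task:
        raise ValueError("need at least one task")
    if not 1 <= k <= trials_per_task:
        raise ValueError("k must satisfy 1 <= k <= trials_per_task")
    if any(c < 0 or c > trials_per_task for c in successes_per_task):
        raise ValueError("each success count must lie in [0, trials_per_task]")

    denom = comb(trials_per_task, k)
    # comb(c, k) is 0 when c < k, so tasks with too few successes add nothing.
    total = sum(comb(c, k) for c in successes_per_task)
    return total / len(successes_per_task) / denom


# Three tasks, four trials each, with 4, 2 and 0 successes respectively.
print(pass_hat_k([4, 2, 0], trials_per_task=4, k=1))
print(pass_hat_k([4, 2, 0], trials_per_task=4, k=2))
```

With k = 1 this reduces to the plain average success rate. Larger k rewards consistency, because a task contributes only when at least k of its trials succeeded.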