Repo · NVIDIA · published Feb 23, 2026

NVIDIA/GFMBench-api

Python


Description: GFMBench-API offers a unified middleware that decouples genomic foundation model architectures from diverse tasks, standardizes data streams and metrics, and lets users submit models through a single interface for reproducible evaluation and reporting.

Language: Python

License: Apache-2.0

Stars: 7

Forks: 2

Open issues: 3

Created: 2026-02-23T08:20:20Z

Pushed: 2026-06-10T23:28:19Z

Default branch: main

Fork: no

Archived: no

README:

GFMBench-API

GFMBench-API is an extensible benchmarking suite for assessing genomic foundation models (GFMs) across a diverse set of downstream tasks, including classification, variant effect prediction, and zero-shot evaluation.

Quick Start

Installation

GFMBench-API separates package dependencies from model dependencies. Model runtimes (Evo2, Nucleotide Transformer, DNABERT2, etc.) are maintained by their own projects and are not bundled into the core package.

1. Create a virtual environment (choose one option):

Option A: Using pip (venv)

python -m venv gfmbench_env
source gfmbench_env/bin/activate # On Windows: gfmbench_env\Scripts\activate

Option B: Using conda

conda create -n gfmbench_env python=3.11
conda activate gfmbench_env

2. Install dependencies for your model first, following that model's own environment setup. Examples:

| Model | Dependency source |
|-------|-------------------|
| Evo2 | evo2 `pyproject.toml` |
| Nucleotide Transformer (NTv3) | nucleotide-transformer `setup.py` |
| DNABERT2 | DNABERT_2 `requirements.txt` |

3. Install GFMBench-API core dependencies on top of your model environment:

pip install -r basic_requirements.txt

basic_requirements.txt contains only what the gfmbench_api package needs (tasks, metrics, data I/O, etc.) — no model-specific libraries.

---

What’s Included?

GFMBench-API provides:

  • A core API package for unified GFM evaluation (gfmbench_api/).
  • A suite of standard benchmark tasks covering supervised and zero-shot scenarios.
  • Consistent interfaces for models and tasks, enabling both out-of-the-box use and customized evaluation pipelines.
  • Example scripts and templates (usage_examples/) to get started quickly or for rapid prototyping.

---

Repository Organization

gfmbench_api_rep/
├── gfmbench_api/ # Main API package
│ ├── benchmark_report/ # CSV report utilities
│ ├── metrics/ # Built-in metrics (AUROC, AUPRC, etc.)
│ ├── tasks/ # Task definitions
│ │ ├── base/ # Base task/model classes
│ │ └── concrete/ # 20+ ready-to-use tasks
│ └── utils/ # Misc utilities (data I/O, download helpers, inference cache)
├── usage_examples/ # Getting started scripts and toy models
│ ├── run_benchmark.py
│ ├── trainers/
│ └── sanity_models/
├── logs/ # Logs (autocreated)
└── basic_requirements.txt # GFMBench-API core only (model-agnostic)

---

Supported Benchmarks & Tasks

GFMBench-API supports evaluation on 20 unique tasks, grouped as:

Supervised Classification & Variant Prediction

| Task Class | Description |
| --------------------------------- | --------------------------------------- |
| GuePromoterAllTask | Binary classification of promoter vs non-promoter DNA sequences. |
| GueSpliceSiteTask | Three-class classification of splice sites as donor, acceptor, or non-splice. |
| GueTranscriptionFactorTask | Binary classification of transcription factor binding sites from ChIP-seq data. |
| VariantBenchmarksCodingTask | Binary classification of coding variants as benign or pathogenic. |
| VariantBenchmarksNonCodingTask | Binary classification of non-coding variants as benign or pathogenic. |
| VariantBenchmarksExpressionTask | Binary classification of variants affecting gene expression. |
| VariantBenchmarksCommonVsRareTask | Binary classification distinguishing common variants from synthetic rare controls. |
| VariantBenchmarksMEQTLTask | Binary classification of variants affecting DNA methylation rates. |
| VariantBenchmarksSQTLTask | Binary classification of variants affecting alternative splicing. |
| LRBCausalEqtlTask | Binary classification of variants causally influencing gene expression with tissue context. |
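Most tasks in this group are binary, and the repository tree lists AUROC among the built-in metrics. As a rough illustration of what such a metric measures, here is a minimal pure-Python sketch of pairwise AUROC. The function name `auroc` and the toy labels and scores are made up for this example and are not taken from `gfmbench_api/metrics/`.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = pathogenic, 0 = benign.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The quadratic pair count is fine for illustration; a production metric would normally use a sort-based implementation.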

Zero-Shot Variant Effect Prediction

| Task Class | Description |
| -------------------------------------|--------------------------------------------|
| VepevalClinvarTask | Zero-shot pathogenicity prediction for ClinVar SNVs using embedding-distance scoring. |
| IndelClinvarTask | Zero-shot pathogenicity prediction for ClinVar insertions and deletions. |
| BendVEPExpression | Zero-shot prediction of expression effects for non-coding variants. |
| BendVEPDisease | Zero-shot prediction of disease effects for non-coding variants. |
| SonglabClinvarTask | Zero-shot pathogenicity prediction for ClinVar SNVs using likelihood-based scoring. |
| BRCA1Task | Zero-shot prediction of functional impact for BRCA1 variants (LOF, intermediate, functional). |
| TraitGymComplexTask | Zero-shot prediction of complex trait-associated variants. |
| TraitGymMendelianTask | Zero-shot prediction of Mendelian disease-associated variants. |
| LrbVariantEffectPathogenicOmimTask | Zero-shot prediction of pathogenic variants associated with Mendelian diseases. |
| LoleveCausalEqtlTask | Zero-shot prediction of causal expression-modulating variants (indels) in promoters. |
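The table mentions two scoring styles, embedding-distance and likelihood-based. The excerpt does not show how the tasks compute them, so the sketch below only illustrates the general idea using common formulations: cosine distance between reference and alternate embeddings, and a log-likelihood ratio between the two alleles. The function names and toy numbers are assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_distance(ref_emb: Sequence[float], alt_emb: Sequence[float]) -> float:
    """Embedding-distance score: larger means the variant moved the embedding more."""
    dot = sum(a * b for a, b in zip(ref_emb, alt_emb))
    norm_ref = math.sqrt(sum(a * a for a in ref_emb))
    norm_alt = math.sqrt(sum(b * b for b in alt_emb))
    return 1.0 - dot / (norm_ref * norm_alt)


def log_likelihood_ratio(ref_logprob: float, alt_logprob: float) -> float:
    """Likelihood-based score: negative means the model finds the alternate allele less likely."""
    return alt_logprob - ref_logprob


# Toy values (illustrative only, not produced by any real model).
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
print(log_likelihood_ratio(-10.0, -12.0))       # -2.0
```

Either score can then be ranked against pathogenicity labels without any task-specific training, which is what makes the evaluation zero-shot.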

Several zero-shot variant effect prediction tasks repeat the same reference sequence across variants. For tasks where that pattern is common, reference sequences are cached during evaluation to avoid redundant forward passes and improve efficiency. Caching is applied only where it offers a meaningful memory and latency tradeoff. To disable caching (e.g. due to memory limits), set "disable_cache": True in task_config.
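To make the caching idea concrete, here is a minimal sketch of memoizing reference-sequence embeddings so that a repeated reference costs only one forward pass. Only the `"disable_cache"` key and the `task_config` name come from this README. `ReferenceCache`, `toy_embed`, and `count_reference_passes` are hypothetical names for illustration, and how the real tasks consume `task_config` is not shown in this excerpt.

```python
from typing import Callable, Dict, List

Embedding = List[float]


def toy_embed(sequence: str) -> Embedding:
    """Hypothetical stand-in for a model forward pass: per-base counts."""
    return [float(sequence.count(base)) for base in "ACGT"]


class ReferenceCache:
    """Memoizes reference-sequence embeddings keyed by the sequence string."""

    def __init__(self, embed_fn: Callable[[str], Embedding], disable_cache: bool = False) -> None:
        self._embed_fn = embed_fn
        self._disabled = disable_cache
        self._store: Dict[str, Embedding] = {}
        self.forward_passes = 0  # counts calls that actually reach embed_fn

    def embed(self, sequence: str) -> Embedding:
        if not self._disabled and sequence in self._store:
            return self._store[sequence]
        self.forward_passes += 1
        embedding = self._embed_fn(sequence)
        if not self._disabled:
            self._store[sequence] = embedding
        return embedding


def count_reference_passes(task_config: dict) -> int:
    # Three variants, two of which share the same reference sequence.
    references = ["ACGTACGT", "ACGTACGT", "TTGACCAA"]
    cache = ReferenceCache(toy_embed, disable_cache=task_config.get("disable_cache", False))
    for ref in references:
        cache.embed(ref)
    return cache.forward_passes


print(count_reference_passes({}))                       # 2
print(count_reference_passes({"disable_cache": True}))  # 3
```

The tradeoff is the one described above: the cache saves forward passes at the cost of holding embeddings in memory, so disabling it makes sense when memory is the tighter constraint.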

---

How Model & Task Interfaces Work

Task API

All task classes expose a consistent interface (see gfmbench_api/tasks/base/):

  • get_task_name()
  • get_task_attributes() — metadata (e.g. number of labels, dataset splits)
  • get_finetune_dataset()
  • eval_test_set(model)
  • eval_validation_set(model)
  • eval_cross_validation_fold(model, train_indices)
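As a rough sketch of how a driver script might use this interface, the example below declares the listed methods as a `typing.Protocol` and runs a toy task through it. The method names come from the list above. The return types, the `ToyGcTask` class, the callable model, and the `run_benchmark` helper are assumptions for illustration, not the repository's actual base classes.

```python
from typing import Any, Callable, Dict, List, Protocol, Sequence, Tuple

Example = Tuple[str, int]
Model = Callable[[str], int]  # assumption: a model maps a sequence to a label


class Task(Protocol):
    def get_task_name(self) -> str: ...
    def get_task_attributes(self) -> Dict[str, Any]: ...
    def get_finetune_dataset(self) -> Any: ...
    def eval_test_set(self, model: Any) -> Dict[str, float]: ...
    def eval_validation_set(self, model: Any) -> Dict[str, float]: ...
    def eval_cross_validation_fold(self, model: Any, train_indices: Sequence[int]) -> Dict[str, float]: ...


def _accuracy(model: Model, examples: List[Example]) -> Dict[str, float]:
    correct = sum(1 for seq, label in examples if model(seq) == label)
    return {"accuracy": correct / len(examples)}


class ToyGcTask:
    """Hypothetical stand-in task with made-up sequences and labels."""

    def __init__(self) -> None:
        self._train = [("GGCC", 1), ("ATAT", 0), ("GCGA", 1), ("TTAA", 0)]
        self._val = [("CCGG", 1), ("AATT", 0)]
        self._test = [("GGGC", 1), ("ATTA", 0), ("GCAT", 0), ("CCCG", 1)]

    def get_task_name(self) -> str:
        return "toy_gc_task"

    def get_task_attributes(self) -> Dict[str, Any]:
        return {"num_labels": 2, "splits": ["train", "validation", "test"]}

    def get_finetune_dataset(self) -> List[Example]:
        return list(self._train)

    def eval_test_set(self, model: Model) -> Dict[str, float]:
        return _accuracy(model, self._test)

    def eval_validation_set(self, model: Model) -> Dict[str, float]:
        return _accuracy(model, self._val)

    def eval_cross_validation_fold(self, model: Model, train_indices: Sequence[int]) -> Dict[str, float]:
        # Assumption: score on the training examples left out of this fold.
        # A real flow would fit the model on train_indices first; skipped here.
        held_out = [ex for i, ex in enumerate(self._train) if i not in set(train_indices)]
        return _accuracy(model, held_out)


def gc_rich_model(sequence: str) -> int:
    """Toy model: predicts 1 when at least half of the bases are G or C."""
    gc = sum(sequence.count(base) for base in "GC")
    return 1 if gc / len(sequence) >= 0.5 else 0


def run_benchmark(tasks: Sequence[Task], model: Any) -> Dict[str, Dict[str, float]]:
    return {task.get_task_name(): task.eval_test_set(model) for task in tasks}


results = run_benchmark([ToyGcTask()], gc_rich_model)
print(results)  # {'toy_gc_task': {'accuracy': 0.75}}
```

Because the driver only depends on the method names, any task exposing them can be added to the list without changing the loop.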

Model Integration

Simply implement the methods below in your model class; inheritance is…

