What does this model signal mean?

NVIDIA published nvidia/NV-KERMT-70M-v2. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license listed as "other" (the model card puts the weights under the NVIDIA Open Model License) · 0 HF downloads · Contrastive KERMT (Kinetic GROVER Multi-Task), a graph-transformer foundation model for ADMET property prediction, about 70.6M parameters. onlylabs links this event to 1 captured evidence page and 6 related model signals.

NVIDIA Model: nvidia/NV-KERMT-70M-v2

Captured source

Hugging Face: huggingface.co/nvidia/NV-KERMT-70M-v2

nvidia/NV-KERMT-70M-v2 model card

published Jun 8, 2026 · seen Jun 12 · captured Jun 12 · http 200 · method plain · task graph-ml · license other · library bionemo · downloads 0 · likes 7

> Source code, training scripts, and inference utilities for this model: [github.com/NVIDIA-BioNeMo/KERMT](https://github.com/NVIDIA-BioNeMo/KERMT) (v2.0 branch / v2.0.0 release tag)

Model Overview

Description:

Contrastive KERMT (Kinetic GROVER Multi-Task) is a graph-transformer foundation model pretrained to learn chemically meaningful molecular representations for downstream ADMET (absorption, distribution, metabolism, excretion, toxicity) property prediction in drug discovery. The model encodes a 2D molecular graph into a latent representation under a single joint probabilistic objective that combines SMILES reconstruction, in-batch contrastive discrimination, and chemistry-specific self-supervision (atom-context, bond-context, and functional group prediction), all formulated as unit-weighted log-probability factors. The released checkpoint was pretrained for 100 epochs on a corpus combining an 11M-molecule ZINC15+ChEMBL base pool (following the pretraining-data protocol of Rong et al. 2020) with Biogen ADMET, ExpansionRX, and ChEMBL-MT (~125K additional molecules), and is intended as a starting point for downstream multi-task ADMET fine-tuning. Contrastive KERMT was developed by NVIDIA as part of the KERMT v2.0 release. This model is ready for commercial or non-commercial use.
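The excerpt does not write the joint objective out. One plausible reading of "unit-weighted log-probability factors" is a plain sum of log-likelihood terms, each with weight 1. The sketch below follows that reading; every symbol in it (G for the molecular graph, z for the latent, s for the SMILES string, k for the index of the matching item in a batch of size B, and a, b, f for the atom-context, bond-context, and functional-group targets) is an illustrative assumption, not notation taken from the model card or the manuscript.

```latex
\mathcal{L}(\theta)
  = \underbrace{\log p_\theta(s \mid z)}_{\text{SMILES reconstruction}}
  + \underbrace{\log p_\theta(k \mid z_{1:B})}_{\text{in-batch contrastive}}
  + \underbrace{\log p_\theta(a \mid G)}_{\text{atom context}}
  + \underbrace{\log p_\theta(b \mid G)}_{\text{bond context}}
  + \underbrace{\log p_\theta(f \mid G)}_{\text{functional groups}}
```

Under this reading no term carries a tunable loss weight, which is what distinguishes it from a conventionally weighted multi-task loss. The exact conditioning of each factor may differ in the released code.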

License/Terms of Use:

The source code is made available under Apache License, Version 2.0. See LICENSE in the source repository at https://github.com/NVIDIA-BioNeMo/KERMT.

The model weights are made available under the NVIDIA Open Model License.

Deployment Geography:

Global

Use Case:

Computational chemistry and machine-learning researchers in drug discovery — particularly those working on ADMET / Drug Metabolism and Pharmacokinetics (DMPK) prediction — who need a pretrained molecular graph encoder that can be fine-tuned on multi-endpoint ADMET datasets, used as a feature extractor for property-prediction pipelines, or studied as a baseline in molecular-representation-learning research. The released checkpoint is a pretrained backbone; users are expected to fine-tune it on their own labeled datasets for specific ADMET endpoints before using predictions in downstream workflows.

Release Date:

NGC 06/10/2026 via https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/resources/kermt-contrastive

Hugging Face 06/10/2026 via https://huggingface.co/nvidia/NV-KERMT-70M-v2

Reference(s):

Adrian, M., Chung, Y., Boyd, K., Paliwal, S., Veccham, S.P., Cheng, A.C. *Multitask finetuning and acceleration of chemical pretrained models for small molecule drug property prediction.* arXiv:2510.12719 (2025). https://arxiv.org/abs/2510.12719 — KERMT (the v1 baseline this work extends).
Rong, Y. et al. *Self-Supervised Graph Transformer on Large-Scale Molecular Data.* NeurIPS 33, 12559–12571 (2020). https://papers.nips.cc/paper/2020/hash/3fe230348e9a12c13120749e3f9fa4cd-Abstract.html — GROVER, the underlying graph-transformer architecture.
Sterling, T., Irwin, J. J. *ZINC 15 – Ligand Discovery for Everyone.* J. Chem. Inf. Model. 55(11), 2324–2337 (2015). DOI: 10.1021/acs.jcim.5b00559 — ZINC15 base corpus.
Mendez, D. et al. *ChEMBL: towards direct deposition of bioassay data.* Nucleic Acids Research 47(D1), D930–D940 (2019). — ChEMBL base corpus.
Fang, C., Wang, Y., Grater, R. et al. *Prospective Validation of Machine Learning Algorithms for ADMET Prediction.* J. Chem. Inf. Model. 63(11), 3263–3274 (2023). — Biogen ADMET dataset (in-domain augmentation + finetune benchmark).
Contrastive KERMT manuscript (in preparation; arXiv URL to be added on publication).

Model Architecture:

Architecture Type: Transformer (graph-transformer with local message passing + global self-attention)

Network Architecture: KERMT graph-transformer encoder (extension of GROVER) with a probabilistic latent head, an in-batch contrastive auxiliary variable, a SMILES-reconstruction transformer decoder, and chemistry-specific vocabulary prediction heads. Encoder: hidden size 800, 6 message-passing-plus-attention layers, 4 attention heads per layer, 1 multi-task (MT) block, PReLU activation, dropout 0.1. Decoder: 3 transformer layers, 8 attention heads, 512 hidden / latent dimension, FFN hidden 2048, rotary positional encoding (RoPE). Latent dimension 512.

This model was developed based on KERMT (Adrian et al. 2025, arXiv:2510.12719), in turn based on GROVER (Rong et al. 2020).

Number of model parameters: 7.06 × 10^7
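The figures above can be sanity-checked with a little arithmetic. The sketch below is not taken from the KERMT repository: the class and function names are hypothetical, and the per-head dimension assumes the usual multi-head convention of splitting the hidden size evenly across heads. It derives the per-head width of the encoder and decoder and estimates raw weight storage from the stated parameter count.

```python
from dataclasses import dataclass

NUM_PARAMETERS = 70_600_000  # 7.06 x 10^7, as stated on the model card


@dataclass(frozen=True)
class AttentionStack:
    """Hypothetical container for hyperparameters quoted on the model card."""

    hidden_size: int
    num_layers: int
    num_heads: int

    def head_dim(self) -> int:
        # Assumption: hidden size is split evenly across attention heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size is not divisible by num_heads")
        return self.hidden_size // self.num_heads


# Values from the "Network Architecture" line of the model card.
ENCODER = AttentionStack(hidden_size=800, num_layers=6, num_heads=4)
DECODER = AttentionStack(hidden_size=512, num_layers=3, num_heads=8)


def weight_megabytes(num_parameters: int, bytes_per_parameter: int) -> float:
    """Raw weight storage in decimal megabytes (weights only)."""
    return num_parameters * bytes_per_parameter / 1_000_000


if __name__ == "__main__":
    print("encoder head dim:", ENCODER.head_dim())
    print("decoder head dim:", DECODER.head_dim())
    print("fp32 weights (MB):", weight_megabytes(NUM_PARAMETERS, 4))
    print("fp16 weights (MB):", weight_megabytes(NUM_PARAMETERS, 2))
```

The storage estimate covers weights only; activations, optimizer state, and any fine-tuning heads add to it.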

Input(s):

Input Type(s): Text (SMILES string representing a 2D molecular structure)

Input Format(s): UTF-8 SMILES (Simplified Molecular Input Line Entry System)

Input Parameters: One-Dimensional (1D) text

Other Properties Related to Input: The input is a canonical SMILES string parseable by RDKit (an open-source cheminformatics toolkit); molecules are internally featurized into 2D atom-and-bond graphs prior to encoding. Recommended maximum sequence length for the SMILES decoder is 512 tokens (the value used at pretraining time); molecules whose canonical SMILES exceed this length should be truncated or omitted. Inputs are not text in the natural-language sense and are not subject to natural-language preprocessing (no tokenization in the human-language sense; characters are mapped via a chemistry-specific tokenizer matching the bundled SMILES vocabulary).
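The input constraints above can be sketched as a small pre-filter. This is a minimal illustration, not the repository's own preprocessing: `filter_smiles` and `rdkit_canonicalize` are hypothetical names, and the length check uses character count as a conservative stand-in for the bundled tokenizer's token count (a SMILES token spans at least one character, so a string of at most 512 characters cannot exceed 512 tokens, ignoring any start or end tokens the tokenizer may add). Of the two options the card mentions, it omits over-length molecules rather than truncating them.

```python
from typing import Callable, Iterable, List, Optional

MAX_DECODER_TOKENS = 512  # recommended maximum quoted on the model card


def rdkit_canonicalize(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string does not parse."""
    from rdkit import Chem  # imported lazily so the filter itself needs no RDKit

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


def filter_smiles(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonicalize,
    max_chars: int = MAX_DECODER_TOKENS,
) -> List[str]:
    """Keep canonical SMILES that parse and fit the length budget."""
    kept: List[str] = []
    for smiles in smiles_list:
        canonical = canonicalize(smiles)
        if canonical is None:
            continue  # unparseable input: drop it
        if len(canonical) > max_chars:
            continue  # too long for the decoder: omit rather than truncate
        kept.append(canonical)
    return kept
```

With RDKit installed, `filter_smiles(["OCC", "not-a-molecule"])` would canonicalize the first entry and drop the second. Passing a different `canonicalize` callable makes the filter usable without RDKit, for example in tests.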

Output(s):

Output Type(s): Numerical tensors (molecular embeddings) and, when downstream task-specific heads are present, scalar ADMET property predictions. Optionally, generated SMILES strings via the pretraining-time SMILES decoder.

Output Format(s):

Molecular embeddings: float tensors of shape (batch_size, hidden_size=800) for atom-level and bond-level readouts; (batch_size, latent_dim=512) for the cMIM projected latent.

Property predictions (after finetune): float tensors of shape (batch_size, num_endpoints) — values are continuous regression outputs per ADMET endpoint.

Generated SMILES (pretrain-time decoder only): UTF-8 SMILES string.

Output Parameters: One-Dimensional (1D) embedding / prediction vectors.

Other Properties Related to Output: Embeddings are intended as inputs to downstream property-prediction heads, similarity...

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

NVIDIA released a small model (about 70M parameters); there is no meaningful traction data yet (0 downloads and 7 likes at capture).