Model · Arcee AI · published Jan 27, 2026

arcee-ai/Trinity-Large-TrueBase

task: text-generation · license: other · library: transformers · params: 399B · downloads: 226 · likes: 67

Trinity-Large-TrueBase

Introduction

Trinity-Large-TrueBase is a base pretraining checkpoint from Arcee AI's Trinity Large training run. It is a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. The checkpoint was captured after 10 trillion tokens of pretraining, prior to learning-rate annealing and before any instruction tuning or reinforcement learning.

This checkpoint is intended for research, probing, ablation studies, and downstream fine-tuning. It comes without any pre-baked alignment, instruction formatting, or preference optimization.

More details on the training of Trinity Large are available in the technical report.

Model Variants

The Trinity Large family consists of four checkpoints from the same training run:

  • Trinity-Large-TrueBase (this release): 10T-token pre-anneal checkpoint with no instruction data
  • [Trinity-Large-Thinking](https://huggingface.co/arcee-ai/Trinity-Large-Thinking): Reasoning-optimized, agentic post-training with extended chain-of-thought
  • [Trinity-Large-Base](https://huggingface.co/arcee-ai/Trinity-Large-Base): Full 17T-token pretrained foundation model with mid-training anneals
  • [Trinity-Large-Preview](https://huggingface.co/arcee-ai/Trinity-Large-Preview): Lightly post-trained, chat-ready model undergoing active RL

Architecture

Trinity-Large-TrueBase uses a sparse MoE configuration designed to maximize efficiency while maintaining large-scale capacity.

| Hyperparameter | Value |
|:---|:---:|
| Total parameters | ~398B |
| Active parameters per token | ~13B |
| Experts | 256 |
| Active experts | 4 |
| Routing strategy | 4-of-256 (1.56% sparsity) |
| Dense layers | 6 |
| Pretraining context length | 8,192 |
| Architecture | Sparse MoE (AfmoeForCausalLM) |

Note: Extended context support (e.g., 512k) was introduced after this checkpoint and is not available in TrueBase.
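As a rough illustration of what 4-of-256 routing means, the sketch below selects the four highest-scoring experts for a single token and normalizes their weights. It is a generic top-k router, not the model's actual gating code: the function name `route_token` and the choice of a softmax over the selected logits are assumptions made for the example, and the real `AfmoeForCausalLM` router may score and normalize experts differently.

```python
import math

NUM_EXPERTS = 256  # "Experts" in the table above
TOP_K = 4          # "Active experts" in the table above


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_id, weight) pairs for the top_k experts of one token.

    Assumption: weights are a softmax over the selected logits only.
    The gating function used by the released model may differ.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Fraction of experts that fire per token: 4 / 256 = 1.5625%,
# which the table rounds to 1.56%.
active_expert_fraction = TOP_K / NUM_EXPERTS
```

Every other expert is skipped for that token, which is why only a small part of the total parameter count is active per token.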

Benchmark Results

| Benchmark | N-shot | Metric | Score | Stderr |
|-------------------------------|--------|-------------------------------|--------|---------|
| arc_challenge_0shot | 0 | acc_norm,none | 0.6237 | ±0.0142 |
| bbh_fewshot | 3 | exact_match,remove_whitespace | 0.5784 | ±0.0054 |
| gpqa_diamond_5shot | 5 | acc_norm,none | 0.4091 | ±0.0350 |
| gpqa_diamond_generative_5shot | 5 | exact_match,flexible-extract | 0.3788 | ±0.0346 |
| gsm8k_8shot | 8 | exact_match,flexible-extract | 0.8036 | ±0.0109 |
| gsm8k_cot | 8 | exact_match,flexible-extract | 0.8044 | ±0.0109 |
| hellaswag_5shot | 5 | acc_norm,none | 0.8813 | ±0.0032 |
| humaneval_plus | 0 | pass@1,create_test | 0.5183 | ±0.0391 |
| leaderboard_math_hard | 4 | exact_match,none | 0.2696 | ±0.0113 |
| mbpp_plus | 3 | pass_at_1,none | 0.8095 | ±0.0202 |
| minerva_math500 | 4 | math_verify,none | 0.4820 | ±0.0224 |
| mmlu_5shot | 5 | acc,none | 0.7845 | ±0.0033 |
| mmlu_generative_5shot | 5 | exact_match,get_response | 0.7848 | ±0.0033 |
| mmlu_pro | 5 | exact_match,custom-extract | 0.5160 | ±0.0044 |
| triviaqa_5shot | 5 | exact_match,remove_whitespace | 0.8096 | ±0.0029 |
| winogrande_5shot | 5 | acc,none | 0.8145 | ±0.0109 |
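When comparing rows, the Stderr column matters as much as the score. The sketch below turns a score and its standard error into an approximate 95% interval, assuming a normal approximation (z = 1.96); the helper names `confidence_interval` and `intervals_overlap` are made up for illustration, and the inputs are the two gsm8k rows from the table.

```python
def confidence_interval(score, stderr, z=1.96):
    """Approximate two-sided interval: score +/- z * stderr.

    Assumption: a normal approximation is adequate for these benchmarks.
    """
    return (score - z * stderr, score + z * stderr)


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the gsm8k_8shot and gsm8k_cot rows.
gsm8k_8shot = confidence_interval(0.8036, 0.0109)
gsm8k_cot = confidence_interval(0.8044, 0.0109)
same_within_noise = intervals_overlap(gsm8k_8shot, gsm8k_cot)
```

Under this approximation the two gsm8k variants are indistinguishable, while small-sample rows such as gpqa_diamond_5shot (±0.0350) carry much wider intervals.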

Training Configuration

Pretraining

  • Training tokens: 10 trillion
  • Checkpoint type: Pre-anneal
  • Instruction data: None
  • RLHF or post-training: None

This checkpoint branches from the main Trinity Large run at the 10T-token mark, prior to learning-rate decay or post-training phases.
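For a sense of scale, the sketch below applies the common rule of thumb that training costs about 6 FLOPs per active parameter per token. This is a back-of-the-envelope estimate only: the 6·N·D approximation ignores attention and routing overhead, the 13B active-parameter figure is itself approximate, and the variable names are chosen for illustration.

```python
ACTIVE_PARAMS = 13e9      # ~13B active parameters per token (Architecture)
TOKENS_SEEN = 10e12       # 10 trillion tokens at this checkpoint
FULL_RUN_TOKENS = 17e12   # 17T tokens for Trinity-Large-Base (Model Variants)


def approx_training_flops(active_params, tokens):
    """Rule-of-thumb estimate: ~6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


# Roughly 7.8e23 FLOPs under this approximation.
flops_at_checkpoint = approx_training_flops(ACTIVE_PARAMS, TOKENS_SEEN)

# Share of the full pretraining token budget seen by this checkpoint (~59%).
fraction_of_full_run = TOKENS_SEEN / FULL_RUN_TOKENS
```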

Optimizers

Peak learning rates, reached after the warm-up phase of the WSD (warmup-stable-decay) schedule:

  • Adam learning rate: 2e-4
  • Muon learning rate: 8e-4

Muon was used to support larger critical batch sizes in a highly sparse MoE regime.
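A pre-anneal checkpoint sits in the constant-rate portion of a WSD schedule, before the decay phase begins. The sketch below shows the general shape of such a schedule. The linear warm-up, the linear decay, and all step counts are assumptions for illustration; the source gives only the peak rates (2e-4 for Adam, 8e-4 for Muon).

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Assumptions: linear warm-up to peak_lr, a constant plateau, then a
    linear decay to min_lr. Phase lengths are hypothetical.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Warm-up: ramp linearly and reach peak_lr on the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: where a pre-anneal checkpoint would be taken.
        return peak_lr
    decay_step = step - warmup_steps - stable_steps
    if decay_step >= decay_steps:
        return min_lr
    # Decay (anneal): interpolate linearly from peak_lr to min_lr.
    return peak_lr + (min_lr - peak_lr) * (decay_step / decay_steps)


# Hypothetical phase lengths, with the Adam peak rate from the list above.
adam_lr_mid_run = wsd_lr(step=500, peak_lr=2e-4,
                         warmup_steps=100, stable_steps=1000, decay_steps=200)
```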

Infrastructure

  • Hardware: 2,048 NVIDIA B300 GPUs
  • Parallelism: HSDP + Expert Parallelism
  • Compute partner: Prime Intellect
  • Data partner: Datology
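To make the expert-parallelism bullet concrete, the sketch below shows one simple way 256 experts could be split across expert-parallel ranks, with each rank holding a contiguous block. The expert-parallel degree of 8 is purely hypothetical, as are the function names; the source does not state the actual degree or placement used in training.

```python
NUM_EXPERTS = 256  # from the Architecture table


def experts_on_rank(ep_rank, ep_size, num_experts=NUM_EXPERTS):
    """Expert ids held by one expert-parallel rank (contiguous blocks).

    Assumption: experts divide evenly across ranks; ep_size is hypothetical.
    """
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank out of range")
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)


def rank_for_expert(expert_id, ep_size, num_experts=NUM_EXPERTS):
    """Inverse mapping: which rank a token must be sent to for an expert."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= expert_id < num_experts:
        raise ValueError("expert_id out of range")
    return expert_id // (num_experts // ep_size)


# With a hypothetical expert-parallel degree of 8, each rank holds 32 experts.
local_experts = experts_on_rank(ep_rank=0, ep_size=8)
```

In a layout like this, a token routed to four experts may need to be dispatched to up to four different ranks, which is the communication cost expert parallelism trades for memory savings.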

Intended Use

  • Studying emergent behavior from large-scale pretraining
  • Sparse MoE routing and load-balancing research
  • Interpretability, probing, and ablation studies
  • Domain-specific fine-tuning from a clean base
  • Academic and industrial foundation model research

Rationale for Release

Most base model releases include instruction data, annealed training dynamics, or early alignment stages. Trinity-Large-TrueBase excludes these, providing an opportunity to study what large-scale models learn from pretraining data alone. This checkpoint is intended as a foundation for research rather than as a finished conversational assistant.

Known Limitations

  • Not aligned for safety, helpfulness, or conversational tone
  • Requires substantial compute and expertise to fine-tune
  • May exhibit raw or unstable behaviors typical of unaligned models
  • No extended-context tuning beyond the 8K pretraining window
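Because of the 8K window, longer inputs need to be split before they are fed to the model. The sketch below chunks a list of token ids into windows of at most 8,192 tokens. It operates on plain Python lists and is independent of any tokenizer; the helper name `split_into_windows` is hypothetical, and non-overlapping windows are just one possible policy.

```python
MAX_CONTEXT = 8192  # pretraining context length from the Architecture table


def split_into_windows(token_ids, max_len=MAX_CONTEXT):
    """Split token ids into consecutive, non-overlapping windows.

    Every window has at most max_len tokens; the last one may be shorter.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [token_ids[i:i + max_len]
            for i in range(0, len(token_ids), max_len)]


# A 20,000-token document becomes windows of 8,192, 8,192, and 3,616 tokens.
windows = split_into_windows(list(range(20000)))
```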

License

Trinity-Large-TrueBase is released under the OpenMDW License, version 1.1 (OpenMDW-1.1).
