OpenBMB/Scicore-Omics
Description: SciCore-Omics: the first tri-modal foundation model linking histology images, spatial transcriptomics, and biological language.
Language: Python
License: Apache-2.0
Stars: 8
Forks: 2
Open issues: 0
Created: 2025-10-09T11:04:30Z
Pushed: 2026-06-03T01:02:56Z
Default branch: main
Fork: no
Archived: no
README:
---
News
- 2026-06: SciCore-Omics model weights are publicly available on Hugging Face: `openbmb/SciCore-Omics`.
- 2026-06: The online demo is available through Hugging Face Spaces: `Alkaidxxy/SciCore-Omics`.
- 2026-06: Training and inference code has been released in this repository.
---
Overview
SciCore-Omics is a gene-aware multimodal foundation model for joint reasoning over histology images, spatial transcriptomic profiles, and biological language.
Built on a MiniCPM-V-style multimodal language-model stack, SciCore-Omics introduces a dedicated transcriptomic branch that encodes gene-expression profiles with NicheFormer, compresses gene representations through a Gene Q-Former, and projects them into the language-model token space through a Gene Projector.
The model is designed for spatial biology and pathology scenarios where tissue morphology and molecular states should be interpreted together rather than treated as isolated modalities.
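The data flow of the gene branch described above (encoder output, query-based compression, projection into the language-model token space) can be sketched in plain Python. This is a toy illustration only: the sizes, the random weights, and the single-head cross-attention without key/value projections are all assumptions, and the README does not state that the actual Gene Q-Former or Gene Projector take this exact form.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def compress_with_queries(gene_tokens, queries):
    """Cross-attention: each query attends over all gene tokens.

    gene_tokens is (n_genes x d), queries is (n_queries x d);
    the result is (n_queries x d), so n_genes tokens shrink to n_queries.
    """
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*gene_tokens)]  # (d x n_genes)
    scores = matmul(queries, keys_t)                   # (n_queries x n_genes)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, gene_tokens)


rng = random.Random(0)
N_GENES, D_GENE, N_QUERIES, D_LLM = 50, 8, 4, 16  # toy sizes (assumed)

gene_tokens = random_matrix(N_GENES, D_GENE, rng)  # stand-in for encoder output
queries = random_matrix(N_QUERIES, D_GENE, rng)    # stand-in for learned queries
projector = random_matrix(D_GENE, D_LLM, rng)      # stand-in for the projector

compressed = compress_with_queries(gene_tokens, queries)  # (4 x 8)
llm_tokens = matmul(compressed, projector)                # (4 x 16)
print(len(llm_tokens), len(llm_tokens[0]))
```

The point of the sketch is the shape change: a variable number of gene tokens is reduced to a fixed, small number of query tokens before being mapped to the language model's embedding width.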
---
Highlights
- Tri-modal foundation model for spatial biology
SciCore-Omics links histology images, spatial transcriptomics, and biological language in a unified autoregressive modeling framework.
- Dedicated gene branch
The model uses NicheFormer, a Gene Q-Former, and a Gene Projector to transform transcriptomic profiles into LLM-compatible embeddings.
- Image-gene-text reasoning
SciCore-Omics supports image-only, gene-only, and image-gene joint inputs, enabling morphology-aware and molecule-aware biological reasoning.
- Staged training pipeline
The repository provides separate training stages for gene-bridge distillation, Swift-based CPT/SFT, and GSPO/PPO-style reinforcement learning refinement.
- Public release
Model weights, online demo, local inference code, and training entrypoints are publicly available.
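The three input configurations named in the highlights (image-only, gene-only, and image-gene) can be made explicit in calling code. The helper below is a hypothetical sketch for user scripts; `InputMode` and `infer_input_mode` are illustrative names and are not part of this repository.

```python
from enum import Enum
from typing import Optional


class InputMode(Enum):
    """Input configurations listed in the README highlights."""
    IMAGE_ONLY = "image-only"
    GENE_ONLY = "gene-only"
    IMAGE_GENE = "image-gene"


def infer_input_mode(image_path: Optional[str], gene_path: Optional[str]) -> InputMode:
    """Decide which configuration applies from the inputs that were provided."""
    if image_path and gene_path:
        return InputMode.IMAGE_GENE
    if image_path:
        return InputMode.IMAGE_ONLY
    if gene_path:
        return InputMode.GENE_ONLY
    raise ValueError("Provide an image, a gene-expression file, or both.")


print(infer_input_mode("spot.png", "spot.h5ad").value)
```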
---
What Can SciCore-Omics Do?
SciCore-Omics can be used for research tasks such as:
- histology-conditioned biological description generation;
- transcriptome-conditioned biological description generation;
- joint image-gene reasoning over spatial omics spots;
- spatial domain recognition;
- gene-expression-related reasoning;
- pathology and tissue-level question answering;
- preliminary case-level molecular interpretation from histology images.
> Note: SciCore-Omics is released for research use. It is not a standalone clinical diagnostic system.
---
Model Release
| Item | Status | Link |
| --- | --- | --- |
| Model weights | Available | `openbmb/SciCore-Omics` |
| Online demo | Available | `Alkaidxxy/SciCore-Omics` |
| Source code | Available | `OpenBMB/Scicore-Omics` |
| License | Available | [Apache-2.0](LICENSE) |
| Training code | Available | train-distill-gene/, train-swift-cpt-sft/, train-rl/ |
| Local inference | Available | eval/ |
---
Repository Structure
Scicore-Omics/
├── model/                 # Core model, processor, tokenizer, and gene branch definitions
├── eval/                  # Minimal inference and evaluation examples
├── figs/                  # Figures used in README and documentation
├── train-distill-gene/    # Gene bridge distillation scripts
├── train-swift-cpt-sft/   # Swift-based CPT/SFT training scripts
├── train-rl/              # GSPO/PPO-style RL refinement pipeline
├── environment.yml        # Conda environment specification
├── LICENSE                # Apache-2.0 license
└── README.md              # Project documentation
If you are new to the codebase, the recommended reading order is:
1. eval/: run the model first;
2. model/: understand the architecture;
3. train-distill-gene/: understand gene-branch alignment;
4. train-swift-cpt-sft/: understand CPT/SFT training;
5. train-rl/: understand reinforcement learning refinement.
---
Quick Start
Option 1: Try the Online Demo
The fastest way to try SciCore-Omics is through the Hugging Face Space:
👉 https://huggingface.co/spaces/Alkaidxxy/SciCore-Omics
The demo supports scientific questions with uploaded gene-expression files and/or histology images.
---
Option 2: Run Local Inference
1. Clone the repository
git clone https://github.com/OpenBMB/Scicore-Omics.git
cd Scicore-Omics
2. Create the environment
conda env create -f environment.yml
conda activate OMICS
The reference environment was developed on Linux with NVIDIA A800-SXM4-80GB GPUs.
flash-attn can be sensitive to CUDA, PyTorch, and GPU versions. If installation fails, please adjust the flash-attn version according to your local environment.
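If flash-attn cannot be installed, one common workaround is to detect it at runtime and fall back to another attention backend. The sketch below only chooses a backend name; the two strings are the `attn_implementation` values used by Hugging Face Transformers, and whether the SciCore-Omics loading code accepts this option is an assumption to verify against `model/` and `eval/`.

```python
import importlib.util


def pick_attention_backend() -> str:
    """Return "flash_attention_2" if flash-attn is importable, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # PyTorch's built-in scaled-dot-product attention needs no extra package.
    return "sdpa"


print(pick_attention_backend())
```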
3. Download the model weights
You can download the public weights from Hugging Face:
huggingface-cli download openbmb/SciCore-Omics \
  --local-dir ./weights/SciCore-Omics
Alternatively, you can directly load the model by setting:
model_path = "openbmb/SciCore-Omics"
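A script can support both options by preferring a local copy of the weights and falling back to the Hugging Face repository ID. This is a minimal sketch; `resolve_model_path` is a hypothetical helper, not part of this repository, and it only checks that the directory exists and is non-empty rather than validating the checkpoint.

```python
from pathlib import Path

HF_REPO_ID = "openbmb/SciCore-Omics"           # repository ID from this README
DEFAULT_LOCAL_DIR = "./weights/SciCore-Omics"  # --local-dir used in the download step


def resolve_model_path(local_dir: str = DEFAULT_LOCAL_DIR,
                       repo_id: str = HF_REPO_ID) -> str:
    """Return the local weights directory if present, else the Hub repo ID."""
    path = Path(local_dir)
    # Treat only an existing, non-empty directory as a downloaded checkpoint.
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return repo_id


model_path = resolve_model_path()
print(model_path)
```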
4. Run a minimal example
python eval/example.py \
  --model_path ./weights/SciCore-Omics \
  --image_path examples/assets/example.png \
  --gene_path examples/assets/example.h5ad \
  --prompt "Please describe the tissue morphology and molecular state of this sample."
Expected output:
The model generates a natural-language response describing tissue morphology, transcriptomic context, and potentially relevant biological processes.
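To run the example script over several samples, the command can be assembled from Python. The sketch below uses only the flags shown in this README (`--model_path`, `--image_path`, `--gene_path`, `--prompt`); `build_example_command` is a hypothetical helper, and omitting a flag for single-modality runs assumes that `eval/example.py` treats the image and gene inputs as optional.

```python
import shlex
import sys
from typing import List, Optional


def build_example_command(model_path: str,
                          prompt: str,
                          image_path: Optional[str] = None,
                          gene_path: Optional[str] = None) -> List[str]:
    """Assemble the argument list for eval/example.py."""
    cmd = [sys.executable, "eval/example.py", "--model_path", model_path]
    if image_path:
        cmd += ["--image_path", image_path]
    if gene_path:
        cmd += ["--gene_path", gene_path]
    cmd += ["--prompt", prompt]
    return cmd


cmd = build_example_command(
    model_path="./weights/SciCore-Omics",
    prompt="Please describe the tissue morphology and molecular state of this sample.",
    image_path="examples/assets/example.png",
    gene_path="examples/assets/example.h5ad",
)
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or punctuation.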
---
Before You Run
Please make sure you have:
- a CUDA-enabled GPU environment;
- the SciCore-Omics model weights from Hugging Face;
- a histology image in `.png`, `.jpg`, or `.jpeg` format;
- a spatial transcriptomics file in `.h5ad` format;
- gene names compatible with the gene tokenizer resources under `model/gene_tokenizer/`.
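Part of this checklist can be verified before launching inference. The sketch below checks file extensions and estimates how many gene names are covered by a vocabulary. Both helpers are hypothetical, the toy vocabulary is made up for illustration, and the README does not describe the file format under `model/gene_tokenizer/` or whether matching is done by gene symbol or by another identifier.

```python
from pathlib import Path
from typing import Iterable, List, Set

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # image formats listed above
GENE_SUFFIX = ".h5ad"                       # transcriptomics format listed above


def check_input_files(image_path: str, gene_path: str) -> List[str]:
    """Return a list of human-readable problems; empty means the names look fine."""
    problems = []
    if Path(image_path).suffix.lower() not in IMAGE_SUFFIXES:
        problems.append(f"unsupported image format: {image_path}")
    if Path(gene_path).suffix.lower() != GENE_SUFFIX:
        problems.append(f"expected an .h5ad file: {gene_path}")
    return problems


def vocabulary_coverage(gene_names: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of gene names that appear in the tokenizer vocabulary."""
    names = list(gene_names)
    if not names:
        return 0.0
    return sum(name in vocabulary for name in names) / len(names)


# Toy vocabulary for illustration only (not the real tokenizer resources).
toy_vocabulary = {"EPCAM", "KRT8", "PTPRC"}
print(check_input_files("spot.tiff", "spot.h5ad"))
print(vocabulary_coverage(["EPCAM", "KRT8", "UNKNOWN1"], toy_vocabulary))
```

A low coverage value usually points to a naming mismatch between the dataset and the vocabulary rather than to missing genes.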
---
Examples
Image + Gene Input
python eval/example.py \
  --model_path openbmb/SciCore-Omics \
  --image_path examples/assets/example.png \
  --gene_path…