
MedGemma: Our most capable open models for health AI development


July 9, 2025 Daniel Golden, Engineering Manager, and Rory Pilgrim, Product Manager, Google Research

We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.

Quick links

MedGemma technical report

HAI-DEF

MedGemma on Hugging Face

MedGemma on GitHub

MedSigLIP on GitHub


Healthcare is increasingly embracing AI to improve workflow management, patient communication, and diagnostic and treatment support. It’s critical that these AI-based systems are not only high-performing, but also efficient and privacy-preserving. It’s with these considerations in mind that we built and recently released Health AI Developer Foundations (HAI-DEF). HAI-DEF is a collection of lightweight open models designed to offer developers robust starting points for their own health research and application development. Because HAI-DEF models are open, developers retain full control over privacy, infrastructure, and modifications to the models.

In May of this year, we expanded the HAI-DEF collection with MedGemma, a collection of generative models based on Gemma 3 that are designed to accelerate healthcare and life sciences AI development. Today, we’re proud to announce two new models in this collection. The first is MedGemma 27B Multimodal, which complements the previously released 4B Multimodal and 27B text-only models by adding support for complex multimodal and longitudinal electronic health record interpretation. The second new model is MedSigLIP, a lightweight image and text encoder for classification, search, and related tasks. MedSigLIP is based on the same image encoder that powers the 4B and 27B MedGemma models.

MedGemma and MedSigLIP are strong starting points for medical research and product development. MedGemma is useful for medical text or imaging tasks that require generating free text, like report generation or visual question answering. MedSigLIP is recommended for imaging tasks that involve structured outputs like classification or retrieval. All of the above models can be run on a single GPU, and MedGemma 4B and MedSigLIP can even be adapted to run on mobile hardware.

Full details of MedGemma and MedSigLIP development and evaluation can be found in the MedGemma technical report.

MedGemma: A multimodal generative model for health

The MedGemma collection includes variants in 4B and 27B sizes, both of which now accept image and text inputs and produce text outputs.

MedGemma 4B Multimodal: MedGemma 4B scores 64.4% on MedQA, which ranks it among the best very small (<8B) open models. In an unblinded study, 81% of MedGemma 4B–generated chest X-ray reports were judged by a US board-certified radiologist to be of sufficient accuracy to result in similar patient management compared to the original radiologist reports. It additionally achieves performance on medical image classification tasks that is competitive with task-specific state-of-the-art models.

MedGemma 27B Text and MedGemma 27B Multimodal: Based on internal and published evaluations, the MedGemma 27B models are among the best performing small open models (<50B) on the MedQA medical knowledge and reasoning benchmark; the text variant scores 87.7%, which is within 3 points of DeepSeek R1, a leading open model, but at approximately one tenth the inference cost. The MedGemma 27B models are competitive with larger models across a variety of benchmarks, including retrieval and interpretation of electronic health record data.

On MedQA, MedGemma 4B and 27B are among the best performing models of their size. Note that in this plot, cost estimates are based on the legacy.lmarena.ai price analysis and together.ai/pricing. For models not present on the leaderboard, we used price data from the models from which they were derived.

Based on review by a US board-certified cardiothoracic radiologist, we found that 81% of MedGemma chest X-ray reports would lead to similar patient management compared to the original radiologist reports.

We developed these models by training a medically optimized image encoder (independently released as MedSigLIP, described below), followed by training the corresponding 4B and 27B versions of the Gemma 3 model on medical data. We took care to retain the general (non-medical) capabilities of Gemma throughout this process. This allows MedGemma to perform well on tasks that mix medical and non-medical information, and to preserve its instruction-following ability and its capabilities in non-English languages.

A key aspect of these models is their adaptability. For instance, after fine-tuning, MedGemma 4B is able to achieve state-of-the-art performance on chest X-ray report generation, with a RadGraph F1 score of 30.3. The straightforward ability for developers to improve performance on their target applications highlights the value of MedGemma as a starting point for developers looking to build AI for healthcare.

MedSigLIP: A specialized image encoder for healthcare

MedSigLIP is a lightweight image encoder of only 400M parameters that uses the Sigmoid loss for Language Image Pre-training (SigLIP) architecture. MedSigLIP was adapted from SigLIP via tuning with diverse medical imaging data, including chest X-rays, histopathology patches, dermatology images, and fundus images, allowing the model to learn nuanced features specific to these modalities. Importantly, we also took care to ensure that MedSigLIP retains strong performance on the natural images on which the original SigLIP model was trained, maintaining its versatility.

MedSigLIP is designed to bridge the gap between medical images and medical text by encoding them into a common embedding space. MedSigLIP achieves similar or improved classification performance compared to task-specific vision embedding models while being far more versatile across medical imaging domains.

MedSigLIP is ideal for:

Traditional image classification: Build performant models to classify medical images.

Zero-shot image classification: Classify images without specific training examples by comparing image embeddings to the embeddings of textual class labels.

Semantic image retrieval: Find visually or semantically similar images from large medical image databases.
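To make the zero-shot classification and retrieval uses listed above more concrete, here is a minimal sketch of how a shared image and text embedding space can be used. It is an illustration under stated assumptions, not MedSigLIP's actual API: the embedding vectors are hand-written toy values rather than model outputs, the function names (`cosine_similarity`, `zero_shot_classify`, `retrieve_top_k`) are hypothetical, and cosine similarity is assumed as the comparison metric.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_classify(
    image_emb: Vector, label_embs: Dict[str, Vector]
) -> Tuple[str, Dict[str, float]]:
    """Pick the text label whose embedding is most similar to the image embedding."""
    scores = {label: cosine_similarity(image_emb, emb) for label, emb in label_embs.items()}
    best = max(scores, key=lambda label: scores[label])
    return best, scores


def retrieve_top_k(
    query_emb: Vector, index: Dict[str, Vector], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k indexed items most similar to the query, best first."""
    ranked = sorted(
        ((name, cosine_similarity(query_emb, emb)) for name, emb in index.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional embeddings (assumed values, NOT produced by MedSigLIP).
label_embs = {
    "normal chest X-ray": [1.0, 0.0, 0.0],
    "pleural effusion": [0.0, 1.0, 0.0],
}
image_emb = [0.2, 0.9, 0.1]
best_label, scores = zero_shot_classify(image_emb, label_embs)

# A tiny image "database" keyed by hypothetical image identifiers.
index = {
    "img_a": [0.1, 0.8, 0.1],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 0.7, 0.3],
}
top_matches = retrieve_top_k(image_emb, index, k=2)

print(best_label)
print([name for name, _ in top_matches])
```

In a real application, the toy vectors would be replaced by embeddings computed by the encoder for images and for text prompts describing each class, and a large database would typically use a vectorized or approximate nearest-neighbor search rather than a Python loop.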

The power of open models

Because the MedGemma collection is open, the models can be downloaded,…
