WritingOpenAIOpenAIpublished Jul 22, 2025seen 6d

Pioneering an AI clinical copilot with Penda Health

Open original ↗

Captured source

source ↗

Pioneering an AI clinical copilot with Penda Health | OpenAI

July 22, 2025

Pioneering an AI clinical copilot with Penda Health

Study of 40,000 patient visits finds clinicians using AI copilot made fewer errors.

Loading…

Share

AI systems have the potential to improve human health globally—to make reliable health information universally available, help clinicians deliver better care, and empower people to better understand and advocate for their health.

Large language model (LLM) performance and safety in health continue to advance. OpenAI model performance on HealthBench⁠ doubled from GPT‑4o to o3, and frontier models often outperform experts on tasks like diagnostic reasoning⁠ and clinical summarization⁠. Yet adoption towards solving real-world patient and clinician challenges remains slow. To realize the potential of LLMs in health, the ecosystem will need to close the model-implementation gap—the chasm between what models can do and how they are used in practice.

To advance research on real-world implementation, OpenAI partnered with Penda Health⁠, a primary care provider operating in Nairobi, Kenya since 2012, to conduct a novel study of Penda’s LLM-powered clinician copilot. Penda built their copilot, AI Consult, to provide clinicians with LLM-written recommendations at key points during a patient visit. AI Consult acts as a real-time safety net that activates only when there might be an error, keeping clinicians fully in control.

In a study of 39,849 patient visits across 15 clinics, clinicians with AI Consult had a 16% relative reduction in diagnostic errors and a 13% reduction in treatment errors compared to those without.

We believe this outcome was the result of three key factors:

  • Capable model: Penda’s copilot used GPT‑4o from August 2024, and models have improved rapidly since. Model performance is no longer the limiting factor.
  • Clinically-aligned implementation: The copilot was co-developed with clinical users to ensure it genuinely supported—rather than disrupted—the flow of care.
  • Active deployment: Penda put considerable effort into helping clinicians understand why and how to use the copilot, which was crucial for uptake.

Today, we are publishing the study findings alongside a closer look at Penda’s successful implementation, offering the ecosystem an early template for the safe and effective use of LLMs to support clinicians.

We engaged extensively with local stakeholders for the study. This quality improvement research project was approved by the AMREF Health Africa Ethical and Scientific Review Committee, the Kenyan Ministry of Health, Digital Health Agency, and the Nairobi County Department of Health, and conducted under a research license from Kenya’s National Commission for Science, Technology and Innovation.

Primary care, Penda Health, and AI Consult

AI systems could be especially useful in primary care. Primary care clinicians see patients across every age group, organ system, and disease type, often in the same day, requiring a vast breadth of knowledge. This complexity makes medical errors common: the WHO reports⁠ that patient harm in primary care is both common and preventable.

Penda Health is a social enterprise that aims to provide high-quality affordable care. Penda has 16 clinics that each provide primary care, urgent care, laboratory services, and a pharmacy. These clinics are open 24/7 and receive nearly half a million patient visits each year. Penda maintains a uniquely strong focus on quality of care, with an active clinician training and quality program, and has developed and tested previous iterations⁠ of copilot systems.

After ChatGPT’s release, Penda’s Chief Medical Officer, Dr. Robert Korom, recognized how LLMs could enable higher-quality decision support by covering a broader range of conditions and potential errors than previously possible. In response, Penda built one of the earliest LLM clinical copilots, enabling clinicians to seek a second opinion from an LLM when desired. In an internal audit, Penda reviewed 100 LLM outputs from real patient encounters, and found many cases where LLM output was helpful and none where it was harmful. However, this early version of AI Consult achieved limited uptake because it required clinicians to actively request help and interrupted the flow of the patient interaction.

Iterating towards clinically-aligned implementation

In early 2025, Penda developed a new version of AI Consult that acts as a real-time safety net in a clinician’s workflow. This copilot is integrated into the electronic health record Penda clinicians use every day and runs in the background during every visit. As clinicians interact with patients and document patient visits, documentation without patient identifiers is sent to the OpenAI API at key points. AI Consult then provides any needed feedback to clinicians based on the clinical interaction so far. There are three types of responses that can be returned:

  • Green: indicates no concerns; appears as a green checkmark.
  • Yellow: indicates moderate concerns; appears as a yellow ringing bell that clinicians can choose whether to view.
  • Red: indicates safety-critical issues; appears as a pop-up that clinicians are required to view before continuing.

Penda designed AI Consult to ensure patient safety. The copilot acts as a safety net, identifying potential errors for a clinician to verify rather than taking actions on behalf of clinicians. Importantly, clinicians drive the workflow at every step: when the copilot identifies potential errors, clinicians can choose whether to modify their decisions based on the feedback, and the final decision belongs to the clinician. AI Consult was tailored to Penda’s context, with prompts including Kenyan epidemiological context, guidance on local clinical guidelines, and standard procedures at Penda’s clinics.

AI Consult flags an important missing diagnosis of iron deficiency anemia, leading the clinician to add this diagnosis so it can be appropriately treated.

Initial documentation

History and Clinical Notes (20 lines)

Investigations conducted: Full Haemogram (FHG): * WBC: 12.38 * HGB: 9.90 * HCT: 30.70 * Plt: 248.00 * RBC (Full Haemogram): 5.26 * MCV: 58.30 * MCH: 18.80 * MCHC: 32.20 Streptococcus A Antigen Test: * Result: Negative Diagnosis: tonsilitis, acute bacterial

Contribute to AI Consult response

AI Consult response:

Reasoning: The clinical documentation shows the presence of microcytic anemia…

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

Notable healthcare AI deployment by OpenAI