Understanding AI and learning outcomes
Source: OpenAI
March 4, 2026
New tools for understanding AI and learning outcomes
Advancing how AI’s impact is measured across learning environments
Education is one of AI’s most promising frontiers. With tools like ChatGPT, personalized learning support can be available to any student, anywhere, at any time.
But the education sector is still early in its understanding of the impact of AI on learning outcomes. Last year, our team set out to study the use of tools like study mode and found promising gains in student performance. But our research also raised an important question: how can we assess how AI influences a learner's progress over time, not just on a final exam?
This is a broader ecosystem challenge. To date, most research methods focus on narrow performance signals, such as test scores, and lack the ability to assess how students actually learn with AI in real-world settings, and how that use shapes outcomes over time.
To address this gap, we developed the Learning Outcomes Measurement Suite, a framework created with Estonia’s University of Tartu and the SCALE Initiative at the Stanford Accelerator for Learning to support longitudinal measurement of learning outcomes across different educational contexts.
Extensive validation is underway through a randomized controlled trial, and further research is planned with founding organizations in the Learning Lab, OpenAI’s learning research ecosystem, including researchers from Arizona State University, UCL Knowledge Lab, and MIT Media Lab (building on prior collaborative studies).
Today, we’re sharing an overview of how the measurement suite works and why it matters. Over time, we intend to publish more research and release the measurement suite as a public resource for schools, universities, and education systems worldwide.
> “This research allows us to learn quickly while also laying the groundwork for a deeper understanding of how AI can be thoughtfully integrated into schools in ways that truly matter. We want to understand how these tools can support rigorous academic learning while also cultivating higher-order thinking, creativity, curiosity, and students’ confidence in themselves as learners.”
–Susanna Loeb, Professor of Education and Faculty Director, SCALE Initiative at Stanford University
Summary of takeaways
- Today’s research methods on the impact of AI on learning show promising signals about performance, but don’t capture the full picture of how AI affects learning outcomes over time.
- The Learning Outcomes Measurement Suite will, for the first time, provide a standard framework for longitudinal studies that help educators, researchers, and institutions understand how AI shapes learning and outcomes across different contexts.
- OpenAI’s Learning Lab is a new research ecosystem focused on advancing this work. OpenAI will publish findings alongside a range of partners as the field continues to develop.
Origins and early research
When students use AI tools to study and learn, it can mean many different things—from going to AI for quick answers to using it to work through problems step by step with tutor-like guidance. To encourage users to engage with ChatGPT in ways that support deeper understanding and skill-building, OpenAI introduced study mode last year. Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support true learning, not just answers—using scaffolding, checks for understanding, and guided practice.
To test whether this kind of pedagogically aligned AI interaction style translates into better learning outcomes, we ran a randomized study with over 300 college students preparing for neuroscience and microeconomics exams. While analysis is still underway, early results give us confidence that a pedagogically aligned AI interaction style, encouraged through features like study mode, can improve learning outcomes. But this research also surfaced an important reality: what really matters is whether the gains and associated productive behaviors remain durable over time.
Study design
Participants were assigned to one of three groups. A control group studied using traditional online resources such as Google Search and YouTube, with AI-generated overview features disabled. Two additional groups were given access to one of two study mode variants designed to guide students through the learning process in slightly different ways. Baseline quizzes and onboarding surveys were collected ahead of time to adjust for differences in prior coursework exposure, study habits, academic confidence, and familiarity with AI tools. Students completed timed study mode sessions before each exam, with the two study mode variants counterbalanced across subjects.
This setup was designed to reflect real-world study conditions rather than a tightly controlled lab environment. Participation was not tied to exam performance, and not all students used study mode to the same extent during the nominal 40-minute sessions. This allowed us to measure and report intention-to-treat (ITT) effects: the causal impact of being offered access to study mode under realistic rollout conditions, acknowledging that engagement can vary in practice.
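To make the ITT idea concrete, here is a minimal sketch in Python. It is not the study's analysis code: the record fields (`assigned`, `minutes_used`, `score`), the function names, and all numbers are made up for illustration, and the estimate is a plain difference in means without the baseline covariate adjustment described above. The point is only that an ITT estimate groups students by what they were assigned, not by how much they actually used the tool.

```python
from statistics import mean

def itt_effect(records):
    """Difference in mean score by ASSIGNED group, ignoring actual usage."""
    treated = [r["score"] for r in records if r["assigned"] == "study_mode"]
    control = [r["score"] for r in records if r["assigned"] == "control"]
    return mean(treated) - mean(control)

def naive_usage_effect(records):
    """For contrast only: compares students who actually used the tool
    against control. This breaks randomization, because students who
    choose to engage may differ systematically from those who do not."""
    used = [r["score"] for r in records
            if r["assigned"] == "study_mode" and r["minutes_used"] > 0]
    control = [r["score"] for r in records if r["assigned"] == "control"]
    return mean(used) - mean(control)

# Hypothetical records, invented for this example.
records = [
    {"assigned": "study_mode", "minutes_used": 40, "score": 80.0},
    {"assigned": "study_mode", "minutes_used": 0,  "score": 70.0},
    {"assigned": "control",    "minutes_used": 0,  "score": 72.0},
    {"assigned": "control",    "minutes_used": 0,  "score": 68.0},
]

print(itt_effect(records))          # effect of being offered the tool
print(naive_usage_effect(records))  # larger here, but not a causal estimate
```

In this toy data the student who was assigned study mode but never used it still counts toward the treated group in the ITT estimate, which is why it comes out smaller than the usage-based comparison.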
Findings
We measured performance on each exam separately. In our randomized study, improvements were not uniform across subjects, and levels of engagement with study mode varied across participants.
- Neuroscience (primary ITT): We observed directionally positive differences for study mode relative to control, but results were not distinguishable from those of students studying with traditional online resources. Some onboarding and technical issues affected time spent studying among students using study mode.
- Microeconomics (primary ITT): We observed meaningful gains in exam performance among students assigned access to study mode versus the no-AI control group: a score roughly 15% higher in relative terms.
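A relative difference of about 15% is not the same as a 15-percentage-point difference. The short sketch below shows the arithmetic; the group means in it are hypothetical placeholders, since the post does not report the underlying adjusted means here, and `relative_gain` is a helper name introduced only for this example.

```python
def relative_gain(treatment_mean: float, control_mean: float) -> float:
    """Return the gain of treatment over control as a fraction of control."""
    if control_mean == 0:
        raise ValueError("control_mean must be non-zero")
    return (treatment_mean - control_mean) / control_mean

# Hypothetical adjusted means, chosen only to illustrate the calculation.
control_mean = 60.0
treatment_mean = 69.0

absolute_points = treatment_mean - control_mean          # 9 points
relative = relative_gain(treatment_mean, control_mean)   # 0.15, i.e. 15%

print(f"absolute: {absolute_points:.1f} points, relative: {relative:.0%}")
```

With these placeholder values, a 9-point absolute gain corresponds to a 15% relative gain, so the size of the relative figure depends on the control group's baseline.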
Figure: Adjusted mean exam scores, study mode (variants A & B) vs. control (no-AI group)
The effect remains consistent when we compare each study mode variant separately with the control.
While this unevenness across subjects reflects real-world variation, it highlighted a deeper limitation in how learning outcomes are typically measured.
Most existing evaluation…