Microsoft Research Blog, published May 8, 2026

Building realistic electric transmission grid dataset at scale: a pipeline from open dataset


At a glance

We construct geographically grounded, electrically coherent power grid models entirely from publicly available data and release a dataset spanning 48 U.S. states and multi-state interconnections.

The models support AC optimal power flow (AC‑OPF) analysis, enabling physics-based study of congestion, capacity, and demand siting without restricted data.

We demonstrate applications including transmission expansion potential, targeted line upgrades, and placement of large datacenter loads.
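To give a feel for the question behind datacenter siting (how much new load a bus can absorb before some line hits its limit), here is a minimal screening sketch. It uses the linearized DC power flow approximation, a much simpler stand-in for the AC-OPF analysis the post describes, on a hypothetical 4-bus network. The network, reactances, line limits, and the helper names `ptdf_matrix` and `max_new_load` are illustrative assumptions, not part of the released dataset.

```python
import numpy as np

# Hypothetical 4-bus test network (illustrative only, not from the released dataset).
# Each branch: (from_bus, to_bus, reactance_pu, thermal_limit_pu)
BRANCHES = [
    (0, 1, 0.10, 1.0),
    (0, 2, 0.20, 1.0),
    (1, 2, 0.25, 0.6),
    (1, 3, 0.10, 1.0),
    (2, 3, 0.20, 0.8),
]
N_BUS = 4
SLACK = 0  # bus 0 balances any change in load


def ptdf_matrix(branches, n_bus, slack):
    """Power transfer distribution factors under the DC power flow approximation."""
    n_br = len(branches)
    A = np.zeros((n_br, n_bus))  # branch-bus incidence matrix
    b = np.zeros(n_br)           # branch susceptances 1/x
    for k, (f, t, x, _) in enumerate(branches):
        A[k, f] = 1.0
        A[k, t] = -1.0
        b[k] = 1.0 / x
    bbus = A.T @ np.diag(b) @ A
    keep = [i for i in range(n_bus) if i != slack]
    # Invert the susceptance matrix with the slack row/column removed;
    # the slack column of the result stays zero (its angle is the reference).
    X = np.zeros((n_bus, n_bus))
    X[np.ix_(keep, keep)] = np.linalg.inv(bbus[np.ix_(keep, keep)])
    return np.diag(b) @ A @ X


def max_new_load(branches, n_bus, slack, injections, bus):
    """Largest extra load at `bus` (served by the slack) before any branch limit binds."""
    ptdf = ptdf_matrix(branches, n_bus, slack)
    limits = np.array([br[3] for br in branches])
    base_flow = ptdf @ injections
    # Adding load L at `bus` lowers its injection by L, so each flow moves by L * sens.
    sens = -ptdf[:, bus]
    headroom = np.inf
    for f0, s, lim in zip(base_flow, sens, limits):
        if s > 1e-12:       # flow grows toward +lim
            headroom = min(headroom, (lim - f0) / s)
        elif s < -1e-12:    # flow grows toward -lim
            headroom = min(headroom, (-lim - f0) / s)
    return headroom


if __name__ == "__main__":
    # Slack generation at bus 0, loads at buses 1-3 (per unit); slack value does not affect flows.
    base = np.array([1.2, -0.3, -0.4, -0.5])
    print(f"DC headroom at bus 3: {max_new_load(BRANCHES, N_BUS, SLACK, base, 3):.3f} pu")
```

A DC screen like this ignores voltage magnitudes, reactive power, and losses, which is exactly what a full AC-OPF model captures; it is shown only because it fits in a few lines.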

Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data.

The ability to study transmission-level power grid behavior is essential for modern power systems research. Analyses of congestion, transmission expansion, demand growth, and system resilience all depend on network models with realistic topology, electrical parameters, and geographic grounding.

In most of the world, including the United States, realistic transmission-level grid data is classified as critical infrastructure information and subject to strict access controls. These restrictions exist for good reasons, but the resulting lack of realistic grid models is increasingly exacerbating the challenges power systems face. Decisions about where new load can be added – and how additional transmission assets can be deployed to support it – are often gated behind lengthy and opaque processes that can take years. For researchers developing new tools and algorithms, access typically requires long approval cycles, strict non-redistribution agreements, or costly commercial licenses.


As a result, many are left choosing between small “toy” networks with dozens of buses and synthetic models that do not correspond to real infrastructure. This lack of realistic, shareable models is particularly limiting for data-driven and AI-based approaches, which require large volumes of physically plausible grid data to train and evaluate methods for grid analysis and planning.

Against this backdrop, a natural question arises:

Can we meaningfully understand how the U.S. power grid responds to modern stresses – and facilitate the development of actionable solutions for the system – using only open data?

In this work, we introduce an open-data-derived pipeline for constructing large-scale, transmission-level power grid models that realistically approximate existing networks without relying on proprietary or restricted datasets. We provide an open dataset derived from this process, consisting of transmission-level models spanning 48 U.S. states as well as interconnection-scale networks, ranging in size from small systems with as few as 11 buses to the full Eastern Interconnection grid connecting 21,697 buses. The pipeline has been validated across the continental United States, where sufficient open geographic, energy, and demographic data are available, and is designed to generalize to other regions with comparable public data sources.

Using only publicly accessible datasets, the pipeline produces geographically grounded, electrically coherent transmission models at state, multi-state, and interconnection scales. These models preserve the geographic structure of transmission corridors, substations, and generators inferred from open data. Where detailed operational parameters are unavailable, the models account for the resulting uncertainty explicitly through transparent feasibility reporting.
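The post does not say how its feasibility reporting is computed. One common way to quantify how far a set of bus voltages is from satisfying the AC power flow equations is the per-bus active and reactive power mismatch. The sketch below assumes that approach, a pi-model branch tuple `(from, to, r, x, b_shunt)` in per unit, and the hypothetical function names `build_ybus` and `feasibility_report`.

```python
import numpy as np


def build_ybus(n_bus, branches):
    """Bus admittance matrix from pi-model branches (from, to, r, x, b_shunt), per unit."""
    Y = np.zeros((n_bus, n_bus), dtype=complex)
    for f, t, r, x, bsh in branches:
        y = 1.0 / complex(r, x)   # series admittance
        ysh = 1j * bsh / 2.0      # half of the line charging at each end
        Y[f, f] += y + ysh
        Y[t, t] += y + ysh
        Y[f, t] -= y
        Y[t, f] -= y
    return Y


def feasibility_report(Y, v_mag, v_ang, p_spec, q_spec, tol=1e-4):
    """Compare the AC injections implied by a voltage solution with the specified ones."""
    V = v_mag * np.exp(1j * v_ang)
    s_calc = V * np.conj(Y @ V)   # complex power injection S_i = V_i * conj(I_i)
    dp = s_calc.real - p_spec     # active power mismatch per bus
    dq = s_calc.imag - q_spec     # reactive power mismatch per bus
    worst = int(np.argmax(np.abs(dp) + np.abs(dq)))
    return {
        "max_p_mismatch": float(np.max(np.abs(dp))),
        "max_q_mismatch": float(np.max(np.abs(dq))),
        "worst_bus": worst,
        "feasible": bool(np.max(np.abs(dp)) < tol and np.max(np.abs(dq)) < tol),
    }


if __name__ == "__main__":
    # Hypothetical 2-bus example: specify injections consistent with chosen voltages,
    # then perturb the load at bus 1 so the same voltages no longer satisfy them.
    Y = build_ybus(2, [(0, 1, 0.01, 0.1, 0.02)])
    vm, va = np.array([1.0, 0.98]), np.array([0.0, -0.05])
    V = vm * np.exp(1j * va)
    S = V * np.conj(Y @ V)
    print(feasibility_report(Y, vm, va, S.real, S.imag))
    print(feasibility_report(Y, vm, va, S.real + np.array([0.0, 0.1]), S.imag))
```

Reporting the worst bus alongside the magnitude of the mismatch is one way to make infeasibility transparent rather than hiding it behind a single pass/fail flag.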

Importantly, these are not toy networks or abstract benchmarks. The resulting models support alternating current optimal power flow (AC-OPF) analysis across a wide range of scales, enabling physics-based investigation of questions such as where transmission capacity is physically constrained, where new demand can be absorbed, and how infrastructure changes propagate through realistic network layouts – using only open data.
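For reference, a standard polar-form statement of the AC-OPF problem is sketched below. Here $V_i$ and $\theta_i$ are the voltage magnitude and angle at bus $i \in \mathcal{N}$, $\theta_{ij} = \theta_i - \theta_j$, $G_{ij} + jB_{ij}$ are entries of the bus admittance matrix, $P^{D}_i$ and $Q^{D}_i$ are demand, $S_{ij}$ is the apparent power flow on line $(i,j) \in \mathcal{L}$, and $c_i$ is a generation cost function. This excerpt does not state the exact objective or constraint set used with the released models, so treat this as the textbook form rather than the authors' formulation.

```latex
\begin{aligned}
\min_{P^{G},\,Q^{G},\,V,\,\theta}\quad & \sum_{i \in \mathcal{N}} c_i\!\left(P^{G}_i\right) \\
\text{s.t.}\quad
& P^{G}_i - P^{D}_i = \sum_{j \in \mathcal{N}} V_i V_j \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right) && \forall i \in \mathcal{N} \\
& Q^{G}_i - Q^{D}_i = \sum_{j \in \mathcal{N}} V_i V_j \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right) && \forall i \in \mathcal{N} \\
& V_i^{\min} \le V_i \le V_i^{\max} && \forall i \in \mathcal{N} \\
& P^{G,\min}_i \le P^{G}_i \le P^{G,\max}_i,\quad Q^{G,\min}_i \le Q^{G}_i \le Q^{G,\max}_i && \forall i \in \mathcal{N} \\
& \left| S_{ij} \right| \le S_{ij}^{\max} && \forall (i,j) \in \mathcal{L}
\end{aligned}
```

The first two constraints are the nonlinear power balance equations; the thermal limits $S_{ij}^{\max}$ are what make questions about congestion and demand absorption physically meaningful.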

In this post, we describe the approach at a high level and highlight the system-level questions it enables.

How the pipeline works

The pipeline turns publicly available geographic and energy data into transmission-level grid models that are geographically grounded and usable for power flow analysis.

The starting point is OpenStreetMap, which encodes the physical layout of transmission corridors, substations, and power plants. This geographic skeleton is then augmented with open datasets describing generation capacity, fuel mix, demand, and operational boundaries (including U.S. EIA energy statistics and U.S. Census data), allowing the models to go beyond topology and represent…
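The excerpt does not say how plants and demand are attached to the geographic skeleton (for example, OpenStreetMap features tagged `power=substation` and `power=plant`). A minimal sketch, assuming each generator and each census population point is assigned to its nearest substation by great-circle distance, and that a regional demand total is split by population share, might look like this. The coordinates, the function names, and the allocation rule are all hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(substations, lat, lon):
    """Name of the substation closest to (lat, lon), by linear scan."""
    return min(substations, key=lambda s: haversine_km(lat, lon, *substations[s]))


def attach_to_buses(substations, plants, pop_points, total_demand_mw):
    """Map plant capacity and population-weighted demand onto substation buses.

    substations: {name: (lat, lon)}
    plants: [(lat, lon, capacity_mw)]
    pop_points: [(lat, lon, population)]
    """
    gen_mw = defaultdict(float)
    for lat, lon, cap_mw in plants:
        gen_mw[nearest(substations, lat, lon)] += cap_mw
    pop = defaultdict(float)
    for lat, lon, people in pop_points:
        pop[nearest(substations, lat, lon)] += people
    total_pop = sum(pop.values())
    load_mw = {s: total_demand_mw * p / total_pop for s, p in pop.items()}
    return dict(gen_mw), load_mw


if __name__ == "__main__":
    subs = {"A": (40.0, -75.0), "B": (41.0, -74.0)}  # hypothetical substations
    plants = [(40.1, -75.1, 500.0), (40.9, -74.1, 200.0), (40.05, -74.95, 100.0)]
    pop_points = [(40.0, -75.05, 3000), (41.02, -74.0, 1000)]
    print(attach_to_buses(subs, plants, pop_points, total_demand_mw=800.0))
```

A real pipeline at interconnection scale would likely replace the linear scan with a spatial index and would need to respect voltage levels and operational boundaries when attaching assets; this sketch only shows the basic mapping step.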

