Building realistic electric transmission grid dataset at scale: a pipeline from open dataset
At a glance
We construct geographically grounded, electrically coherent power grid models entirely from publicly available data and release a dataset spanning 48 U.S. states and multi-state interconnections.
The models support AC optimal power flow (AC‑OPF) analysis, enabling physics-based study of congestion, capacity, and demand siting without restricted data.
We demonstrate applications including transmission expansion potential, targeted line upgrades, and placement of large datacenter loads.
Microsoft Research is excited to release an open dataset of the approximate transmission topology of the U.S. power grid, derived from publicly available data.
The ability to study transmission-level power grid behavior is essential for modern power systems research. Analyses of congestion, transmission expansion, demand growth, and system resilience all depend on network models with realistic topology, electrical parameters, and geographic grounding.
In most of the world, including the United States, realistic transmission-level grid data is classified as critical infrastructure information and subject to strict access controls. These restrictions exist for good reasons, but the resulting lack of realistic grid models is increasingly exacerbating the challenges power systems face. Decisions about where new load can be added – and how additional transmission assets can be deployed to support it – are often gated behind lengthy and opaque processes that can take years. For researchers developing new tools and algorithms, access typically requires long approval cycles, strict non-redistribution agreements, or costly commercial licenses.
As a result, many researchers are left choosing between small “toy” networks with dozens of buses and synthetic models that do not correspond to real infrastructure. This lack of realistic, shareable models is particularly limiting for data-driven and AI-based approaches, which require large volumes of physically plausible grid data to train and evaluate methods for grid analysis and planning.
Against this backdrop, a natural question arises:
Can we meaningfully understand how the U.S. power grid responds to modern stresses – and facilitate the development of actionable solutions for the system – using only open data?
In this work, we introduce an open-data-derived pipeline for constructing large-scale, transmission-level power grid models that realistically approximate existing networks without relying on proprietary or restricted datasets. We provide an open dataset derived from this process, consisting of transmission-level models spanning 48 U.S. states as well as interconnection-scale networks, ranging in size from small systems with as few as 11 buses to the full Eastern Interconnection grid connecting 21,697 buses. The pipeline has been validated across the continental United States, where sufficient open geographic, energy, and demographic data are available, and is designed to generalize to other regions with comparable public data sources.
Using only publicly accessible datasets, the pipeline produces geographically grounded, electrically coherent transmission models at state, multi-state, and interconnection scales. These models preserve the geographic structure of transmission corridors, substations, and generators inferred from open data. Where detailed operational parameters are unavailable, they account for the resulting uncertainty explicitly through transparent feasibility reporting.
Importantly, these are not toy networks or abstract benchmarks. The resulting models support alternating current optimal power flow (AC-OPF) analysis across a wide range of scales, enabling physics-based investigation of questions such as where transmission capacity is physically constrained; where new demand can be absorbed; and how infrastructure changes propagate through realistic network layouts – using only open data.
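For readers unfamiliar with AC-OPF, the problem can be written in a standard textbook (polar) form. This is a generic sketch, not the exact formulation used in the pipeline, which the post does not specify. The symbols are assumptions introduced here: $c_i$ is the generation cost at bus $i$, $P_i^g, Q_i^g$ and $P_i^d, Q_i^d$ are real and reactive generation and demand, $|V_i|$ and $\theta_i$ are voltage magnitude and angle, $G_{ij} + jB_{ij}$ is the entry of the bus admittance matrix, $\theta_{ij} = \theta_i - \theta_j$, and $S_{ij}$ is the apparent power flow on branch $(i,j)$.

```latex
\begin{aligned}
\min_{P^g,\,Q^g,\,|V|,\,\theta} \quad & \sum_{i} c_i\!\left(P_i^g\right) \\
\text{s.t.} \quad
& P_i^g - P_i^d = \sum_{j} |V_i||V_j|\left(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\right) && \forall i \\
& Q_i^g - Q_i^d = \sum_{j} |V_i||V_j|\left(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\right) && \forall i \\
& P_i^{g,\min} \le P_i^g \le P_i^{g,\max}, \quad Q_i^{g,\min} \le Q_i^g \le Q_i^{g,\max} && \forall i \\
& V_i^{\min} \le |V_i| \le V_i^{\max} && \forall i \\
& |S_{ij}| \le S_{ij}^{\max} && \forall (i,j)
\end{aligned}
```

The branch limit $|S_{ij}| \le S_{ij}^{\max}$ is what makes questions about constrained transmission capacity answerable: a line whose limit is binding at the optimum is one where capacity is physically constraining the dispatch.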
In this post, we describe the approach at a high level and highlight the system-level questions it enables.
How the pipeline works
The pipeline turns publicly available geographic and energy data into transmission-level grid models that are geographically grounded and usable for power flow analysis.
The starting point is OpenStreetMap, which encodes the physical layout of transmission corridors, substations, and power plants. This geographic skeleton is then augmented with open datasets describing generation capacity, fuel mix, demand, and operational boundaries (including U.S. EIA energy statistics and U.S. Census data), allowing the models to go beyond topology and represent…
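To make the idea of a geographic skeleton concrete, here is a minimal Python sketch of one plausible step: snapping transmission line endpoints to nearby substations to obtain a bus-branch topology. It is an illustration under stated assumptions, not the authors' implementation. The sample coordinates, the 1 km snapping tolerance, and the names `nearest_substation` and `build_topology` are all hypothetical, and a real pipeline would read these features from OpenStreetMap rather than from hard-coded dictionaries.

```python
import math

# Hypothetical sample data standing in for features extracted from open map data.
SUBSTATIONS = {
    "sub_a": (47.60, -122.33),  # (latitude, longitude) in degrees
    "sub_b": (47.25, -122.44),
    "sub_c": (47.04, -122.90),
}
LINES = [
    {"id": "line_1", "voltage_kv": 230, "ends": [(47.601, -122.331), (47.249, -122.441)]},
    {"id": "line_2", "voltage_kv": 115, "ends": [(47.251, -122.439), (47.041, -122.899)]},
    # One end of this line is far from every substation, so it cannot be snapped.
    {"id": "line_3", "voltage_kv": 115, "ends": [(47.60, -122.33), (46.00, -120.00)]},
]


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_substation(point, substations, max_km=1.0):
    """Return the name of the closest substation within max_km, else None."""
    best = min(substations, key=lambda name: haversine_km(point, substations[name]))
    return best if haversine_km(point, substations[best]) <= max_km else None


def build_topology(substations, lines, max_km=1.0):
    """Snap each line's endpoints to substations; report lines that cannot be snapped."""
    branches, skipped = [], []
    for line in lines:
        ends = [nearest_substation(p, substations, max_km) for p in line["ends"]]
        if None in ends or ends[0] == ends[1]:
            skipped.append(line["id"])  # dangling end or self-loop
            continue
        branches.append({
            "id": line["id"],
            "from": ends[0],
            "to": ends[1],
            "voltage_kv": line["voltage_kv"],
            "length_km": haversine_km(substations[ends[0]], substations[ends[1]]),
        })
    return branches, skipped


branches, skipped = build_topology(SUBSTATIONS, LINES)
for b in branches:
    print(b["id"], b["from"], "->", b["to"], f'{b["length_km"]:.1f} km')
print("skipped:", skipped)
```

Returning the skipped lines alongside the branches, rather than silently dropping them, keeps a record of where the open data could not be resolved into a connected network.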