Databricks Blog, published Jun 25, 2026

From test bench to lakehouse: how AVL modernizes measurement data analytics with Impulse


Summary

Impulse is an open-source Databricks Labs framework that lets domain engineers analyze sensor data on Databricks with simple Python expressions.

Impulse scales time-series analytics to hundreds of terabytes of measurement data, while keeping analyses reproducible, shareable across teams and governed by Unity Catalog.

AVL replaced its legacy on-premises platform with Impulse on Databricks, cutting analysis time from days to minutes and standardizing measurement data analytics across the organization.

1. Introduction - Impulse: time-series analytics for measurement data

A single automotive test campaign produces hundreds of thousands of measurement recordings and hundreds of terabytes of time-series sensor data. This data is stored in binary formats such as ASAM MDF4 and is traditionally analyzed with desktop tools such as NI DIAdem or MATLAB. Domain engineers like these tools for a good reason: they can focus on the actual analysis, deciding which signals to compare and which conditions define a critical event, without becoming experts in big-data frameworks and distributed computing. But these tools don't scale, analyses built on isolated scripts are hard to reproduce, and the data sits outside the governance that the rest of a modern enterprise relies on.

Impulse is a Python-based analytics library, published as a Databricks Labs project, that closes this gap on the Databricks Intelligence Platform. At its core (Figure 1), Impulse provides three key ingredients:

- A declarative Time Series Analytics Language (TSAL) that lets engineers express signal arithmetic, event conditions, and aggregations in natural Python without requiring Spark expertise.
- A pluggable query engine that compiles TSAL expressions into distributed Spark execution across thousands of recordings stored in any input data layout.
- Domain-aware abstractions that map directly onto how engineers think about their data, including measurement containers, sensor channels, operating events, and duration- and distance-weighted aggregations.
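To make "declarative" concrete, here is a toy sketch in plain Python and NumPy; it is not Impulse's TSAL. Operator overloading records signal arithmetic as an expression tree, and a separate evaluation step decides how to compute it. The class names (`Expr`, `Channel`, `Const`, `BinOp`), the `evaluate` function, and the channel names are all hypothetical. The toy evaluator runs eagerly on in-memory arrays; a pluggable engine could instead translate the same tree into a distributed Spark plan.

```python
from dataclasses import dataclass
from typing import Dict

import numpy as np


class Expr:
    """Base class: arithmetic and comparisons build a tree instead of computing values."""

    def __add__(self, other):
        return BinOp("+", self, _wrap(other))

    def __sub__(self, other):
        return BinOp("-", self, _wrap(other))

    def __gt__(self, other):
        return BinOp(">", self, _wrap(other))


@dataclass(frozen=True, eq=False)
class Channel(Expr):
    name: str  # refers to a sensor channel by name only; no data attached


@dataclass(frozen=True, eq=False)
class Const(Expr):
    value: float


@dataclass(frozen=True, eq=False)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr


def _wrap(x):
    """Turn plain numbers into Const nodes so `expr + 1` works."""
    return x if isinstance(x, Expr) else Const(float(x))


_OPS = {"+": np.add, "-": np.subtract, ">": np.greater}


def evaluate(expr: Expr, data: Dict[str, np.ndarray]) -> np.ndarray:
    """A trivial eager 'engine' that walks the tree with NumPy.
    A different engine could compile the same tree into Spark instead."""
    if isinstance(expr, Channel):
        return np.asarray(data[expr.name], dtype=float)
    if isinstance(expr, Const):
        return np.asarray(expr.value)
    if isinstance(expr, BinOp):
        return _OPS[expr.op](evaluate(expr.left, data), evaluate(expr.right, data))
    raise TypeError(f"unknown node: {expr!r}")


# Usage with hypothetical channel names:
t_max, t_min = Channel("cell_temp_max"), Channel("cell_temp_min")
delta = t_max - t_min        # builds BinOp("-", ...); nothing is computed yet
is_hot_spot = delta > 8.0    # still just a tree

data = {"cell_temp_max": np.array([30.0, 35.0, 41.0]),
        "cell_temp_min": np.array([28.0, 29.0, 30.0])}
print(evaluate(delta, data))
print(evaluate(is_hot_spot, data))
```

Separating the description of a computation from its execution is what lets the same analysis expression run on a small local sample or, through a different engine, across many recordings.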

In this blog post, we show how Impulse powers AVL's Lakehouse for Measurement Data on Databricks. AVL is a world-leading mobility technology company that specializes in the development, simulation, and testing of vehicle and energy systems. They work with measurement and simulation data to validate designs, understand system behavior, and accelerate data-driven product development from virtual models to real-world testing. We walk through the lakehouse architecture, three complementary usage modes that serve domain engineers, data engineers, and data scientists alike, and the impact AVL has seen in production. Impulse builds on a hierarchical Silver-layer data model co-developed with Mercedes-Benz and described in our previous blog post.

Figure 1 – Architecture of Impulse. The framework comprises three components. TSAL is a declarative Python DSL for expressing signals, events, and aggregations without requiring Spark expertise. The pluggable Query Engine compiles TSAL expressions into distributed Spark execution plans and executes queries on Silver-layer data. Domain-aware Aggregations include duration- and distance-weighted 1D/2D histograms and event-scoped statistics. Impulse can then write results to a Gold-layer star schema.
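Duration weighting means counting time rather than samples, which matters because measurement channels are often sampled irregularly. Below is a minimal NumPy sketch, assuming a zero-order hold (each sample's value holds until the next timestamp) and, for distance weighting, a vehicle speed channel in m/s on the same time base. Impulse's own weighting conventions may differ, and all function names here are hypothetical.

```python
import numpy as np


def hold_durations(t) -> np.ndarray:
    """Zero-order hold: sample i is assumed valid until t[i+1].
    The last sample gets zero weight because its hold time is unknown."""
    t = np.asarray(t, dtype=float)
    return np.append(np.diff(t), 0.0)


def duration_weighted_hist(t, values, edges) -> np.ndarray:
    """Seconds spent in each value bin."""
    hist, _ = np.histogram(values, bins=edges, weights=hold_durations(t))
    return hist


def distance_weighted_hist(t, values, speed_mps, edges) -> np.ndarray:
    """Meters driven in each value bin (distance per sample = speed * hold time)."""
    weights = np.asarray(speed_mps, dtype=float) * hold_durations(t)
    hist, _ = np.histogram(values, bins=edges, weights=weights)
    return hist


def duration_weighted_hist2d(t, x, y, x_edges, y_edges) -> np.ndarray:
    """Seconds spent in each (x, y) cell, e.g. torque over engine speed."""
    hist, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=hold_durations(t))
    return hist


# Irregularly sampled example: the sample at t=1 s holds for 2 s.
t = np.array([0.0, 1.0, 3.0, 4.0])
temp = np.array([10.0, 25.0, 25.0, 40.0])
edges = np.array([0.0, 20.0, 30.0, 50.0])
print(np.histogram(temp, bins=edges)[0])        # plain sample counts per bin
print(duration_weighted_hist(t, temp, edges))   # seconds per bin
```

Here only two samples fall into the 20 to 30 bin, but together they account for 3 s of recording time, which is what a duration-weighted histogram reports.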

2. The architecture - a lakehouse for measurement data

AVL’s platform follows the Medallion Architecture, with Unity Catalog providing governance across all layers and Databricks Workflows orchestrating the pipeline (see Figure 2).

1. Source and Ingestion: Raw measurement files (e.g., in ASAM MDF4 format) are ingested into the Bronze layer using a Databricks Solution Accelerator. AVL extended this accelerator to work with AVL Concerto, their measurement data management system, which supports multiple proprietary file formats. Contextual metadata (vehicle IDs, software versions, project tags, etc.) is ingested alongside the recorded files.
2. Silver Layer: Bronze data is transformed into the hierarchical data model for measurement data. The model organizes data around containers (i.e., individual files) and channels (sensor signals), each enriched with container-level and channel-level attributes/tags and metrics. The Silver layer stores validated, quality-assured data prepared for analytical processing. Data quality rules are implemented using the Databricks DQX framework and are fully configurable and customizable to meet specific downstream analytics needs. See our previously published blog post for more details on the Silver-layer data model.
3. + 4. From Silver to Gold: The Silver layer feeds into Impulse, which translates declarative analysis logic into distributed Spark execution. Outputs can be a Gold-layer star schema for reporting, ad-hoc DataFrames for exploration, or feature matrices for ML (see Section 5).
5. Serve and Analysis: BI tools like Databricks Dashboards or Lakehouse Apps consume Gold-layer data via SQL Warehouses, enabling interactive exploration without touching the compute pipeline.
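Configurable quality rules can be thought of as data rather than code. The sketch below is not the DQX API; it is a minimal plain-Python illustration, with a hypothetical `Rule` schema and hypothetical column names, of splitting Silver-layer channel metadata into valid rows and quarantined rows annotated with the rules they failed. In a real pipeline the same idea would be expressed as filters over Spark DataFrames.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One configurable check on a single column (hypothetical schema)."""
    name: str
    column: str
    check: str                       # "not_null" or "in_range"
    lower: Optional[float] = None    # inclusive bounds for "in_range"
    upper: Optional[float] = None


def passes(rule: Rule, row: Row) -> bool:
    """Return True if `row` satisfies `rule`."""
    value = row.get(rule.column)
    if rule.check == "not_null":
        return value is not None
    if rule.check == "in_range":
        if value is None:
            return False
        return ((rule.lower is None or value >= rule.lower)
                and (rule.upper is None or value <= rule.upper))
    raise ValueError(f"unknown check type: {rule.check!r}")


def apply_rules(rows: List[Row], rules: List[Rule]) -> Tuple[List[Row], List[Row]]:
    """Split rows into valid rows and quarantined rows tagged with failed rule names."""
    valid: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        failed = [rule.name for rule in rules if not passes(rule, row)]
        if failed:
            quarantined.append({**row, "_failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantined


# Rules as configuration (could equally be loaded from a YAML or JSON file).
RULES = [
    Rule("has_container", "container_id", "not_null"),
    Rule("plausible_rate", "sample_rate_hz", "in_range", lower=0.1, upper=100_000.0),
]

channels = [
    {"container_id": "c1", "channel": "cell_temp_max", "sample_rate_hz": 10.0},
    {"container_id": None, "channel": "vehicle_speed", "sample_rate_hz": 0.0},
]
valid, quarantined = apply_rules(channels, RULES)
print(len(valid), quarantined[0]["_failed_rules"])
```

Keeping failed rows with their rule names, instead of silently dropping them, makes it possible to report why a recording was excluded from downstream analytics.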

Figure 2 – High-level reference architecture of the Lakehouse for Measurement Data. (1) Raw measurement files are ingested into the Bronze layer. (2) Data is transformed into the standardized Silver layer data model. (3+4) Impulse translates declarative analysis logic into distributed execution and produces Gold-layer outputs. (5) BI tools and Lakehouse Apps serve the results to end users. See text for details.

3. Putting Impulse to work: a complete analysis in 10 lines of Python

The best way to understand Impulse is to see it in action. In this section, we walk through a minimal but realistic example: selecting battery temperature sensors, defining a thermal runaway risk event based on those sensors, and calculating a duration-weighted histogram, all using the Time Series Analytics Language (TSAL).

Selecting physical channels & defining virtual channels

The starting point for any analysis is selecting the physical sensor channels of interest. The QueryBuilder searches the Silver-layer metadata tables and returns a TSAL expression. In the example below, we retrieve the highest and lowest cell temperatures from our EV platform and compute the temperature imbalance (delta):
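Independently of Impulse's own API, the computation behind such a virtual channel, plus a simple threshold event on top of it, can be sketched in plain NumPy. Assumptions: both channels have ascending timestamps in seconds; they are aligned by linear interpolation onto the union of their timestamps within the overlapping time range; and the 8 K threshold and 2 s minimum duration are purely illustrative, not AVL's event definition. The function names `align` and `find_events` are hypothetical.

```python
import numpy as np


def align(t_a, v_a, t_b, v_b):
    """Resample two channels onto the union of their timestamps,
    restricted to the time range both channels cover (linear interpolation)."""
    t_a, v_a = np.asarray(t_a, float), np.asarray(v_a, float)
    t_b, v_b = np.asarray(t_b, float), np.asarray(v_b, float)
    t = np.union1d(t_a, t_b)
    t = t[(t >= max(t_a[0], t_b[0])) & (t <= min(t_a[-1], t_b[-1]))]
    return t, np.interp(t, t_a, v_a), np.interp(t, t_b, v_b)


def find_events(t, condition, min_duration):
    """Return (start, end) intervals where `condition` holds for at least
    `min_duration` seconds. An interval ends at the first timestamp where the
    condition is false, or at the last sample if it is still true there."""
    events = []
    start = None
    for ti, c in zip(t, condition):
        if c and start is None:
            start = ti
        elif not c and start is not None:
            if ti - start >= min_duration:
                events.append((start, ti))
            start = None
    if start is not None and t[-1] - start >= min_duration:
        events.append((start, t[-1]))
    return events


# Hypothetical example: max/min cell temperatures logged at different rates.
t_max, v_max = [0, 1, 2, 3, 4, 5], [30, 31, 40, 41, 42, 32]
t_min, v_min = [0, 2.5, 5], [28, 29, 30]

t, hi, lo = align(t_max, v_max, t_min, v_min)
delta = hi - lo                                   # the virtual "imbalance" channel
print(find_events(t, delta > 8.0, min_duration=2.0))
```

Interpolating onto the union of timestamps keeps every original sample; alternatives such as resampling to a fixed rate or an as-of (last known value) join trade accuracy against data volume differently.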

Note that the single line for defining the virtual channel encodes a non-trivial computation. The framework automatically...
