Enabling Evolutionary Database Development: database branching with Lakebase, continued
Databricks Blog
This series revisits the methodology of Evolutionary Database Design, twenty years later. A key constraint on database-changes-as-code has always been shared database resources. With copy-on-write branching in Databricks Lakebase, a one-second, zero-storage-at-creation branch of a terabyte-scale production database is now an O(1) operation, and the constraint that kept Practice #4 (everybody gets their own database instance) aspirational has lifted. In this series, the authors describe what changes when the constraint lifts: not the methodology, which holds, but the practices that emerge for the first time, the team-scale governance that becomes automatic, the role evolution for the DBA, and the new capability that agents share with their human counterparts.
Jen is the developer character from Evolutionary Database Design. In that essay she implemented a database refactoring, splitting an inventory_code field into location_code, batch_number, and serial_number, as a routine user story, illustrating that DBAs and developers can collaborate, schemas can evolve in small increments, and migrations carry the change forward safely.
The series picks up with Jen twenty years later. The methodology she follows is the same one she followed in 2003. What's new is the technical capability underneath her workflow, enabled by the lakebase architecture: copy-on-write database branching, which makes the practices she has been reading about operationally real at production scale. Across the three parts of this series she is the same Jen at three scopes: her day (Part 1), her new playbook (Part 2), and her team (Part 3).
Part 1 walked Jen through one feature. The practices she followed were described in the 2003 Evolutionary Database Design essay, expanded in the 2006 Refactoring Databases book, and brought into the CI/CD pipeline in the 2010 Continuous Delivery book (Chapter 12).
Original Seven Practices
The 2003 essay named seven practices.
Five of the seven had limitations in their application until 2026.
1. DBAs collaborate closely with developers.
2. All database artifacts are version controlled with application code.
3. All database changes are migrations.
4. Everybody gets their own database instance.
5. Developers continuously integrate database changes.
6. All database changes are database refactorings.
7. Developers can update their databases on demand.
Limitations in their application
Practice #1 (DBA collaboration). Every schema change had production-scale consequences if it got loose, so DBA review remained synchronous and gating. Collaboration was constrained by the DBA's calendar.
Practice #4 (Everybody gets their own database instance). Licensing costs, infrastructure costs, and DBA time kept this practice aspirational on most teams. Most teams fell back to shared development databases and accepted the contention.
Practice #5 (Continuous integration of database changes). The 2010 Continuous Delivery wave brought migrations into the pipeline, but the pipeline ran migrations against shared target databases. Per-pipeline isolation was missing.
Practice #6 (All database changes are refactorings). Applying each refactoring required practice spaces (test databases) that most teams did not have at PR granularity.
Practice #7 (Developers update on demand). Developers could run migrations against shared environments on demand, but could not safely experiment, because their experiments would affect others.
What Changed
The technology introduced by Databricks Lakebase removes the roadblocks to implementing the practices above. Databricks Lakebase is a managed Postgres database that uses the same object storage layer (the data lake) that the rest of the Databricks lakehouse runs on. The database's data lives in shared, durable storage (effectively S3 buckets); the Postgres engine runs as a separate compute layer above it. Compute and storage scale independently. The engine can scale up under load, down when traffic drops, and to zero when idle. Lakebase is integrated with Unity Catalog, allowing unified governance across multiple environments.
Copy-on-write database branching is what the decoupled architecture makes practical at scale. A branch creates a new pointer into the same shared storage with a divergence marker. Until the branch writes, it shares all pages with its parent. When the branch writes, only the modified pages diverge; the parent stays untouched. Branching is a metadata operation with no data copy required, completing in roughly one second regardless of parent size. A branch stores data only for the pages it has changed.
When the technical cost of a branch is decoupled from the size of the data inside it, the constraint behind Practice #4 (everybody gets their own database instance) is unblocked. Per-developer, per-PR, per-experiment branches become routine. The compensating layer built above the database can come out. Mocks come out of the test loop, replaced by real Postgres on a per-test branch. Shared staging stops being the only place to test schema changes. In-memory database substitutes (H2, SQLite) come out of the unit-test layer. The DevOps tax of creating Docker-based containers to run local databases is no longer necessary, and DBA ticket queues for provisioning shrink because branches are self-service. The technology is what enables the methodology optimizations and completes the goal of the practices behind the original 2003 post.
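The copy-on-write behavior described above can be illustrated with a toy in-memory model. This is a sketch for intuition only, not Lakebase's implementation: the `Layer` and `Branch` classes and the `diverged_page_count` method are hypothetical names invented for this example, and pages are plain Python byte strings rather than Postgres pages in object storage.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Layer:
    """A set of pages plus a pointer to the older, shared layer beneath it."""
    pages: Dict[int, bytes] = field(default_factory=dict)
    below: Optional[Layer] = None


class Branch:
    """Hypothetical copy-on-write branch: it owns only the pages it has written."""

    def __init__(self, name: str, top: Optional[Layer] = None) -> None:
        self.name = name
        self._top = top if top is not None else Layer()

    def branch(self, name: str) -> Branch:
        # Metadata-only step: no page is copied, whatever the parent's size.
        shared = self._top               # treated as read-only from now on
        self._top = Layer(below=shared)  # the parent's later writes land here
        return Branch(name, Layer(below=shared))

    def read(self, page_no: int) -> bytes:
        # Walk from this branch's own pages down through the shared layers.
        layer = self._top
        while layer is not None:
            if page_no in layer.pages:
                return layer.pages[page_no]
            layer = layer.below
        raise KeyError(f"page {page_no} not found in branch {self.name!r}")

    def write(self, page_no: int, data: bytes) -> None:
        # Only the modified page diverges; shared layers are never touched.
        self._top.pages[page_no] = data

    def diverged_page_count(self) -> int:
        # Pages written by this branch since its last branch point.
        return len(self._top.pages)


prod = Branch("production")
for n in range(1000):
    prod.write(n, b"v1")

dev = prod.branch("jen-feature")  # shares all 1000 pages, copies none
dev.write(7, b"v2")               # exactly one page diverges
prod.write(9, b"v3")              # a later parent write is invisible to dev
```

In this model the cost of `branch` does not depend on how many pages the parent holds, and the child's storage grows only with the pages it writes, which mirrors the property the text attributes to branching. Reads in the toy walk a chain of layers; a real storage engine would index page versions differently.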
Emerging practices for 2026
Lakebase copy-on-write database branching lifts the previous limitations on the original practices and enables four additional ones.
DBAs collaborate closely with developers. With the schema diff posted on every PR, the DBA reviews asynchronously, like any other code reviewer. Because the provisioning tax is negligible, DBAs now have the bandwidth to review and perhaps even to work with developers on the solution from the start, instead of reviewing after implementation.
All database artifacts are version controlled with application code. The schema diff, the database migrations, and the migration test results now join the artifact set.
All database changes are migrations. Plus a new authorship rule: idempotency, which allows all merged migrations to be deployed automatically to downstream environments such as QA, staging, and production.
Everybody gets their own database instance. Operational at…