What are Slowly Changing Dimensions (SCD)?

TL;DR:

Slowly Changing Dimensions, or SCDs, are the techniques data warehouses use to handle attributes that change over time. A customer moves house, a product gets reclassified, an employee changes departments. The question SCDs answer is: should the warehouse remember the old value, the new value, or both? Different SCD types make different trade-offs between historical accuracy, storage cost, and query complexity.

Why SCDs Matter

Dimension attributes change. A customer's address updates. A product's category gets reorganized. A store changes region. The way the warehouse handles that change determines whether historical analytics stay accurate.

Imagine a sales report for last year, broken down by region. If a store moved from "Northern Region" to "Central Region" in March, which region should its January sales appear under? The original "Northern," preserving point-in-time accuracy? Or the current "Central," reflecting today's structure? Both answers are valid. SCD types are how you encode that choice in the data model.

The decision compounds across every dimension, every attribute, every reporting layer. Get the SCD strategy wrong and the warehouse quietly produces misleading reports. Get it right and analysts trust the numbers.

The Main SCD Types

Ralph Kimball, the same architect behind star schema, defined the classic SCD types. Most teams use a mix of them rather than picking one.

Type 0, Fixed. Attribute never changes. Useful for things like date of birth or original signup date.
Type 1, Overwrite. Old value is replaced with the new one. No history kept. Simple to implement, but historical reports will silently use the new value as if it had always applied.
Type 2, Add new row. A new dimension row is inserted with the updated values, the old row is closed out with an end date, and a new surrogate key is generated. Full history preserved. This is the most common type for attributes where time-accuracy matters.
Type 3, Add new column. A "previous value" column tracks the prior state alongside the current one. Limited history (only one change), but useful for tracking the most recent flip without inflating row counts.
Type 4, History table. Current values live in the main dimension; historical values go in a separate history table. Keeps the active dimension small.
Type 5, Mini-dimension. A rarely used variation that offloads volatile or high-cardinality attributes into a smaller secondary dimension, keeping the main dimension lean.
Type 6, Hybrid (1+2+3). Combines overwrite, new row, and new column techniques in one dimension. Maximum flexibility, maximum implementation complexity.

In practice, most warehouses run on a mix of Type 1 for attributes nobody analyzes historically (display preferences, sort orders) and Type 2 for anything that affects reporting accuracy (geography, segmentation, hierarchy).

The Engineering Cost

SCDs sound straightforward in theory. In production they generate some of the most tedious work in a data engineering backlog:

Effective date management. Every Type 2 row needs valid_from and valid_to columns, and the logic to close out the previous row when a new one lands.
Surrogate keys. Type 2 means the same business entity has multiple dimension rows, so facts need to join to the version of the dimension that was active when the event happened, not the current one.
Late-arriving changes. A customer's address change might land in the warehouse two weeks after it happened. Inserting it correctly means rewriting the affected fact rows, not just appending new ones, and tracking those corrections cleanly depends on solid data lineage.
Schema evolution. When source systems add or rename attributes, the SCD logic has to follow. Miss it and schema drift quietly corrupts your history.

This isn't conceptually hard work. It's just a lot of it, and it has to be right every time, or the historical accuracy that justified the design quietly disappears.

SCDs in a Lakehouse

SCDs aren't tied to traditional warehouses. They show up in medallion architecture too, typically as Type 2 transformations applied between the silver and gold tiers, where conformed entities get turned into time-aware dimensions for analytics.

Modern table formats like Delta Lake, Iceberg, and Hudi have made Type 2 implementations easier with ACID transactions and MERGE syntax. The underlying logic (which rows to close, which to insert, how to handle late arrivals) still has to be authored and maintained for every dimension.

The Maia Advantage

Slowly Changing Dimensions are the textbook example of work that's important, repetitive, and unforgiving. Every dimension needs its own SCD logic. Every change to a source system risks breaking it. Every late-arriving update can corrupt history if it's not handled correctly.

That's where Maia's Context Engine comes in. Encode how each dimension should behave (Type 1 for non-analytical attributes, Type 2 for anything that affects historical accuracy) and Maia Team's AI agents build, run, and maintain the pipelines that keep dimensions current over time. When source schemas change, the model adapts. When late updates land, Maia identifies the affected records and surfaces the correct resolution path, so history stays clean without manual triage. SCDs become part of broader autonomous data engineering: well-understood work that runs itself, so data teams can focus on new questions instead of maintaining the answers to old ones.

SCDs shouldn't be a maintenance burden.

‍

Enjoy the freedom to do more with Maia on your side.

Book a 30-minute live demo

Soft yellow abstract background with smooth gradients and rounded edges.