Overview
The cdm__Archive_upsert notebook keeps accurate historical records of data. It implements a technique called Slowly Changing Dimension Type 2 (SCD2): instead of overwriting old information, it builds a timeline of changes so you can see what was true at any point in time. Previous versions are marked as expired and new versions are added as active, which gives you a complete history of changes and makes audits and reports more reliable.
Why Do We Need It?
Business data changes constantly. If you simply replace old values, you lose the ability to answer questions like:
- What was the price last month?
- Which version of this record was active on 30 October?
The notebook solves this by applying a Slowly Changing Dimension (SCD) Type 2 approach, which preserves history while keeping the current version active.
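The second question becomes a simple filter once history is stored as versions. The Python sketch below shows a point-in-time lookup over SCD2 rows. It assumes each version carries hypothetical `valid_from` and `valid_to` dates, with `valid_to` left empty while the version is active. The `PriceVersion` class, the product ID, the prices, and the year are illustrative only, not the notebook's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PriceVersion:
    # Hypothetical SCD2 row: one row per version of a business key.
    product_id: str
    price: float
    valid_from: date          # first day this version was true (inclusive)
    valid_to: Optional[date]  # day it was superseded (exclusive); None while active


def as_of(history: list[PriceVersion], product_id: str, when: date) -> Optional[PriceVersion]:
    """Return the version of product_id that was active on `when`, or None."""
    for row in history:
        if (row.product_id == product_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row
    return None


# Illustrative history: the price changed on 1 November.
history = [
    PriceVersion("P1", 10.0, date(2024, 9, 1), date(2024, 11, 1)),  # expired version
    PriceVersion("P1", 12.5, date(2024, 11, 1), None),              # active version
]

print(as_of(history, "P1", date(2024, 10, 30)).price)  # 10.0
print(as_of(history, "P1", date(2024, 11, 15)).price)  # 12.5
```

Treating the validity interval as half-open (start inclusive, end exclusive) means the changeover day belongs only to the new version, so no date can match two versions of the same key.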
What Happens Behind the Scenes?
Here’s the process in plain language:
- Find the Right Table: The notebook identifies the correct data table in the Lakehouse environment based on the parameters.
- Check for Changes: It compares the latest data with what’s already stored. If a record is new or has changed, it prepares a row for the new version.
- Keep History Intact: Instead of deleting or overwriting, the notebook marks old records as “expired” and inserts the new version as “active”. This way, you have a full history of changes.
- Handle Special Cases: For certain tables (such as those containing “values”), it also updates unchanged records with an archive date, creating a clear snapshot of the data at that point in time.
- Log Everything: Every run is logged with details such as status, date, and the number of records updated, which makes monitoring and auditing straightforward.
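The change check, the expire-and-insert step, the snapshot special case, and the run log can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the notebook's code: the `Row` fields (`key`, `attrs`, `valid_from`, `valid_to`, `is_current`, `archive_date`), the `stamp_unchanged` flag standing in for the “values” special case, and the log field names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Row:
    key: str                             # business key (hypothetical column)
    attrs: dict                          # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None      # None while the version is active
    is_current: bool = True
    archive_date: Optional[date] = None  # hypothetical snapshot stamp


def scd2_upsert(table: list[Row], incoming: dict[str, dict], run_date: date,
                stamp_unchanged: bool = False) -> dict:
    """Apply one SCD2 load to `table` in place and return a run-log summary."""
    # Index the active version of each key before any rows are appended.
    current = {r.key: r for r in table if r.is_current}
    inserted = expired = stamped = 0
    for key, attrs in incoming.items():
        old = current.get(key)
        if old is not None and old.attrs == attrs:
            # Unchanged: keep the version, optionally stamp the snapshot date.
            if stamp_unchanged:
                old.archive_date = run_date
                stamped += 1
            continue
        if old is not None:
            # Changed: close the active version instead of overwriting it.
            old.valid_to = run_date
            old.is_current = False
            expired += 1
        # New or changed: append the new version as the active row.
        table.append(Row(key, dict(attrs), valid_from=run_date))
        inserted += 1
    # Hypothetical log entry: status, date, and record counts.
    return {"status": "success", "run_date": run_date.isoformat(),
            "inserted": inserted, "expired": expired, "stamped": stamped}


table = [Row("P1", {"price": 10.0}, valid_from=date(2024, 9, 1))]
log = scd2_upsert(table, {"P1": {"price": 12.5}, "P2": {"price": 3.0}}, date(2024, 11, 1))
print(log)
# {'status': 'success', 'run_date': '2024-11-01', 'inserted': 2, 'expired': 1, 'stamped': 0}
```

Keys missing from the incoming data are left untouched in this sketch. A Lakehouse implementation would more likely express the same logic as a set-based operation, such as a Delta Lake `MERGE`, rather than a Python loop; the loop is used here only to make each step visible.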
Why Is This Important?
- Compliance: Many industries require historical data for audits.
- Analytics: Trend analysis depends on knowing what changed and when.
- Trust: Stakeholders can rely on accurate, time‑aware data.
Key Benefits
- Automated history tracking – no manual versioning.
- Clear audit trail – logs every update.
- Future‑proof – supports evolving data without losing the past.