cdm_Archive_to_STG

Overview: The cdm_Archive_to_STG notebook is a critical component in the cdm_today and cdm_Archive pipeline. Its primary role is to create a staging table that represents a point-in-time snapshot of source data, which is then used by the cdm__Archive_upsert notebook to accurately update the main dimension table. This staging layer acts as a buffer between raw …

myOSH_To_Bronze

Overview: The myOSH_To_Bronze notebook handles multiple MyOSH API endpoints dynamically, so it can process different endpoints such as records or users without hardcoded per-endpoint logic. For most endpoints (like users), the process is straightforward: it performs a regular API call and writes the response directly as JSON to the source container and to …
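A minimal sketch of the "dynamic endpoint" idea described above, not the notebook's actual code. It assumes a handler registry: most endpoints use one default handler (a single API call, response written as JSON), and endpoints that need extra logic register their own. The names `ingest`, `SPECIAL_HANDLERS`, `call_api` and `write` are hypothetical, and the API call and storage write are passed in as plain callables so the example runs without MyOSH or a lake container.

```python
import json
from typing import Any, Callable, Dict

ApiCall = Callable[[str], Any]          # hypothetical: endpoint name -> parsed response
Handler = Callable[[str, ApiCall], Any]
Writer = Callable[[str, str], None]     # hypothetical: (path, json_text) -> None


def default_handler(endpoint: str, call_api: ApiCall) -> Any:
    """Regular case: one API call, response passed through unchanged."""
    return call_api(endpoint)


# Endpoints that need extra logic register here; all others use default_handler.
SPECIAL_HANDLERS: Dict[str, Handler] = {}


def ingest(endpoint: str, call_api: ApiCall, write: Writer) -> str:
    """Fetch one endpoint with the matching handler and write it as JSON."""
    handler = SPECIAL_HANDLERS.get(endpoint, default_handler)
    payload = handler(endpoint, call_api)
    path = f"{endpoint}.json"           # assumed naming convention
    write(path, json.dumps(payload))
    return path
```

Adding a new endpoint then means passing a new name to `ingest`; only endpoints with unusual behaviour need an entry in `SPECIAL_HANDLERS`.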

My_osh_Archive

Overview: The myosh_Archive notebook is designed to maintain a complete and accurate archive of records from the MyOSH API. It ensures that no data is lost by identifying gaps in the current API response and backfilling missing records, then merging everything into a single, consolidated archive stored in the Azure Data Lake container. Why Do …
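The gap-and-backfill step can be sketched as follows. This is an illustration under stated assumptions, not the notebook's implementation: it assumes each record carries an integer `id`, that a "gap" is an id missing from the range seen in the current response, and that a hypothetical `fetch_by_id` callable returns a single record or `None`.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


def find_gaps(current: Iterable[Record], first_id: int = 1) -> List[int]:
    """Return ids between first_id and the highest id seen that are absent."""
    ids = {int(r["id"]) for r in current}
    if not ids:
        return []
    return [i for i in range(first_id, max(ids) + 1) if i not in ids]


def build_archive(
    archive: Iterable[Record],
    current: Iterable[Record],
    fetch_by_id: Callable[[int], Optional[Record]],  # hypothetical backfill call
) -> List[Record]:
    """Merge the existing archive, the current response, and backfilled records."""
    current = list(current)
    merged = {int(r["id"]): r for r in archive}
    merged.update({int(r["id"]): r for r in current})  # latest response wins
    for missing_id in find_gaps(current):
        if missing_id in merged:
            continue                                   # already archived earlier
        record = fetch_by_id(missing_id)
        if record is not None:                         # id may never have existed
            merged[missing_id] = record
    return [merged[i] for i in sorted(merged)]
```

Keying the merge on `id` keeps the archive free of duplicates, and records already archived are never fetched again.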

cdm__Archive_upsert

Overview: The cdm__Archive_upsert notebook is designed to keep accurate historical records of data. It implements a technique called Slowly Changing Dimension Type 2 (SCD2): instead of overwriting old information, it creates a timeline of changes so you can see what was true at any point in time. This method ensures that previous …
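To make the SCD2 idea concrete, here is a small pure-Python sketch. It is an illustration only; the notebook itself presumably works on Lakehouse tables rather than in-memory lists. The column names (`valid_from`, `valid_to`, `is_current`) and the function `scd2_upsert` are assumptions, and this version does not close rows whose key has disappeared from the staging snapshot.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    key: str                      # business key (hypothetical)
    attrs: Tuple                  # tracked attributes as sorted (name, value) pairs
    valid_from: date
    valid_to: Optional[date]      # None means the row is still open
    is_current: bool


def scd2_upsert(dim: List[DimRow], staging: Dict[str, dict], as_of: date) -> List[DimRow]:
    """Apply a staging snapshot to the dimension without overwriting history."""
    result: List[DimRow] = []
    still_current = set()
    for row in dim:
        if not row.is_current:
            result.append(row)                    # closed versions are never touched
            continue
        incoming = staging.get(row.key)
        if incoming is not None and tuple(sorted(incoming.items())) != row.attrs:
            # Attributes changed: close the old version as of the snapshot date.
            result.append(replace(row, valid_to=as_of, is_current=False))
        else:
            result.append(row)                    # unchanged, or absent from staging
            still_current.add(row.key)
    for key, incoming in staging.items():
        if key not in still_current:
            # New key, or new version of a changed key: open a current row.
            result.append(DimRow(key, tuple(sorted(incoming.items())), as_of, None, True))
    return result
```

Running the same snapshot twice adds no rows, because unchanged keys are left alone; that property is what makes a re-run of the pipeline safe in this sketch.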

Refresh SQL Endpoint Metadata

When working with Microsoft Fabric, one common challenge is keeping the SQL endpoint in sync with the Lakehouse endpoint after a pipeline run. Fabric provides two endpoints for every Lakehouse: Why does the SQL endpoint sometimes lag? The SQL endpoint doesn’t automatically refresh its metadata the moment new data lands in the Lakehouse. Instead, a …

Architecture Rules of the Road

A calm, consistent, and clever approach to data engineering.
🔹 1. Look in the Box First
Before inventing a workaround, check what already exists. If Microsoft or IFS built it, use it; it’s likely more robust, secure, and supported.
🔹 2. Easy Landings
Data should arrive safely and predictably. Keep import containers simple, standardised, and transformation-free. Bronze …

Level 5 Competency Self-Assessment Matrix

Competency Area | Element | Indicative Score (1–5) | Target (6 mo) | Evidence / Example Activities | Priority
Data Management | Devise & implement MDM processes (classification, security, quality, ethics, retrieval & retention) | 4 | 5 | Introduced soft/hard delete strategy; implemented RPV & last-seen frameworks; designed employee / project / customer domain MDM structures; strong data security awareness (container-level). | M
Derive …

Roadmap: to Level 5 Readiness

Month | Theme / Focus Area | Key Actions | Tangible Evidence / Deliverables | Success Indicators
Month 1 | Baseline review & planning | • Confirm current Level 5 competency scores and priorities • Align with manager/mentor on fellowship expectations • Map your influence across Data Engineering + Analytics disciplines | Updated self-assessment & evidence tracker | Clear alignment between Data Engineering objectives and …

Proposal: Expanding Data Engineering to Embrace Fabric and AI Engineering Executive Summary

This proposal outlines a phased approach to broaden the scope of our Data Engineering capability, incorporating Microsoft Fabric as a unified analytics platform and evolving our approach to include foundational elements of AI Engineering. The goal is to create an agile, scalable, and intelligent data platform that supports operational analytics, predictive insights, and innovation. 1. …