Issued by: Data & Analytics Team
Applies to: All data integrations, pipelines, and views entering or leaving the Data Warehouse
Status: Approved Draft
Date: [Insert Date]
Version: 1.0
1. Purpose
This standard defines the structure and principles for integrating data from source systems—including ERP (IFS), Dynamics 365, Power Apps, and SharePoint—into the enterprise Data Warehouse.
It ensures that data can be ingested with minimal transformation, remains joinable and stable, and achieves consistent quality, traceability, and trust.
Objectives:
- Enable low-friction ingestion across diverse systems.
- Define stable identifiers and consistent relationships.
- Prevent brittle joins on labels or non-unique values.
- Establish measurable data quality criteria (target score ≥ 7 / 10).
2. Scope
Applies to:
- All source-to-target integrations across the medallion architecture (Bronze → Silver → Gold → Platinum).
- All business domains (Project, Employee, Customer, Supplier, Finance, CRM).
- All environments (Development, Test, Production).
Excludes:
- Operational process redesign within source systems.
- Enforcement of ERP business logic at source.
3. Principles
- Clarity of identity – every record has a stable and auditable key.
- Consistency of structure – like entities use like field names and types.
- Minimal transformation – ingestion logic limited to conformance only.
- Integrity of relationships – links between tables are explicit and testable.
- History preserved – change is versioned, not overwritten.
- Data contract first – structure agreed before load.
- Measurable quality – scored against the Data Quality Rubric.
- Look in the box first – always inspect and respect the source system structures before designing transformations or new views:
  - Use the fields, relationships, and identifiers as provided by the source system; do not invent new keys or constructs where suitable ones already exist.
  - Validate how data behaves in practice; confirm that joins, lookups, and field values are reliable.
  - Document any anomalies or deviations before creating derived tables or mappings.
Principle: Data integrity starts with understanding and respecting the source.
We do not “make it up” — we observe, confirm, and then design.
4. Identity and Keys
4.1 Company Identity
Each record must carry, or resolve to, the Company_ID, using the ERP/IFS standard company code.
This serves as the enterprise partition key.
4.2 Entity Identifiers
| Entity | Business Code | Example |
|---|---|---|
| Project | Project_ID | D012345 |
| Employee | Employee_Number (EMP_NO) | E1234 |
| Customer | Customer_Code (Customer_ID) | CUST-001 |
| Supplier | Supplier_Code (Supplier_ID) | SUP-009 |
| Person | Email_Address | name@uk.bmt.org |
4.3 Source Key and Environment
Retain the native system identifier (e.g. GUID) as Source_Key, plus Source_System and Source_Env for lineage.
Key Stability: If a source key changes, treat it as a rotation event—record lineage but do not overwrite history.
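As an illustration only, the following Python sketch shows one way a load step might detect a rotation and emit an append-only lineage record rather than overwriting; the pandas approach, frame shapes, and output column names are assumptions, not requirements of this standard.

```python
import pandas as pd

def detect_key_rotations(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Emit lineage rows for business codes whose Source_Key has changed.

    Both frames are assumed to carry Business_Code and Source_Key columns
    (section 4.3). History is never overwritten: each rotation becomes a
    new, append-only lineage record.
    """
    merged = previous.merge(current, on="Business_Code", suffixes=("_old", "_new"))
    rotated = merged[merged["Source_Key_old"] != merged["Source_Key_new"]]
    return rotated.assign(
        Event="KEY_ROTATION",
        Detected_At=pd.Timestamp.now(tz="UTC"),
    )[["Business_Code", "Source_Key_old", "Source_Key_new", "Event", "Detected_At"]]
```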
4.4 Warehouse Object Sequence
OBJECT_SEQ is generated from a hash of (Company_ID, Business_Code) and is used for joins within the warehouse.
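A minimal sketch of one possible derivation, assuming SHA-256 over a delimited, case-normalised concatenation; the algorithm, delimiter, and casing rule are illustrative choices rather than mandates of this standard.

```python
import hashlib

def object_seq(company_id: str, business_code: str) -> str:
    """Derive a deterministic warehouse join key from (Company_ID, Business_Code).

    A fixed delimiter prevents collisions such as ("AB", "C") vs ("A", "BC"),
    and upper-casing keeps the hash stable across inconsistently cased sources.
    """
    raw = f"{company_id}|{business_code}".upper().encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

# The same inputs always yield the same OBJECT_SEQ, in any environment.
print(object_seq("BMT01", "D012345"))
```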
4.5 Lookup Triplet Requirement
Every lookup must include:
- Lookup_Key – system or numeric ID
- Lookup_Code – coded/abbreviated value
- Lookup_Description – readable label
Rule: Joins must use Lookup_Key or Lookup_Code, never the description.
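For illustration, a small pandas join that observes this rule; the fact table, its Status_Code column, and the lookup values are assumed examples.

```python
import pandas as pd

facts = pd.DataFrame({"Project_ID": ["D012345"], "Status_Code": ["ACT"]})
status_lookup = pd.DataFrame({
    "Lookup_Key": [1, 2],
    "Lookup_Code": ["ACT", "CLO"],
    "Lookup_Description": ["Active", "Closed"],
})

# Correct: join on the stable coded value.
joined = facts.merge(status_lookup, left_on="Status_Code", right_on="Lookup_Code")

# Never join on Lookup_Description: labels can be re-worded at source
# without warning, silently breaking the join.
print(joined[["Project_ID", "Lookup_Code", "Lookup_Description"]])
```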
4.6 Explicit Relationships
Declare all parent–child relationships in the data contract.
Logical foreign keys must be represented even if not enforced physically.
4.7 Orphan Handling
Where a record does not meet the relationship or join rules (for example, a child record with no valid parent key):
- The record will not be loaded into the conformed layer (Silver or above).
- The dataset is treated as filtered rather than reconciled — such records are simply excluded from the joined output.
- If a table within the dataflow is known to have suspect or incomplete joins, a raw, unjoined version of that table will also be submitted to Bronze for inspection and traceability.
- This ensures visibility of all source data without polluting the conformed dataset.
Note:
Lost records may later reappear once the parent relationship is corrected at source. These will be treated as new valid entries, not reinstatements of previous orphans.
The raw Bronze copy provides an audit trail for review but is not intended for end-user reporting.
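A minimal sketch of the split, assuming a pandas-based dataflow and illustrative key and table names; only the conformed rows proceed to Silver, while the unfiltered original is landed in Bronze unchanged.

```python
import pandas as pd

def split_orphans(children: pd.DataFrame, parents: pd.DataFrame,
                  fk: str, pk: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split child rows into (conformed, orphaned) by parent-key resolution.

    Only the conformed rows continue to Silver; the full unfiltered frame
    is still landed raw in Bronze, so orphans stay visible for inspection.
    """
    resolvable = children[fk].isin(parents[pk])
    return children[resolvable], children[~resolvable]

parents = pd.DataFrame({"Project_ID": ["D012345"]})
children = pd.DataFrame({"Task_ID": [1, 2], "Project_ID": ["D012345", "D099999"]})
conformed, orphans = split_orphans(children, parents, fk="Project_ID", pk="Project_ID")
# conformed -> Silver; the unfiltered `children` frame -> Bronze as-is.
```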
5. History and Change Management
5.1 Slowly Changing Dimensions (SCD2)
Use SCD2 for all descriptive data, tracked with Effective_From, Effective_To, and Is_Current.
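As a sketch only, the following pandas logic shows the close-and-insert pattern behind SCD2, assuming a Row_Hash column summarises the descriptive attributes; real pipelines would typically express this in the warehouse engine itself.

```python
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")

def apply_scd2(dim: pd.DataFrame, incoming: pd.DataFrame,
               key: str, load_ts: pd.Timestamp) -> pd.DataFrame:
    """Close superseded current rows and append new versions (SCD2).

    `dim` carries the Effective_From / Effective_To / Is_Current scaffolding
    from 5.1; `incoming` carries the natural key plus a Row_Hash over the
    descriptive attributes. Nothing is ever overwritten or deleted.
    """
    current = dim[dim["Is_Current"]]
    cmp = incoming.merge(current[[key, "Row_Hash"]], on=key,
                         how="left", suffixes=("", "_dim"))
    # New keys (no current row) and changed hashes both need a new version.
    changed_keys = cmp.loc[cmp["Row_Hash"] != cmp["Row_Hash_dim"], key]

    dim = dim.copy()
    expiring = dim[key].isin(changed_keys) & dim["Is_Current"]
    dim.loc[expiring, "Effective_To"] = load_ts
    dim.loc[expiring, "Is_Current"] = False

    new_rows = incoming[incoming[key].isin(changed_keys)].assign(
        Effective_From=load_ts, Effective_To=HIGH_DATE, Is_Current=True)
    return pd.concat([dim, new_rows], ignore_index=True)
```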
5.2 Status and Ownership
Volatile fields (e.g. project status, owner) are captured as separate history tables or change events.
5.3 Soft Deletes & Row Presence Verification
Apply a soft delete when a source row disappears.
Use Row Presence Verification (RPV) logic to detect missing or stale rows across loads.
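A sketch of one possible RPV pass, assuming pandas frames keyed on a business code; the Is_Deleted and Deleted_At column names are illustrative.

```python
import pandas as pd

def verify_row_presence(previous: pd.DataFrame, latest: pd.DataFrame,
                        key: str, load_ts: pd.Timestamp) -> pd.DataFrame:
    """Soft-delete rows seen in the prior load but absent from the latest.

    Rows are flagged, never physically removed, so downstream layers can
    distinguish 'deleted at source' from 'present but unchanged'.
    """
    out = previous.copy()
    out["Is_Deleted"] = out.get("Is_Deleted", False)
    missing = ~out[key].isin(latest[key])
    out.loc[missing, "Is_Deleted"] = True
    out.loc[missing, "Deleted_At"] = load_ts
    return out
```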
6. Ingestion and Conformance
6.1 Bronze (Raw) Layer
One-to-one with source. Add:
- Load_Timestamp
- Source_System
- Source_Env
- Checksum / Row_Hash
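A sketch of how the lineage columns might be stamped on at landing time, assuming a pandas ingestion step; the per-row hashing scheme shown is one option, not a mandated checksum.

```python
import hashlib
from datetime import datetime, timezone

import pandas as pd

def land_in_bronze(df: pd.DataFrame, source_system: str, source_env: str) -> pd.DataFrame:
    """Attach the Bronze lineage columns without reshaping the source data."""
    out = df.copy()
    out["Load_Timestamp"] = datetime.now(timezone.utc)
    out["Source_System"] = source_system
    out["Source_Env"] = source_env
    # Row_Hash: deterministic checksum over the original columns only.
    out["Row_Hash"] = [
        hashlib.sha256("|".join(map(str, row)).encode("utf-8")).hexdigest()
        for row in df.itertuples(index=False)
    ]
    return out
```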
6.2 Silver (Aligned) Layer
Adds:
- Business codes + surrogate keys
- Lookup triplets
- Explicit relationships
- SCD and RPV scaffolding
No business calculations permitted.
6.3 Gold (Delivery) Layer
Domain-specific marts with metrics and derived logic.
6.4 Platinum (Advanced Analytics)
Curated ML/AI datasets; high-trust aggregates only.
7. Data Quality Gates and Scoring
7.1 Rubric Overview
| Score | Descriptor | Meaning |
|---|---|---|
| 1 | Broken | Report/view does not work, fails to run, or gives unusable results. |
| 2 | Unreliable | Runs but contains serious errors, missing data, or misaligned logic. |
| 3 | Poor | Works but is inaccurate or very incomplete; unsuitable for business use. |
| 4 | Weak | Partial data is correct, but structure or reliability prevents confidence. |
| 5 | Adequate | Meets minimum needs; functional but clunky or limited in scope. |
| 6 | Fair | Generally correct, but with notable gaps, inefficiencies, or usability issues. |
| 7 | Good | Produces reliable and accurate results; some improvements desirable. |
| 8 | Strong | Well-structured, accurate, and trusted; minor refinements could enhance. |
| 9 | Excellent | High-quality, efficient, user-friendly, and widely trusted. |
| 10 | Outstanding | Best-in-class, model of design and usability, requires no improvement. |
7.2 Gate Criteria
| Check | Target | Action |
|---|---|---|
| Company_ID completeness | 100 % | Fail if missing |
| Lookup triplets present | 100 % | Warn if < 100 % |
| FK resolvable | ≥ 99.5 % | Quarantine if below |
| Key stability | No mutation | Audit rotation |
| SCD2 validity | ≥ 99 % | Warn if below |
| RPV freshness | Within SLA | Flag stale |
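By way of example, a sketch of how the first three gates might be evaluated programmatically; the thresholds mirror the table above, while the column names and the Project_ID foreign key are assumptions for illustration.

```python
import pandas as pd

def run_quality_gates(df: pd.DataFrame, parents: pd.DataFrame) -> dict:
    """Evaluate a Silver candidate against the gate criteria above.

    Returns a status per check: 'pass', 'warn', 'quarantine', or 'fail'.
    """
    results = {}

    # Company_ID completeness: 100 % required, hard fail otherwise.
    company_ok = df["Company_ID"].notna().mean()
    results["Company_ID completeness"] = "pass" if company_ok == 1.0 else "fail"

    # Lookup triplets present: warn when any element is missing.
    triplet_cols = ["Lookup_Key", "Lookup_Code", "Lookup_Description"]
    triplets_ok = df[triplet_cols].notna().all(axis=1).mean()
    results["Lookup triplets present"] = "pass" if triplets_ok == 1.0 else "warn"

    # FK resolvable: quarantine the load below the 99.5 % threshold.
    fk_ok = df["Project_ID"].isin(parents["Project_ID"]).mean()
    results["FK resolvable"] = "pass" if fk_ok >= 0.995 else "quarantine"

    return results
```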
8. Governance and Change Control
DataMart design and review must follow the Playbook: DataMart & Reporting Development Process to ensure consistency of standards, quality, and approval steps.
- Design Review – mandatory for new or changed integrations.
- Versioning – pipelines and contracts carry semantic versions (e.g. 1.2.0).
- Issue Workflow – Detect → Quarantine → Investigate → Resolve → Re-release.
- Documentation – All artefacts linked in KnowHow and myBMT with a version tag.
Exceptions and Waivers
- Exceptions must be formally documented and approved by Data Governance prior to implementation.
- Temporary waivers (e.g. missing lookup codes or unresolved joins) must include a remediation plan and target resolution date.
- Exceptions and Waivers identified during the DataMart view design stage may cause the project to be formally blocked until an acceptable resolution or mitigation is agreed with Data Governance.
- Waivers older than 90 days will be reviewed for closure or escalation.
9. Responsibilities
| Role | Responsibility |
|---|---|
| Data Engineer | Implement and maintain conformant integrations and pipelines. |
| Data Owner | Ensure stable identifiers and completeness. |
| App Developer | Expose required keys and lookup values. |
| Data Governance | Review quality scores; approve exceptions. |
| Head of Reporting | Approve Gold-layer release for publication. |
10. References
- DAMA-DMBOK v2 – Data Management Body of Knowledge
- ISO 8000 – Data Quality Standard
- Kimball Group – Dimensional Modelling Techniques
- Microsoft Dataverse Design Guidance
11. Review and Maintenance
- Reviewed annually or after major architecture change.
- Updates submitted via the Data & Analytics Governance Board.