Guide: Developing Attribute Lineage for New Data Sources

This process describes how to take a new data source from landing → Bronze → DataMart view, and fully define its attribute lineage in myBMT.


1. Prepare and review the new data source

  1. Confirm the source and landing
    • Understand what the source is (system, file, API, etc.).
    • Check that it is (or will be) dataflowed into the landing/import area.
  2. Check that the dataflow to landing/import exists
    • Ensure there is a dataflow that:
      • Brings the source into Import/Landing.
      • Has a clear table name and structure that can be catalogued in myBMT.
  3. Review the fields in the source
    Questions to answer:
    • Is it one row per entity (e.g. Project, Employee, Customer)?
    • Or is it an item/fact table (e.g. multiple rows per project, per employee, per customer)?
    • Can each row be mapped to a standard entity (project, employee, customer, busopp) using known identifiers?
  4. Standard identifiers to aim for We are standardising on:
    • IFS Company ID
      • IFS COMPANY ID (1072) for our main company.
    • Project
      • IFS Project ID
    • Customer
      • IFS Customer ID (Customer Company)
      • Customer contact key
        UPPER(LEFT(HASH(MD5, LOWER(CustomerEmail))), 8))
    • Employee
      • IFS EmpNo or
        UPPER(LEFT(HASH(MD5, LOWER(BMTEmail))), 8))
    Important: If these identifiers are not present, you may need to derive them in the dataflow before Bronze. Doing this early makes attribute lineage much easier and more consistent later.

2. Catalogue and land the source

  1. Catalogue the data source in myBMT
    • Create or update the source entry so that:
      • The landing/import table is known.
      • The dataflow from Import/Landing → Bronze is recorded.
    • This step is required so that the dataflow and table appear correctly in myBMT.
  2. Transport to Bronze
    • Ensure the dataflow moves the table from Import/Landing to Bronze.
    • Run the dataflow.
  3. Verify the Bronze table
    • Confirm that the table appears in Bronze with:
      • Expected table name.
      • Expected row counts (roughly).
      • Expected columns and data types.

3. Plan the DataMart view(s)

  1. Identify the correct DataMart domain
    Decide which DataMart the new source belongs to:
    • project, employee, customer, busopp, etc.
  2. Decide whether it appends to an existing view or becomes a new view
    • Ask:
      • Does this source logically extend an existing DataMart view?
        e.g. more attributes for the same entity → may be an append.
      • Or is this a distinct, new slice of data?
        e.g. new type of codes, new metric set → likely a new view.
    • In many cases this will be a new view, either:
      • A major view (e.g. busopp.meta_values_newsource), or
      • A subview feeding another (e.g. busopp.meta_codes).
  3. Respect the Codes / Dates / Values split
    • We separate attributes based on their formatting and usage:
      • *_meta_codes → codes, enums, short identifiers (+ descriptions via joins).
      • *_meta_dates → dates, always ISO YYYY-MM-DD.
      • *_meta_values → numeric and textual values.
    • Decide which category your new attributes belong to and plan view names accordingly.
  4. Create the view definitions in myBMT
    • In myBMT, create the view entries with their correct names
      (e.g. project_meta_codes_newsource, employee_meta_dates_xxx, etc.).
    • Ensure view naming is consistent with existing patterns.

4. Set up attribute lineage in myBMT

For each new view you’ve defined, you now configure attribute lineage.

  1. Open the Attribute Lineage section for the view
    • In myBMT, navigate to the new view.
    • Go to the Attribute Lineage section.

4.1. Define entity identifiers (OBJECT_SEQ)

  1. Create two rows for META_TYPE = OBJECT_SEQ

These define how we uniquely identify the entity the view belongs to.

  1. Row 1: COMPANY_ID
    • META_TYPE: OBJECT_SEQ
    • ATTRIBUTE: COMPANY_ID (or whatever is standardised)
    • Logic: This must resolve to the IFS COMPANY ID, typically 1072, or equivalent for the relevant company.
    • DESC (order field): 0
      • This ordering is used in arrays / key construction.
    • Source table/field: map directly to the company field from the Bronze table.
    • Source status: set to something positive like Direct match.
  2. Row 2: entity_ID
    • META_TYPE: OBJECT_SEQ
    • ATTRIBUTE: (e.g. PROJECT_ID, EMPLOYEE_ID, CUSTOMER_ID, etc., as per domain standard)
    • Logic: must resolve to the standard entity identifier for the domain.
    • DESC: 1 (second in the key order).
    • Source table/field: map to the appropriate identifier in the Bronze table.
    • Source status: again, use a positive / usable status (e.g. Direct match, Transformed match).

You can use JOINS and WHERE clauses in these lineage rows, but they are generally less robust. Prefer a direct field mapping where possible.

  1. Check other parameters on these rows
  • Ensure:
    • View name is correct.
    • Source table name is filled in.
    • Source field expression is accurate.
    • Source status is something that indicates the row should be included in the generated code.
      (Avoid negative or “ignore” style statuses – those are excluded from the code.)

4.2. Define item/fact keys (ITEM_KEY) if applicable

  1. Decide if the view is a fact or item table
  • If the view represents multiple rows per entity (e.g. transactions, periods, activities), it is likely an item/fact table.
  1. Create META_TYPE = ITEM_KEY rows
  • You should define at least:
  1. SEQ_KEY
    • META_TYPE: ITEM_KEY
    • ATTRIBUTE: SEQ_KEY
    • Logic: a composite key built from one or more fields, for example:
      COALESCE_WS('|', key1, key2, key3)
      using the pipe | as separator.
    • This ensures each row can be uniquely identified.
  2. Primary key(s)
    • For example:
      • PERIOD_ID
      • ACTIVITY_ID
      • TRANSACTION_ID
    • Each key gets its own ITEM_KEY row.
    • Use the DESC field to define the order when there are multiple keys
      (e.g. 0 for the main key, 1 for secondary, etc.).
  3. Complete parameters for ITEM_KEY rows
  • As with OBJECT_SEQ, ensure:
    • View name is correct.
    • Source table and field expressions are filled.
    • DESC ordering is set correctly.
    • Source status is a usable type (e.g. Direct match).

4.3. Map all other attributes

Now define the remaining attributes (codes, dates, values) with appropriate META_TYPE and ATTRIBUTE names.

  1. Choose the correct META_TYPE
    Typical types:
  • STATUS
  • COST
  • LIMIT
  • JOURNAL
  • VALUES
  1. Ensure ATTRIBUTE names are unique within a view
  • Each ATTRIBUTE must be unique per view.
  • If you have many fields with the same underlying meaning, such as multiple ACTUAL_COST sources:
    • Avoid ugly names like ACTUAL_COSTe, ACTUAL_COSTt, etc.
    • Instead, consider separating into two views, e.g.:
      • *_meta_values_estimates
      • *_meta_values_totals
    • and then use the convention META_TYPE = COST and ATTRIBUTE = ACTUAL
  1. Standard for Code values
  • Code fields should:
    • Be UPPER CASE, no spaces.
    • Be kept relatively short (target < 24 characters, maximum ~32).
    • Not be used for free text like email, long descriptions, etc.
  • If you need a stable code from free text, use a hash, e.g.:
    UPPER(LEFT(HASH(MD5, LOWER(myfield))), 8))
    and store the free text in the DESC field.
  1. Always add descriptions for codes via JOINs
  • For each code attribute:
    • Add a LEFT JOIN in the lineage to bring in a description field.
    • The description is used to help report designers and should be mapped consistently.
  • Check if this code has been used before:
    • Reuse naming patterns and logic where possible.
    • Keep behaviour consistent across domains and views.
  1. Date attributes
  • All dates must be capable of formatted as YYYY-MM-DD (e.g. 2025-03-31).
  • Ensure your expressions cast/convert dates into this format.
  1. Value attributes
  • Numeric and textual values will be:
    • Passed through as float (for numeric) automatically.
  • Just ensure the expressions in the lineage are correct and consistent.
  • You may wish to create the array statement as a DESC
    {“unit”:”‘ + CURRENCY_CODE + ‘”}

5. Generate and review the SQL

  1. Use the “Create CTE (uks)” button in myBMT
  • When your attribute lineage is defined:
    • Click [Create CTE (uks)].
    • This generates a prettified view of the SQL code for the CTE-based view.
  1. Copy the generated code to VS Code
  • Use the copy button in myBMT.
  • Paste into your VS Code workspace for Local review.
  • Execute/Run the view against Bronze

6. Test and deploy

  1. Test the SQL locally / in DEV
  • Execute the generated view definition in the development environment.
  • Check:
    • Row counts.
    • Key uniqueness (OBJECT_SEQ and ITEM_KEY combinations).
    • Data types (dates, codes, values).
    • JOINs for descriptions (codes resolve correctly).
  1. Fix any issues and regenerate if needed
  • If changes are required:
    • Update the attribute lineage in myBMT.
    • Regenerate the SQL via [Create CTE (uks)].
    • Retest.
  1. Add Purpose and Deployment Change Comment in myBMT (before generating SQL)
  • Before deployment, using Create CTE (uks), the engineer must enter:
    • Purpose
    • Deployment / Change Comment
    • directly into the appropriate fields in myBMT.
  • The generator will then place them automatically in the correct header sections of the SQL output — no manual copying or VS Code editing is required.
  1. Deploy via VS Code to the repository
  • Once validated:
    • Commit the SQL script to the repository (using normal branching/PR process).
    • Follow standard deployment steps to move from Dev → Test → Prod.

Leave a Comment