
Semantics Layer

The Semantics layer is the second hierarchical level in the Denzing Metadata framework. It defines the meaning and context associated with the raw data, ensuring that LLMs can interpret metrics and KPIs correctly based on their intended purpose. This layer builds upon the foundational Schema, and information defined at the Schema level can be overridden here.

Purpose

  • To provide a deep contextual understanding of KPIs and metrics, going beyond their physical structure.
  • To ensure the LLM interprets KPIs correctly, aligning with their business meaning and definition.
  • To improve how accurately the LLM uses KPIs and metrics, and to enable more context-aware responses.

Allowed Top-Level Keys

When defining a semantics file, only the following lowercase, top-level keys are permitted:

  • folder
  • type
  • source
  • metrics
  • attributes

Components of the Semantics Layer

folder (Required)

This key assigns the semantics definition to a logical group or hierarchical structure, which makes the data easier to locate and adds context.

folder: hamro_system

type (Required)

This key specifies the fundamental type of the semantic definition, categorizing it as either a fact (representing measurable quantities or events) or a dimension (representing descriptive attributes that characterize facts).

type: fact

source (Required)

This key indicates the origin of the data elements (columns, metrics, or attributes) being utilized within this semantic definition. It ensures traceability and clarifies where the underlying data comes from.

  • Any columns, metrics, or attributes from other files that are referenced in the current definition must be explicitly listed under the source section.
  • The import path must start with the layer (schema or semantics), followed by a dot (.), and then the file name (without the .yml extension).
Example: Source Imports
source:
  schema.sales: # Importing columns from 'sales.yml' in the schema layer
    columns:
      - sales_amount
      - sales_cost
      - markdown_amount
      - tax_amount
  schema.product: # Importing columns from 'product.yml' in the schema layer
    columns:
      - item_id
  semantics.product: # Importing from 'product.yml' in the semantics layer
    metrics:
      - department

metrics (Optional)

This section defines all compound metrics, which are typically derived calculations or aggregations. Each metric is uniquely identified by its identity name, that is, the key under which it is defined.

  • name (Required): The human-readable and searchable display name for the metric.
  • synonym (Optional): A list of alternative searchable names for the metric.
  • calculation (Required): A string defining the formula or computation for the metric.
    • The calculation value must always be a string.
    • All metrics, attributes, or columns referenced within the calculation must be enclosed in square brackets [].
    • Always reference the identity name of the item inside the brackets, not its display name.
    • Mixing schema columns with semantic metrics/attributes within a single calculation is strictly prohibited and will result in a validation error.
  • data_type (Optional): A string defining the nature of the data. Supported types include numeric, text, date, datetime, geo, currency, ratio, percentage, boolean, and url.
  • desc: A brief, clear explanation of the metric for users and the LLM.
Example: Metric Definition
metrics:
  sales_amount: # Defines a metric column
    name: total amount of product sold
    synonym: ["sales", "samt"]
    data_type: currency
    desc: sales amount
    calculation: "[sales_amount] * [price]"
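
The rule against mixing layers is easiest to see with a small set of metrics. The following is a hypothetical sketch, not taken from a real Denzing project: the metric names net_sales, gross_profit, and profit_ratio are invented for illustration, and the columns are the ones imported from schema.sales in the source example above. It also assumes that the usual arithmetic operators are accepted in calculation strings and that metrics defined in the same file can be referenced by identity name without a separate import.

```yaml
metrics:
  net_sales: # Hypothetical metric; references schema columns only
    name: net sales amount
    synonym: ["net sales"]
    data_type: currency
    desc: sales amount after markdowns
    calculation: "[sales_amount] - [markdown_amount]"
  gross_profit: # Hypothetical metric; references schema columns only
    name: gross profit
    data_type: currency
    desc: sales amount minus sales cost
    calculation: "[sales_amount] - [sales_cost]"
  profit_ratio: # Hypothetical metric; references semantic metrics only
    name: profit ratio
    synonym: ["margin"]
    data_type: ratio
    desc: gross profit as a share of net sales
    calculation: "[gross_profit] / [net_sales]"
```

By contrast, a calculation such as "[gross_profit] / [sales_amount]" would combine a semantic metric with a schema column and, per the rule above, would fail validation.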

attributes (Optional)

This section is used to define new attributes or override existing ones from the schema. The rules for defining attributes are largely similar to those for metrics, including the use of name, synonym, data_type, and calculation (if the attribute is derived).
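
As a sketch of the override case, the snippet below gives the item_id column imported from schema.product a display name, synonyms, and a description. The values are hypothetical, and the snippet assumes that an attribute overrides a schema column when it is defined under the same identity name.

```yaml
attributes:
  item_id: # Assumed to override the schema column of the same name
    name: product identifier
    synonym: ["item", "product id"]
    data_type: text
    desc: unique identifier of a product
```

No calculation is given here because the attribute is not derived.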


Best Practices for Semantics Design

Effective semantics design is crucial for building a robust and understandable metadata layer that empowers LLMs and business intelligence tools.

  • Provide Clear Contextual Meaning:
    • The primary goal of semantics is to define the meaning and context of your data.
    • Use the desc key to provide a brief, clear explanation for users and the LLM.
  • Maintain Naming Consistency and Discoverability:
    • Use descriptive and unambiguous name values for all metrics and attributes.
    • Leverage the synonym key to provide alternative searchable terms. This significantly improves discoverability for natural language queries.
  • Strict Adherence to Calculation Rules:
    • Ensure all calculation definitions are provided as a single string.
    • Always enclose all referenced items within square brackets [] (e.g., "[sales_amount] * [price]").
    • Reference the identity name (the key of the definition) inside the brackets, not its display name.
    • Strictly avoid mixing schema columns with semantic metrics or attributes in a single calculation.
  • Accurate Data Sourcing (source Key):
    • Explicitly define all imported columns, metrics, or attributes under the source key.
    • Follow the correct import path format: schema.<filename> for schema columns and semantics.<filename> for semantic metrics/attributes.
  • Logical Categorization (folder and type):
    • Utilize the folder key to logically group related semantic definitions.
    • Correctly assign the type as either fact or dimension based on the nature of the data.
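
Putting these practices together, a complete semantics file could look like the following sketch. The tax_rate metric and its values are hypothetical. The folder, type, and import come from the examples earlier in this section, and the division operator is assumed to be supported in calculation strings.

```yaml
folder: hamro_system
type: fact
source:
  schema.sales: # Every column used in a calculation is imported here
    columns:
      - sales_amount
      - tax_amount
metrics:
  tax_rate: # Hypothetical metric; identity names appear inside the brackets
    name: effective tax rate
    synonym: ["tax ratio"]
    data_type: ratio
    desc: tax amount as a share of sales amount
    calculation: "[tax_amount] / [sales_amount]"
```

The calculation references schema columns only, so it stays within the rule against mixing schema columns with semantic metrics or attributes.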