Semantics Layer
The Semantics layer is the second hierarchical level in the Denzing Metadata framework. It defines the meaning and context associated with the raw data, ensuring that LLMs can interpret metrics and KPIs correctly based on their intended purpose. This layer builds upon the foundational Schema, and information defined at the Schema level can be overridden here.
Purpose
- To provide a deep contextual understanding of KPIs and metrics, going beyond their physical structure.
- To ensure the LLM interprets KPIs correctly, in line with their business meaning and definition.
- To improve how accurately the LLM uses metrics and KPIs, and to enable more context-aware responses.
Allowed Top-Level Keys
When defining a semantics file, only the following lowercase, top-level keys are permitted:
- folder
- type
- source
- metrics
- attributes
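As a rough orientation, the skeleton below shows how these top-level keys might sit together in one file. It is a minimal sketch, not a canonical template: the metric gross_margin and its display name are hypothetical, and the folder value and imported columns are borrowed from the examples in this section.

```yaml
# Illustrative skeleton of a semantics file; all values are examples only.
folder: hamro_system        # required: logical grouping
type: fact                  # required: fact or dimension
source:                     # required: everything imported from other files
  schema.sales:
    columns:
      - sales_amount
      - sales_cost
metrics:                    # optional: compound metrics
  gross_margin:             # hypothetical identity name
    name: gross margin
    calculation: "[sales_amount] - [sales_cost]"
# attributes:               # optional: new or overridden attributes
```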
Components of the Semantics Layer
folder (Required)
This key assigns the semantics definition to a logical group or hierarchy to which the data belongs, which makes related definitions easier to find and adds context.
folder: hamro_system
type (Required)
This key specifies the fundamental type of the semantic definition, categorizing it as either a fact (representing measurable quantities or events) or a dimension (representing descriptive attributes that characterize facts).
type: fact
source (Required)
This key indicates the origin of the data elements (columns, metrics, or attributes) being utilized within this semantic definition. It ensures traceability and clarifies where the underlying data comes from.
- Any columns, metrics, or attributes from other files that are referenced in the current definition must be explicitly listed under the source section.
- The import path must start with the layer (schema or semantics), followed by a dot (.), and then the file name (without the .yml extension).
source:
  schema.sales: # Importing columns from 'sales.yml' in the schema layer
    columns:
      - sales_amount
      - sales_cost
      - markdown_amount
      - tax_amount
  schema.product: # Importing columns from 'product.yml' in the schema layer
    columns:
      - item_id
  semantics.product: # Importing from 'product.yml' in the semantics layer
    metrics:
      - department
metrics (Optional)
This section defines all compound metrics, which are typically derived calculations or aggregations. Each metric is uniquely identified by its identity name (the key of its definition).
- name (Required): The human-readable and searchable display name for the metric.
- synonym (Optional): A list of alternative searchable names for the metric.
- calculation (Required): A string defining the formula or computation for the metric.
  - The calculation value must always be a string.
  - All metrics, attributes, or columns referenced within the calculation must be enclosed in square brackets [].
  - Always reference the identity name of the item inside the brackets, not its display name.
  - Mixing schema columns with semantic metrics/attributes within a single calculation is strictly prohibited and will result in a validation error.
- data_type (Optional): A string defining the nature of data, such as currency, quantity, ratio, or percentage. Supported types include numeric, text, date, datetime, geo, currency, ratio, percentage, boolean, and url.
metrics:
  sales_amount: # Defines a metric column
    name: total amount of product sold
    synonym: ["sales", "samt"]
    data_type: currency
    desc: sales amount
    calculation: "[sales_amount] * [price]"
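The calculation rules above can be illustrated with a few contrasting definitions. This is a sketch under assumptions: the metrics gross_margin, net_sales, and margin_rate are hypothetical, the columns are assumed to be imported under source from schema.sales, and it assumes that metrics defined in the same file can be referenced as semantic metrics. Note that the quotes matter in YAML: an unquoted value beginning with [ would be read as a flow sequence rather than a string.

```yaml
metrics:
  # References only schema columns (assumed imported under source).
  gross_margin:
    name: gross margin
    data_type: currency
    calculation: "[sales_amount] - [sales_cost]"

  net_sales:
    name: net sales after markdowns
    data_type: currency
    calculation: "[sales_amount] - [markdown_amount]"

  # References only semantic metrics, by identity name.
  margin_rate:
    name: gross margin rate
    data_type: ratio
    calculation: "[gross_margin] / [net_sales]"

  # Would fail validation: mixes a semantic metric with a schema column.
  # mixed_margin:
  #   name: mixed margin
  #   calculation: "[gross_margin] / [sales_amount]"

  # Would not resolve: uses display names instead of identity names.
  # named_margin:
  #   name: named margin
  #   calculation: "[gross margin] / [net sales after markdowns]"
```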
attributes (Optional)
This section is used to define new attributes or override existing ones from the schema. The rules for defining attributes are largely similar to those for metrics, including the use of name, synonym, data_type, and calculation (if the attribute is derived).
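As a hedged illustration, an attribute that overrides a schema column might look like the fragment below. It assumes that an override is expressed by reusing the schema column's identity name (here item_id, imported from schema.product in the source example); the display name, synonyms, and description are hypothetical. No calculation is given because this attribute is not derived.

```yaml
attributes:
  item_id:                          # assumed: same identity name as the schema column it overrides
    name: product item identifier   # hypothetical display name
    synonym: ["item", "sku"]        # hypothetical search terms
    data_type: text
    desc: unique identifier of a product item
```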
Best Practices for Semantics Design
Effective semantics design is crucial for building a robust and understandable metadata layer that empowers LLMs and business intelligence tools.
- Provide Clear Contextual Meaning:
  - The primary goal of semantics is to define the meaning and context of your data.
  - Use the desc key to provide a brief, clear explanation for users and the LLM.
- Maintain Naming Consistency and Discoverability:
  - Use descriptive and unambiguous name values for all metrics and attributes.
  - Leverage the synonym key to provide alternative searchable terms. This significantly improves discoverability for natural language queries.
- Strict Adherence to Calculation Rules:
  - Ensure all calculation definitions are provided as a single string.
  - Always enclose all referenced items within square brackets [] (e.g., "[sales_amount] * [price]").
  - Reference the identity name (the key of the definition) inside the brackets, not its display name.
  - Strictly avoid mixing schema columns with semantic metrics or attributes in a single calculation.
- Accurate Data Sourcing (source Key):
  - Explicitly define all imported columns, metrics, or attributes under the source key.
  - Follow the correct import path format: schema.<filename> for schema columns and semantics.<filename> for semantic metrics/attributes.
- Logical Categorization (folder and type):
  - Utilize the folder key to logically group related semantic definitions.
  - Correctly assign the type as either fact or dimension based on the nature of the data.