Denzing Datamart Integration Guide
This guide is intended for users and teams building datamarts to connect with the Denzing platform. It outlines best practices for schema design, transformation logic, and preparing data for semantic and AI-driven use.
1. Table Design Principles
To ensure optimal performance and usability:
- Avoid overly wide tables (with too many columns), as they can be hard to maintain and interpret.
- Avoid overly narrow tables (with too few columns), as they lack context and increase unnecessary joins.
- Aim for balanced tables with rich, useful data for business consumption.
Recommended:
- Group related data points logically.
- Use a star schema if the number of columns grows substantially (a minimal sketch follows this list):
  - Fact tables store measurable events (e.g., sales, transactions).
  - Dimension tables store descriptive attributes (e.g., customer, product, time).
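To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_customer, dim_product) are illustrative assumptions, not Denzing conventions; adapt them to your own datamart.

```python
import sqlite3

# A minimal star schema: one fact table for measurable events,
# two dimension tables for descriptive attributes.
# All table and column names are illustrative, not Denzing-specific.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id   TEXT PRIMARY KEY,
    customer_name TEXT,
    customer_segment TEXT
);
CREATE TABLE dim_product (
    product_id       TEXT PRIMARY KEY,
    product_name     TEXT,
    product_category TEXT
);
CREATE TABLE fact_sales (
    order_id    TEXT PRIMARY KEY,
    customer_id TEXT REFERENCES dim_customer(customer_id),
    product_id  TEXT REFERENCES dim_product(product_id),
    order_date  TEXT,       -- YYYY-MM-DD HH:MM:SS, stored in UTC
    sale_amount_usd REAL    -- uniform units: USD
);
""")
conn.close()
```

The fact table stays narrow and append-heavy, while descriptive attributes live in the dimensions, which keeps individual tables balanced as the overall column count grows.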
2. Data Transformation for KPI Support
Leverage the transformation layer to:
- Extract and clean raw data.
- Compute intermediate columns (e.g., derived metrics such as return rate or customer satisfaction scores) as business needs require.
- Pre-calculate columns where possible to support KPI generation in views (see the sketch after this list).
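As one possible shape for this layer, the following pandas sketch parses raw timestamps and pre-computes delivery_days, return_flag, and a per-customer return rate. The input columns and sample rows are assumptions for illustration, not a prescribed Denzing interface.

```python
import pandas as pd

# Hypothetical raw extract; real sources would be a database or files.
raw = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c1", "c2"],
    "order_date": ["2024-01-02 10:00:00", "2024-01-05 09:30:00", "2024-01-07 16:45:00"],
    "delivery_date": ["2024-01-06 10:00:00", "2024-01-08 09:30:00", "2024-01-12 16:45:00"],
    "returned": [0, 1, 0],
})

orders = raw.copy()
for col in ("order_date", "delivery_date"):
    orders[col] = pd.to_datetime(orders[col], utc=True)  # standard format, UTC

# Pre-calculate columns that KPI views will aggregate later.
orders["delivery_days"] = (orders["delivery_date"] - orders["order_date"]).dt.days
orders["return_flag"] = orders["returned"].map({0: "N", 1: "Y"})  # boolean normalization

# Derived, customer-level metric: return rate.
return_rate = (orders["return_flag"].eq("Y")
               .groupby(orders["customer_id"]).mean()
               .rename("return_rate"))
print(orders[["order_id", "delivery_days", "return_flag"]])
print(return_rate)
```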
Recommendation: only expose columns that:
- Add value to business decision-making.
- Are used in KPI formulas or aggregations.
- Enhance semantic understanding of the data.
Name exposed columns clearly and concisely, in a human-readable format.
3. Data Cleansing & Standardization
Ensure that the data being fed into Denzing is clean, consistent, and usable:
| Rule | Description |
|---|---|
| Uniform Units | Use consistent units (e.g., USD for currency, kg for weight) |
| Standard Date Format | YYYY-MM-DD HH:MM:SS, stored in UTC |
| Boolean Normalization | Use Y/N instead of 1/0 or TRUE/FALSE |
| Remove Redundancy | Eliminate duplicate rows and conflicting values |
| Case Standardization | Enforce case rules (e.g., lowercase emails, title case names) |
| Consistent Categories | Validate categorical fields against allowed values |
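The pandas sketch below applies several of these rules in one pass. The input columns (email, name, signup_date, is_active, status) and the allowed status values are assumptions for illustration, not Denzing requirements.

```python
import pandas as pd

ALLOWED_STATUSES = {"active", "inactive", "pending"}  # assumed category list

df = pd.DataFrame({
    "email": ["A@Example.COM", "b@example.com", "b@example.com"],
    "name": ["ada lovelace", "grace hopper", "grace hopper"],
    "signup_date": ["2024-03-01 12:00:00", "2024-03-02 08:30:00", "2024-03-02 08:30:00"],
    "is_active": [1, 0, 0],
    "status": ["Active", "inactive", "inactive"],
})

df["email"] = df["email"].str.lower()                             # case standardization
df["name"] = df["name"].str.title()
df["signup_date"] = pd.to_datetime(df["signup_date"], utc=True)   # standard date format, UTC
df["is_active"] = df["is_active"].map({1: "Y", 0: "N"})           # boolean normalization
df["status"] = df["status"].str.lower()
assert df["status"].isin(ALLOWED_STATUSES).all()                  # consistent categories
df = df.drop_duplicates()                                         # remove redundancy
```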
4. Semantic Alignment & Metadata Preparation
Design your schema and transformations to support downstream semantic processing:
- Use unambiguous column names that clearly describe the data they hold.
- Align naming conventions across related datasets.
- Ensure business-friendly column descriptions are prepared for metadata ingestion.
- Because the end users will be non-technical personnel, ensure all naming is intuitive and easy to understand.
Example: E-commerce Order Agent
| Column Name | Description | Value Type |
|---|---|---|
| order_id | Unique ID of the order | Text |
| customer_id | Customer placing the order | Text |
| delivery_days | Days taken from order to delivery | Numeric |
| return_flag | Whether the order was returned | Boolean (Y/N) |
| customer_rating | Customer-provided order rating | Numeric (1-5) |
| order_accuracy_score | Derived metric for order accuracy | Float (0.0–1.0) |
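One way to stage such descriptions is a simple, machine-readable structure like the Python sketch below. The shape shown is an illustrative assumption; this guide does not define Denzing's actual metadata ingestion format, so consult the platform documentation before ingesting.

```python
import json

# Business-friendly column descriptions staged for metadata ingestion.
# The structure and output filename are illustrative assumptions.
column_metadata = [
    {"name": "order_id",             "description": "Unique ID of the order",            "type": "Text"},
    {"name": "customer_id",          "description": "Customer placing the order",        "type": "Text"},
    {"name": "delivery_days",        "description": "Days taken from order to delivery", "type": "Numeric"},
    {"name": "return_flag",          "description": "Whether the order was returned",    "type": "Boolean (Y/N)"},
    {"name": "customer_rating",      "description": "Customer-provided order rating",    "type": "Numeric (1-5)"},
    {"name": "order_accuracy_score", "description": "Derived metric for order accuracy", "type": "Float (0.0-1.0)"},
]

with open("order_agent_metadata.json", "w") as f:
    json.dump(column_metadata, f, indent=2)
```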
5. Final Checklist Before Connecting to Denzing
- Table schema follows balanced design
- Star schema implemented, if appropriate
- Columns support required KPIs and metrics
- Data is cleaned and standardized
- Columns are clearly named and described
- Sample data is tested in Denzing for output consistency (a pre-flight check sketch follows this checklist)
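A lightweight script can automate parts of this checklist before connecting. The Python sketch below is one possible set of checks under assumed naming conventions (lowercase column names, Y/N values in *_flag columns); it is not a Denzing-provided validator.

```python
import pandas as pd

def preflight_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame passes.

    The checks mirror the checklist above; the naming conventions they
    enforce are assumptions, not Denzing requirements.
    """
    problems = []
    if df.columns.duplicated().any():
        problems.append("duplicate column names")
    if df.duplicated().any():
        problems.append("duplicate rows")
    for col in df.columns:
        if col != col.strip().lower():
            problems.append(f"non-standard column name: {col!r}")
    # Boolean normalization: assumed convention that *_flag columns hold Y/N.
    for col in (c for c in df.columns if c.endswith("_flag")):
        if not df[col].isin(["Y", "N"]).all():
            problems.append(f"{col} contains values other than Y/N")
    return problems
```

For example, running preflight_checks(orders) on the transformed frame from the Section 2 sketch returns an empty list, since all its checks pass.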
Example
To demonstrate the guide above, we will use a publicly available NBA dataset and walk through each of these steps in the upcoming pages.