Building Schema: The Map to Your Logical Structure of Data 🗺️
Think of your data as a vast library of books. The schema is the library's cataloging system—the table of contents and index. It doesn't contain the data itself, but it provides a clear definition of every table (book) and column (chapter) in your data source.
This "map" is generated from your database's DDL (Data Definition Language). While the auto-generated schema is a good start, refining it is the most critical step to ensure the agent can understand and query your data accurately. A well-crafted schema, with clear, unambiguous column names, is the foundation for everything else.
Key Refinements for an Effective Schema​
Here are the key points to focus on when refining your schema, with explanations on why they matter.
1. Defining Keys: The Foundation of Relationships​
Your schema needs to know which columns are primary keys and which are foreign keys. These flags are the bedrock of table relationships.
primary_key: true: Marks the column that uniquely identifies each row in a table (e.g.,GAME_IDin theGAMEStable).foreign_key: true: Marks a column that links to a primary key in another table (e.g.,HOME_TEAM_IDlinks fromGAMESto theTEAMStable).
Why It Matters:
Without these flags, the agent sees your tables as disconnected islands of data. It cannot form relationships or perform joins. A simple question like, "Which games did the Lakers play last season?" would be impossible to answer because the agent wouldn't know how to connect the GAMES table to the TEAMS table to find the team named "Lakers." Correctly setting these keys is what makes multi-table queries possible.
Example:
# In games.yml
GAME_ID:
  name: Game ID
  type: VARCHAR
  column: GAME_ID
  desc: Unique identifier for the game
  primary_key: true # This tells the agent it's the unique ID for a game.
HOME_TEAM_ID:
  name: Home Team ID
  type: VARCHAR
  column: HOME_TEAM_ID
  desc: Identifier for the home team
  foreign_key: true # This tells the agent it can be used to link to another table.
2. Optimizing Queries with fetch: true​
The fetch: true flag is a powerful optimization for columns with categorical or frequently repeated values. It tells the agent to cache the unique values in that column.
Why It Matters: This feature enables intelligent fuzzy matching, which is essential for a smooth conversational experience. Users rarely type things perfectly. A user might ask for "Lakers games," but the data might store the team name as "Los Angeles Lakers."
By setting fetch: true on the HOME_TEAM and VISITOR_TEAM columns, the agent caches all team names. It can then intelligently map "Lakers" to "Los Angeles Lakers," successfully finding the data. Without this, queries that aren't exact matches would likely fail. It makes the agent more forgiving and a lot smarter.
Example:
# In games.yml
GAME_TYPE:
  name: Game Type
  type: VARCHAR
  column: GAME_TYPE
  desc: Type of the game (e.g., regular, playoff)
  fetch: true # Cache values like "regular" and "playoff" for better matching.
HOME_TEAM:
  name: Home Team
  type: VARCHAR
  column: HOME_TEAM
  desc: Name of the home team
  fetch: true # Cache all team names to handle user variations like "Lakers" vs "Los Angeles Lakers".
3. Enabling Complex Questions with joins​
Once your keys are defined, you must specify the joins in the table_info section. This is where you explicitly tell the agent how to connect the tables using those keys.
Why It Matters: Defining joins is the final step that unlocks the ability to ask complex, cross-table questions.
- 
Without Joins, you can only ask questions about a single table:
- "Show me all the game IDs." (Works, 
GAMEStable only) - "List all team names." (Works, 
TEAMStable only) - "Show me games played by the Lakers." (Fails, requires both tables)
 
 - "Show me all the game IDs." (Works, 
 - 
With Joins, you can ask rich, analytical questions:
- "What was the score of the last game between the Lakers and the Celtics?"
 - "Show me all the playoff games from the 2023 season where the home team scored over 120 points."
 
 
The join instructions are the agent's guide to combining information from different sources to answer a single, complex question.
Example:
# In games.yml
table_info:
  - table: GAMES
    joins:
      # This line tells the agent how to connect GAMES to TEAMS for the home team
      - join: TEAMS
        on: TEAMS.ID = GAMES.HOME_TEAM_ID
      # This line tells the agent how to connect GAMES to TEAMS again for the visitor team
      - join: TEAMS
        on: TEAMS.ID = GAMES.VISITOR_TEAM_ID
The Complete Picture: Schema Template and Example​
When you apply all these refinements, you get a robust schema that serves as a solid foundation for your semantics.
📄 Schema Template Reference​
<SCHEMA_NAME>:
  subject_area: <Domain>
  table_info:
    - table: <TABLE_NAME>
      joins:
        - join: <OTHER_TABLE>
          on: <JOIN_CONDITION>
  columns:
    <COLUMN_NAME>:
      name: <Readable Name>
      type: <Data Type>
      column: <DB Column>
      desc: <Description>
      primary_key: <true|false>
      foreign_key: <true|false>
      fetch: <true|false>
Full Example: games.yml​
GAMES:
  subject_area: Sports
  table_info:
    - table: GAMES
      joins:
        - join: TEAMS
          on: TEAMS.ID = GAMES.HOME_TEAM_ID
        - join: TEAMS
          on: TEAMS.ID = GAMES.VISITOR_TEAM_ID
  columns:
    GAME_ID:
      name: Game ID
      type: VARCHAR
      column: GAME_ID
      desc: Unique identifier for the game
      primary_key: true
    GAME_DATE:
      name: Game Date
      type: DATE
      column: GAME_DATE
      desc: Date when the game was played
    SEASON:
      name: Season
      type: VARCHAR
      column: SEASON
      desc: Season during which the game took place
    GAME_TYPE:
      name: Game Type
      type: VARCHAR
      column: GAME_TYPE
      desc: Type of the game (e.g., regular, playoff)
      fetch: true
    STATUS:
      name: Status
      type: VARCHAR
      column: STATUS
      desc: Current status of the game
      fetch: true
    HOME_TEAM_ID:
      name: Home Team ID
      type: VARCHAR
      column: HOME_TEAM_ID
      desc: Identifier for the home team
      foreign_key: true
    HOME_TEAM:
      name: Home Team
      type: VARCHAR
      column: HOME_TEAM
      desc: Name of the home team
      fetch: true
    HOME_SCORE:
      name: Home Score
      type: NUMBER
      column: HOME_SCORE
      desc: Score of the home team
    VISITOR_TEAM_ID:
      name: Visitor Team ID
      type: VARCHAR
      column: VISITOR_TEAM_ID
      desc: Identifier for the visitor team
      foreign_key: true
    VISITOR_TEAM:
      name: Visitor Team
      type: VARCHAR
      column: VISITOR_TEAM
      desc: Name of the visitor team
      fetch: true
    VISITOR_POINT:
      name: Visitor Points
      type: NUMBER
      column: VISITOR_POINT
      desc: Points scored by the visitor team