SQL Eval Case Manager
The SQL Eval Case Manager agent helps you build and maintain high-quality SQL evaluation sets for your data products. It can create new question/SQL pairs, generate evaluation questions in bulk, and audit existing cases for quality and correctness.
These evaluation sets are what you use to measure and improve a text-to-SQL agent’s accuracy. See SQL Evaluations for the broader workflow.
How it works
The agent follows this workflow:
- Reviews existing evaluation cases to avoid duplicates and understand current terminology.
- Explores the data product’s schema and sample values to ground questions in real business context.
- Generates diverse questions across complexity categories (filters, joins, column transformations, business jargon, complex operators).
- Generates and validates the SQL for each question by running it through the configured SQL agent against the data product.
- Saves the evaluation cases and presents them for review.
- For auditing tasks, reviews existing cases for SQL correctness, clarity, coverage, and duplicates.
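The generation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the agent's actual implementation. `normalize` and `select_new_cases` are hypothetical helpers, an in-memory SQLite database stands in for the data product, and "validation" only checks that each candidate query executes without error.

```python
import re
import sqlite3

def normalize(question: str) -> str:
    """Lowercase and collapse punctuation and whitespace so near-verbatim wording matches."""
    return re.sub(r"[^a-z0-9]+", " ", question.lower()).strip()

def select_new_cases(existing, candidates, conn):
    """Keep (question, sql) candidates that are not duplicates and whose SQL executes.

    existing: question strings already in the evaluation set (hypothetical input).
    candidates: (question, sql) pairs proposed by a generator.
    conn: a sqlite3 connection standing in for the data product.
    """
    seen = {normalize(q) for q in existing}
    accepted = []
    for question, sql in candidates:
        key = normalize(question)
        if key in seen:
            continue  # duplicate of an existing or already-accepted question
        try:
            conn.execute(sql).fetchall()  # validation here means only "the query runs"
        except sqlite3.Error:
            continue  # invalid SQL is dropped; a real agent might retry instead
        seen.add(key)
        accepted.append((question, sql))
    return accepted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0)],
)
existing = ["What is total revenue by region?"]
candidates = [
    ("what is total revenue by region", "SELECT region, SUM(amount) FROM orders GROUP BY region"),
    ("Which region had the largest single order?", "SELECT region FROM orders ORDER BY amount DESC LIMIT 1"),
    ("What is the average discount per order?", "SELECT AVG(discount) FROM orders"),
]
print(select_new_cases(existing, candidates, conn))
# [('Which region had the largest single order?', 'SELECT region FROM orders ORDER BY amount DESC LIMIT 1')]
```

Exact matching on normalized wording catches only near-verbatim duplicates. Catching paraphrases would need a semantic comparison, which this sketch does not attempt.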
Input parameters
Required:
- message (string): The request, e.g. “Generate 10 new evaluation questions” or “Audit existing cases”.
- agent_config_id (string): The ID of the SQL agent used to generate and validate SQL.
- data_product_id (string): The data product the cases are created against.
Optional:
- data_product_version (string): A specific version of the data product.
- agent_input_payload (object): A custom input payload to use when calling sub-agents.
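The following Python dataclass illustrates the shape of these parameters. The class name, the `to_payload` method, and the example IDs are hypothetical. This page does not describe how the parameters are transmitted, so the sketch only checks that the required fields are non-empty and omits optional fields that are unset.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional

@dataclass
class EvalCaseManagerRequest:
    """Hypothetical container for the documented input parameters."""
    # Required
    message: str
    agent_config_id: str
    data_product_id: str
    # Optional; omitted from the payload when left as None
    data_product_version: Optional[str] = None
    agent_input_payload: Optional[dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject missing or empty required values early.
        for name in ("message", "agent_config_id", "data_product_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def to_payload(self) -> dict[str, Any]:
        """Return the parameters as a plain dict without unset optional fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}

req = EvalCaseManagerRequest(
    message="Audit existing cases",
    agent_config_id="sql-agent-001",  # hypothetical ID
    data_product_id="sales",          # hypothetical ID
    data_product_version="v2",        # hypothetical version label
)
print(req.to_payload())
# {'message': 'Audit existing cases', 'agent_config_id': 'sql-agent-001', 'data_product_id': 'sales', 'data_product_version': 'v2'}
```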
Available tools
- Get Data Schema: Reads the schema and sample values to ground questions.
- Get SQL Evaluation Set and Get SQL Evaluation Case: Retrieve existing cases to avoid duplicates.
- Run SQL Agent and Run SQL Eval Case: Generate and validate SQL for candidate questions.
- Create / Update / Delete SQL Evaluation Case: Manage the evaluation set.
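A common way to score an evaluation case is execution match. You run the reference SQL and the candidate SQL against the same data and compare the rows they return. This page does not say whether Run SQL Eval Case works exactly this way, so treat the following as an assumption-based sketch. `results_match` is a hypothetical helper, and SQLite stands in for the data product.

```python
import sqlite3
from collections import Counter

def results_match(conn, expected_sql: str, generated_sql: str, ordered: bool = False) -> bool:
    """Execution match: do both queries return the same rows?

    With ordered=False, rows are compared as a multiset. Row order and column
    aliases are ignored, but duplicate rows still count.
    """
    expected = conn.execute(expected_sql).fetchall()
    generated = conn.execute(generated_sql).fetchall()
    if ordered:
        return expected == generated
    return Counter(expected) == Counter(generated)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0), (3, "EMEA", 40.0)],
)
reference = "SELECT region, SUM(amount) FROM orders GROUP BY region"
candidate = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC"
wrong = "SELECT region, MAX(amount) FROM orders GROUP BY region"
print(results_match(conn, reference, candidate))  # True
print(results_match(conn, reference, wrong))      # False
```

Comparing rows as a multiset tolerates ORDER BY and alias differences but still treats a missing or extra row as a mismatch. Pass `ordered=True` when the question itself asks for a specific ordering.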
Behavior notes
- Questions should read like realistic business inquiries, not trivial queries such as “how many rows are in my data”.
- The agent generates a bounded number of questions at a time and keeps them concise.
- Edits and deletions are confirmed before they are applied.
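The notes above can be sketched as two pieces: a filter that drops trivial questions and caps the batch size, and a gate that refuses to change a case without approval. Everything here is hypothetical. `MAX_BATCH`, the trivial-question patterns, and the `confirm` callback are illustrative stand-ins, because this page documents neither the actual limit nor how confirmation is requested.

```python
from typing import Callable, Optional

# Hypothetical values; the real batch limit and filters are not documented.
MAX_BATCH = 10
TRIVIAL_PATTERNS = ("how many rows", "what columns are")

def bound_batch(questions: list[str], limit: int = MAX_BATCH) -> list[str]:
    """Drop questions matching a trivial pattern, trim whitespace, and cap the batch."""
    kept = [
        q.strip()
        for q in questions
        if not any(p in q.lower() for p in TRIVIAL_PATTERNS)
    ]
    return kept[:limit]

def apply_change(
    store: dict[str, str],
    case_id: str,
    action: str,
    confirm: Callable[[str], bool],
    new_sql: Optional[str] = None,
) -> bool:
    """Update or delete a case in `store`, but only if `confirm` approves it."""
    if action not in ("update", "delete"):
        raise ValueError(f"unsupported action: {action}")
    if action == "update" and new_sql is None:
        raise ValueError("update requires new_sql")
    if not confirm(f"{action} evaluation case {case_id}?"):
        return False  # declined: the store is left unchanged
    if action == "delete":
        del store[case_id]
    else:
        store[case_id] = new_sql
    return True

cases = {"c1": "SELECT 1", "c2": "SELECT 2"}
apply_change(cases, "c1", "delete", confirm=lambda prompt: False)  # declined
apply_change(cases, "c2", "update", confirm=lambda prompt: True, new_sql="SELECT 3")
print(cases)  # {'c1': 'SELECT 1', 'c2': 'SELECT 3'}
print(bound_batch(["How many rows are in my data?", " Revenue by region last quarter? "]))
# ['Revenue by region last quarter?']
```

A substring check is a crude stand-in for judging whether a question is trivial. It shows where such a filter sits, not how the agent actually makes that judgment.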