Revise Data Product
The Revise Data Product agent improves data product semantic layers through iterative refinement. It analyzes SQL evaluation results, identifies failures, updates data product specifications, and verifies improvements through re-evaluation.
How it works
The agent follows this workflow:
- Runs SQL evaluation on the data product’s evaluation set and analyzes results
- Provides a summary of evaluation results, identifying patterns in failed cases
- Fetches the current data product specification (critical step to preserve all fields)
- Modifies only the specific fields that need changes while preserving all other fields
- Updates the data product with the complete modified specification
- Re-runs evaluation to verify improvements and compares with previous results
- Iterates the process if needed based on user feedback
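The steps above can be sketched as a single refinement round. This is a minimal illustration, not the agent's actual implementation: `run_eval`, `get_spec`, `update_spec`, and `modify` are hypothetical callables standing in for the agent's tools and for the edit it decides to make.

```python
from copy import deepcopy
from typing import Any, Callable, Dict

Spec = Dict[str, Any]


def revise_once(
    run_eval: Callable[[], float],        # hypothetical: returns execution accuracy
    get_spec: Callable[[], Spec],         # hypothetical: returns the complete raw spec
    update_spec: Callable[[Spec], None],  # hypothetical: replaces the stored spec
    modify: Callable[[Spec], None],       # hypothetical: edits targeted fields in place
) -> Dict[str, float]:
    """Run one round: evaluate, fetch, modify, update, re-evaluate."""
    before = run_eval()          # baseline accuracy on the evaluation set
    spec = deepcopy(get_spec())  # start from the complete current spec
    modify(spec)                 # change only the fields that need it
    update_spec(spec)            # send the complete modified spec back
    after = run_eval()           # verify against the baseline
    return {"before": before, "after": after}
```

Comparing `before` and `after` is what lets a caller decide whether another round is worthwhile.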
Input parameters
Required:
- message (string): Instructions or questions about the data product evaluation
- data_product_id (string): The ID of the data product to evaluate and improve
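As a rough illustration of the two required parameters, the sketch below builds a request body. Only the field names `message` and `data_product_id` come from this page; the helper `build_request`, the JSON encoding, and the example ID are assumptions, since the transport used to invoke the agent is not described here.

```python
import json


def build_request(message: str, data_product_id: str) -> str:
    """Return a JSON body holding the two required string parameters."""
    for name, value in (("message", message), ("data_product_id", data_product_id)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return json.dumps({"message": message, "data_product_id": data_product_id})


# Hypothetical values, for illustration only.
body = build_request(
    "Run the evaluation and fix the failing revenue questions.",
    "dp_12345",
)
```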
Output format
The agent produces a series of thinking, tool call, tool return, and text blocks as it works through the user request.
The final message, assuming no errors, is a string containing the evaluation results and a summary of the improvements.
Available tools
The agent has access to three tools:
Run SQL evaluation
Runs SQL evaluation on the data product’s evaluation set. Returns cached results if the data product specification hasn’t changed, otherwise triggers a fresh evaluation run.
Returns:
- Overall execution accuracy
- Passed cases (successful question/SQL pairs)
- Failed cases with detailed reasoning
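To show how failure patterns might be pulled out of such a result, here is a small sketch. The exact shape of the tool's return value is not documented on this page, so the keys `accuracy`, `passed`, `failed`, and `reasoning` are assumed names chosen to mirror the three items listed above.

```python
from collections import Counter
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense an evaluation result into counts and the most common failure reasons."""
    failed = result["failed"]
    # Count identical reasoning strings to surface recurring failure patterns.
    reasons = Counter(case["reasoning"] for case in failed)
    return {
        "accuracy": result["accuracy"],
        "n_passed": len(result["passed"]),
        "n_failed": len(failed),
        "top_reasons": reasons.most_common(3),
    }
```

In practice the reasoning text is free-form, so grouping by exact string is only a starting point; clustering similar reasons would need a fuzzier comparison.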
Get data product raw specification
Retrieves the complete raw data product specification in the exact format expected by the update API. This differs from the schema tool by providing the raw JSON specification without sample values or simplification.
Key fields in specification:
- product.en.description: Natural language description
- product.deliverySystems: Database connection information
- product.recordSets: Table definitions
- x-metrics: Custom metric definitions
- x-derivedColumns: Computed column definitions
- relationships: Join relationship definitions
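The dotted names above can be read as paths into the raw JSON. The sketch below resolves such a path against a parsed specification. The helper `get_path` is hypothetical, and the example dictionary is a placeholder: its nesting follows the field names listed above, but the value shapes (and whether `x-metrics`, `x-derivedColumns`, and `relationships` really sit at the top level) are assumptions.

```python
from typing import Any, Dict


def get_path(spec: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'product.en.description' through nested dicts."""
    node: Any = spec
    for key in path.split("."):
        node = node[key]  # raises KeyError if the spec lacks this level
    return node


# Placeholder spec: only the key names come from the list above.
example_spec = {
    "product": {
        "en": {"description": "Orders and customers"},
        "deliverySystems": {},  # placeholder value
        "recordSets": {},       # placeholder value
    },
    "x-metrics": [],            # placeholder value
    "x-derivedColumns": [],     # placeholder value
    "relationships": [],        # placeholder value
}

description = get_path(example_spec, "product.en.description")
```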
Update an existing data product
Updates the data product specification with modifications. Requires the complete specification with all fields preserved.
Critical requirements:
- Must include the complete specification (missing fields will be removed)
- Must preserve exact field names and structure from the original spec
- Never construct specifications from scratch
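Because missing fields are removed on update, a caller may want to compare the modified specification with the fetched original before sending it. The following is one possible guard, not part of the documented tools: `missing_paths` is a hypothetical helper that lists every key present in the original but absent from the modified copy.

```python
from typing import Any, Dict, List


def missing_paths(original: Dict[str, Any], modified: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of keys that exist in `original` but not in `modified`."""
    missing: List[str] = []
    for key, value in original.items():
        path = f"{prefix}.{key}" if prefix else key
        if not isinstance(modified, dict) or key not in modified:
            missing.append(path)  # the key, and everything under it, would be dropped
        elif isinstance(value, dict):
            missing.extend(missing_paths(value, modified[key], path))
    return missing
```

An empty result means every original key survived the edit; any non-empty result is a reason to stop before calling the update tool. The check covers nested dictionaries only and does not look inside lists.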
Behavior notes
- Always fetch before update: The agent must call Get Data Product Raw Specification before calling Update Data Product
- Complete specifications required: Updates must include the complete specification; missing fields will be removed
- Preserve structure: Field names and structure from the original spec must be maintained exactly
- Verify improvements: Changes are only kept if they improve evaluation scores
- Iterative approach: The agent can make multiple rounds of refinements based on evaluation feedback
- Cached results: If the data product hasn’t changed, evaluation returns cached results for efficiency
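The "verify improvements" and "iterative approach" notes can be illustrated with a small keep-or-discard loop. This is a sketch under stated assumptions: `evaluate` and the entries of `candidate_edits` are hypothetical callables, and how the agent actually discards a change that does not help is not described on this page.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, Iterable, Tuple

Spec = Dict[str, Any]


def refine(
    spec: Spec,
    candidate_edits: Iterable[Callable[[Spec], None]],  # hypothetical in-place edits
    evaluate: Callable[[Spec], float],                  # hypothetical accuracy score
) -> Tuple[Spec, float]:
    """Apply edits one at a time, keeping each only if the score strictly improves."""
    best_spec = deepcopy(spec)
    best_score = evaluate(best_spec)
    for edit in candidate_edits:
        trial = deepcopy(best_spec)  # always edit a complete copy of the best spec
        edit(trial)
        score = evaluate(trial)
        if score > best_score:       # keep only real improvements
            best_spec, best_score = trial, score
    return best_spec, best_score
```

Each trial starts from a full copy of the best specification so far, so a discarded edit never leaves partial changes behind.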
Common use cases
Improving descriptions
Updates product.en.description or table/column descriptions to better capture domain terminology and improve semantic understanding.
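A description change is a good example of modifying one field while leaving the rest of the specification untouched. The sketch below assumes the fetched specification has been parsed into nested dictionaries; `set_path` is a hypothetical helper, and the example values are placeholders.

```python
from typing import Any, Dict


def set_path(spec: Dict[str, Any], path: str, value: Any) -> None:
    """Overwrite the value at a dotted path in place, leaving all other fields as they are."""
    *parents, leaf = path.split(".")
    node = spec
    for key in parents:
        node = node[key]  # raises KeyError rather than inventing missing structure
    node[leaf] = value


# Placeholder spec fragment; only the key names come from this page.
spec = {
    "product": {
        "en": {"description": "Sales data"},
        "recordSets": {},
    }
}
set_path(spec, "product.en.description", "Daily sales orders with net revenue per customer")
```

Raising on a missing parent key, instead of creating it, keeps the helper from silently changing the structure of the original specification.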
Refining metrics
Modifies x-metrics definitions to accurately represent business calculations and aggregations.
Defining relationships
Updates relationships to correctly specify join conditions between tables.
Adding derived columns
Creates or modifies x-derivedColumns to expose computed fields for natural language queries.