Embed Chat-With-Data in Your Application
Your application’s users want to ask questions of your data without leaving your product. This recipe calls a published agent from your backend over the streaming REST API. The pattern runs in production inside a customer-facing analytics product handling more than 150,000 calls a month.
Each question costs two to three metered actions. At 0.25 ACU per metered action, that is 0.5 to 0.75 ACU per question; confirm actual spend on the Usage page. At high volume this dominates your ACU spend, so model it against your expected question volume.
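The per-question arithmetic turns directly into a monthly estimate. The sketch below assumes the 0.25 ACU rate and the two-to-three action range quoted above. The function name and the 10,000-question volume are illustrative only, and the Usage page remains the authority on actual spend.

```python
def estimate_monthly_acu(
    questions_per_month: int,
    actions_per_question: float,
    acu_per_action: float = 0.25,  # rate quoted in this recipe; confirm on the Usage page
) -> float:
    """Estimate monthly ACU spend for a chat-with-data integration."""
    if questions_per_month < 0 or actions_per_question < 0:
        raise ValueError("volumes must be non-negative")
    return questions_per_month * actions_per_question * acu_per_action


if __name__ == "__main__":
    volume = 10_000  # hypothetical monthly question volume
    low = estimate_monthly_acu(volume, 2)   # best case: two actions per question
    high = estimate_monthly_acu(volume, 3)  # worst case: three actions per question
    print(f"{volume} questions/month: {low:,.0f} to {high:,.0f} ACU")
```

Running it prints `10000 questions/month: 5,000 to 7,500 ACU`.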
What you’ll use
Build it
Build and publish a custom SQL agent
Clone the Query or Data Product Query agent and scope it to a single data product. Teams that compared the two head-to-head found that a tuned custom agent answers more accurately than the default agent.
Put semantics in metadata, not the prompt
When users say “UK” but the data says “United Kingdom”, add synonyms and definitions to the catalog and data product metadata. One team analyzed 500 real user questions and found metadata fixes outperformed prompt fixes every time.
Authenticate machine-to-machine
Create an M2M OAuth client and request a JWT with the client credentials grant. Assign the client the minimum role that can call the AI APIs.
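The client credentials grant is standard OAuth 2.0 (RFC 6749, section 4.4). Here is a minimal standard-library sketch. It assumes your identity provider accepts HTTP Basic client authentication and returns a JSON body containing `access_token`. The token URL, client ID, secret, and optional `scope` are placeholders for your own values, and some providers expect the credentials in the form body instead, so check yours.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_token_request(
    token_url: str,
    client_id: str,
    client_secret: str,
    scope: Optional[str] = None,
) -> urllib.request.Request:
    """Build an OAuth 2.0 client credentials grant request (RFC 6749, section 4.4)."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope  # only if your provider requires one
    # HTTP Basic client authentication; RFC 6749 section 2.3.1 form-encodes
    # the ID and secret before they are base64-encoded together.
    pair = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(form).encode("ascii"),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_access_token(
    token_url: str,
    client_id: str,
    client_secret: str,
    scope: Optional[str] = None,
    timeout: float = 10.0,
) -> str:
    """POST the grant and return the bearer token from the JSON response."""
    request = build_token_request(token_url, client_id, client_secret, scope)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["access_token"]
```

Token responses normally include `expires_in`, so caching the token until shortly before it expires avoids one token request per user question.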
Call the streaming endpoint
POST the user’s message to /ai/api/v1/chats/agent/{id}/stream with the bearer token and stream the response into your UI. See REST API authentication for the token request and a full example.
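Here is a minimal standard-library sketch of the streaming call. The endpoint path comes from this recipe. The base URL, the `message` body field, and the assumption that the stream arrives as newline-delimited text are all hypothetical, so check REST API authentication and the API reference for the real request schema and event format.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def build_stream_request(base_url: str, agent_id: str, token: str, message: str) -> urllib.request.Request:
    """Build the POST to the agent's streaming chat endpoint."""
    agent = urllib.parse.quote(agent_id, safe="")
    url = f"{base_url.rstrip('/')}/ai/api/v1/chats/agent/{agent}/stream"
    # ASSUMPTION: the request body carries the user's text in a "message" field.
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_lines(stream: Iterable[bytes]) -> Iterator[str]:
    """Decode a byte stream line by line, skipping blank lines."""
    for raw in stream:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line:
            yield line


def stream_chat(base_url: str, agent_id: str, token: str, message: str, timeout: float = 60.0) -> Iterator[str]:
    """Yield response lines as they arrive so the UI can render them incrementally."""
    request = build_stream_request(base_url, agent_id, token, message)
    # timeout bounds connecting and each blocking read, not the whole answer.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        yield from iter_lines(response)  # HTTPResponse iterates line by line
```

Call it as `for line in stream_chat(base_url, agent_id, token, question): ...` and forward each line to your UI as it arrives, rather than waiting for the full answer.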
Scope multi-tenant sessions
If one agent serves many of your customers, use pre_exec_sql to set per-session context (such as a subscriber ID) before each query, and pass marketplace_id where applicable.
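Here is a sketch of per-tenant scoping. It assumes `pre_exec_sql` and `marketplace_id` travel in the request body next to the message, so check the API reference for where they actually go. You supply the SQL template, because the statement that sets session context depends on your engine. `build_pre_exec_sql`, `build_tenant_payload`, and the ID pattern are hypothetical. What matters is that the subscriber ID comes from your own authenticated session, never from the chat text, and is validated before it is spliced into SQL.

```python
import re
from typing import Optional

# Hypothetical allow-list for subscriber IDs; tighten it to match your real format.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_pre_exec_sql(template: str, subscriber_id: str) -> str:
    """Render your per-session SQL statement for one tenant.

    `template` is your own engine-specific statement containing a
    `{subscriber_id}` placeholder.
    """
    # Reject anything outside the allow-list so quotes or semicolons
    # can never reach the SQL text.
    if not _SAFE_ID.fullmatch(subscriber_id):
        raise ValueError(f"refusing unexpected subscriber ID: {subscriber_id!r}")
    return template.format(subscriber_id=subscriber_id)


def build_tenant_payload(
    message: str,
    subscriber_id: str,
    template: str,
    marketplace_id: Optional[str] = None,
) -> dict:
    """ASSUMPTION: field names and placement in the request body are illustrative."""
    payload = {
        "message": message,
        "pre_exec_sql": build_pre_exec_sql(template, subscriber_id),
    }
    if marketplace_id is not None:  # pass only where applicable
        payload["marketplace_id"] = marketplace_id
    return payload
```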
Add evaluations before launch
Build an eval set with at least 10 human-validated question/SQL pairs and re-run it whenever you change the agent or its metadata.
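An eval run can be as simple as executing each generated query next to its validated reference query and comparing the results. The sketch below uses an in-memory `sqlite3` database as a stand-in for your warehouse. `generate_sql` is a hypothetical hook that asks your agent for SQL. Comparing result sets rather than SQL text lets an equivalent query written differently still pass.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected_sql: str  # human-validated reference query


def _rows(conn: sqlite3.Connection, sql: str) -> Counter:
    # Compare result sets as multisets so row order does not matter.
    return Counter(conn.execute(sql).fetchall())


def run_evals(
    conn: sqlite3.Connection,
    cases: List[EvalCase],
    generate_sql: Callable[[str], str],
    min_cases: int = 10,
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (accuracy, failures) for the agent's SQL against the eval set."""
    if len(cases) < min_cases:
        raise ValueError(f"need at least {min_cases} validated cases, got {len(cases)}")
    failures: List[Tuple[str, str]] = []
    for case in cases:
        try:
            got = _rows(conn, generate_sql(case.question))
        except sqlite3.Error as exc:
            failures.append((case.question, f"SQL error: {exc}"))
            continue
        if got != _rows(conn, case.expected_sql):
            failures.append((case.question, "result mismatch"))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy, failures
```

Floating-point aggregates may need rounding before comparison, since tiny differences would otherwise register as mismatches.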
Gotchas
- Cloning an agent copies tool parameter bindings as they were; re-check bindings such as allow_fallback_auth before pointing production traffic at a clone.
- The JWT’s role claims decide API access: a token with the wrong role gets a 403, not a helpful error.
- Chart-generating agents respond slowly enough to hit client timeouts (Microsoft Teams cuts off around 45 seconds); prefer text answers in latency-sensitive surfaces.
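To see which role a rejected token actually carries, decode its payload. This sketch reads the claims of a compact JWT without verifying the signature, so use it only for debugging. The name of the role claim (for example `roles` or `scope`) depends on your identity provider and is not specified in this recipe.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return a JWT's payload claims WITHOUT verifying the signature.

    Debugging aid only: never base authorization decisions on unverified claims.
    """
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a compact JWT: expected three dot-separated parts") from None
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))
```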
Variations
- Return generated SQL instead of answers, so your application (or the user) can execute it under its own controls. Prompt the agent to output SQL, and present only that field from the streamed response.
- Call stored procedures by wrapping them in a custom agent prompt.
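For the SQL-only variation, one way to isolate the query is to have the agent prompt wrap it in delimiters and extract it once the stream completes. The `<sql>...</sql>` convention below is hypothetical, something you would define in your own agent prompt, not a documented response field. Run the extracted query with your application's own read-only credentials, because extraction is not a security control.

```python
import re
from typing import Iterable, Optional

# Hypothetical prompt convention: exactly one query wrapped in <sql>...</sql>.
_SQL_TAG = re.compile(r"<sql>\s*(.*?)\s*</sql>", re.DOTALL | re.IGNORECASE)


def extract_sql(chunks: Iterable[str]) -> Optional[str]:
    """Join streamed text chunks and return the first tagged SQL query, if any."""
    text = "".join(chunks)  # a tag can be split across chunks, so join first
    match = _SQL_TAG.search(text)
    return match.group(1) if match else None
```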