Commit c17eb7c

Enhance AI Query Functionality and UI Components
- Added support for the "build-analytics-query" system prompt in the AI query handler, allowing for more complex SQL query generation.
- Updated the step limit logic in the AI query handler to accommodate the new prompt.
- Introduced new components: AiQueryBar and AiQueryDialog for improved user interaction with AI-driven analytics queries.
- Implemented a new QueryDataGrid component to display results from AI-generated queries.
- Enhanced the toolbar in the DataGrid to support custom search functionalities and improved layout.
- Updated documentation and SQL query guidelines to reflect changes in event data structure and extraction requirements.

This update significantly improves the user experience for querying analytics data through AI, providing a more intuitive interface and robust backend support.
1 parent a812ba2 commit c17eb7c

11 files changed

Lines changed: 1958 additions & 447 deletions


apps/backend/src/app/api/latest/ai/query/[mode]/route.ts

Lines changed: 12 additions & 1 deletion
@@ -59,7 +59,18 @@ export const POST = createSmartRouteHandler({
     // create-dashboard now does an inspection loop (queryAnalytics) before calling updateDashboard,
     // so it needs room for ~3 exploratory queries + the final tool call + some retry slack.
     const isCreateDashboard = systemPromptId === "create-dashboard";
-    const stepLimit = toolsArg == null ? 1 : isDocsOrSearch ? 50 : isCreateDashboard ? 12 : 5;
+    // build-analytics-query aims for one-shot queries with complete schema
+    // knowledge, but needs a few steps for retries on errors or follow-ups.
+    const isBuildAnalyticsQuery = systemPromptId === "build-analytics-query";
+    const stepLimit = toolsArg == null
+      ? 1
+      : isDocsOrSearch
+        ? 50
+        : isCreateDashboard
+          ? 12
+          : isBuildAnalyticsQuery
+            ? 5
+            : 5;

     if (mode === "stream") {
       const result = streamText({
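
The nested ternary in this hunk can also be read as a small lookup. The following is a minimal sketch, not code from the commit: `resolveStepLimit`, `StepLimitInput`, and `PROMPT_STEP_LIMITS` are hypothetical names, and `isDocsOrSearch` is passed in as a flag because its definition lies outside the hunk. The values mirror the diff, where the `build-analytics-query` branch and the fallback both resolve to 5.

```typescript
// Hypothetical input shape; the real handler derives these values from the request.
type StepLimitInput = {
  hasTools: boolean;       // corresponds to `toolsArg != null` in the diff
  isDocsOrSearch: boolean; // defined outside the hunk, so passed in here
  systemPromptId: string;
};

// Per-prompt overrides; the numbers are copied from the ternary chain in the diff.
const PROMPT_STEP_LIMITS = new Map<string, number>([
  ["create-dashboard", 12],
  ["build-analytics-query", 5],
]);

const DEFAULT_STEP_LIMIT = 5;

function resolveStepLimit(input: StepLimitInput): number {
  if (!input.hasTools) return 1;        // no tools: a single step is enough
  if (input.isDocsOrSearch) return 50;  // docs/search prompts get the largest budget
  return PROMPT_STEP_LIMITS.get(input.systemPromptId) ?? DEFAULT_STEP_LIMIT;
}
```

A table like this keeps each prompt's budget on one line, so changing the `build-analytics-query` limit later would not require touching the control flow.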

apps/backend/src/lib/ai/prompts.ts

Lines changed: 176 additions & 26 deletions
@@ -41,6 +41,7 @@ export type SystemPromptId =
   | "email-assistant-draft"
   | "create-dashboard"
   | "run-query"
+  | "build-analytics-query"
   | "rewrite-template-source";

 /**
@@ -74,34 +75,37 @@ Run a ClickHouse SQL query against the project's analytics database. Only SELECT
 Available tables:

 **events** - User activity events
-- event_type: LowCardinality(String) - $token-refresh is the only valid event_type right now, it occurs whenever an access token is refreshed
+- event_type: LowCardinality(String) - ONLY: $page-view, $click, $token-refresh
 - event_at: DateTime64(3, 'UTC') - When the event occurred
-- data: JSON - Additional event data
-- user_id: Nullable(String) - Associated user ID
-- team_id: Nullable(String) - Associated team ID
+- data: JSON - MUST use toString() before extracting: JSONExtractString(toString(data), 'key')
+- user_id: Nullable(String) - Always populated (no nulls)
+- team_id: Nullable(String) - Always NULL, never use
 - created_at: DateTime64(3, 'UTC') - When the record was created

+Event data payloads:
+- $page-view: {is_anonymous, path, referrer}
+- $click: {is_anonymous, selector}
+- $token-refresh: {is_anonymous, refresh_token_id, ip_info: {country_code, city_name, region_code, is_trusted, latitude, longitude, tz_identifier, ip}}
+
 **users** - User profiles
 - id: UUID - User ID
 - display_name: Nullable(String) - User's display name
 - primary_email: Nullable(String) - User's primary email
 - primary_email_verified: UInt8 - Whether email is verified (0/1)
 - signed_up_at: DateTime64(3, 'UTC') - When user signed up
-- client_metadata: JSON - Client-side metadata
-- client_read_only_metadata: JSON - Read-only client metadata
-- server_metadata: JSON - Server-side metadata
+- client_metadata: JSON - Typically empty
+- client_read_only_metadata: JSON - Typically empty
+- server_metadata: JSON - Typically empty
 - is_anonymous: UInt8 - Whether user is anonymous (0/1)

 SQL QUERY GUIDELINES:
 - Only SELECT queries are allowed (no INSERT, UPDATE, DELETE)
+- JSON extraction REQUIRES toString(): JSONExtractString(toString(data), 'key')
+- Nested JSON uses dot notation: JSONExtractString(toString(data), 'ip_info.country_code')
 - Always use LIMIT to avoid returning too many rows (default to LIMIT 100)
-- Use appropriate date functions: toDate(), toStartOfDay(), toStartOfWeek(), etc.
-- For counting, use COUNT(*) or COUNT(DISTINCT column)
-- Example queries:
-- Count users: SELECT COUNT(*) FROM users
-- Recent signups: SELECT * FROM users ORDER BY signed_up_at DESC LIMIT 10
-- Events today: SELECT COUNT(*) FROM events WHERE toDate(event_at) = today()
-- Event types: SELECT event_type, COUNT(*) as count FROM events GROUP BY event_type ORDER BY count DESC LIMIT 10
+- Use relative date ranges: now() - INTERVAL X DAY
+- Use date functions: toDate(), toStartOfDay(), toStartOfWeek(), etc.
+- For counting, use count() or count(DISTINCT column)
 `,
   "docs-ask-ai": `
 # Stack Auth AI Assistant System Prompt
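
The `toString()` rule added to the prompt in this hunk could also be applied wherever SQL is assembled in code. The following is a minimal sketch under that assumption: `jsonStringField` is a hypothetical helper that is not part of the commit. It only builds the expression text for one key and escapes quotes; it does not check how ClickHouse resolves the key.

```typescript
// Hypothetical helper: wraps a JSON column in toString() before extraction,
// as the updated prompt requires.
function jsonStringField(key: string, column: string = "data"): string {
  // Only plain identifiers are accepted as column names in this sketch.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(column)) {
    throw new Error("invalid column name: " + column);
  }
  // Escape backslashes first, then single quotes, for a SQL string literal.
  const escapedKey = key.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return `JSONExtractString(toString(${column}), '${escapedKey}')`;
}

// Usage: build the page-view path expression quoted in the prompt.
const pathExpr = jsonStringField("path");
const pageViewsSql =
  `SELECT ${pathExpr} AS path, count() AS views FROM events ` +
  `WHERE event_type = '$page-view' GROUP BY path ORDER BY views DESC LIMIT 20`;
```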
@@ -949,42 +953,188 @@ You are helping users query their Stack Auth project's analytics data using Clic
 **Available Tables:**

 **events** - User activity events
-- event_type: LowCardinality(String) - $token-refresh is the only valid event_type right now, it occurs whenever an access token is refreshed
+- event_type: LowCardinality(String) - ONLY: $page-view, $click, $token-refresh
 - event_at: DateTime64(3, 'UTC') - When the event occurred
-- data: JSON - Additional event data
-- user_id: Nullable(String) - Associated user ID
-- team_id: Nullable(String) - Associated team ID
+- data: JSON - MUST use toString() before extracting: JSONExtractString(toString(data), 'key')
+- user_id: Nullable(String) - Always populated (no nulls)
+- team_id: Nullable(String) - Always NULL, never use
 - created_at: DateTime64(3, 'UTC') - When the record was created

+Event data payloads:
+- $page-view: {is_anonymous, path, referrer}
+- $click: {is_anonymous, selector}
+- $token-refresh: {is_anonymous, refresh_token_id, ip_info: {country_code, city_name, region_code, is_trusted, latitude, longitude, tz_identifier, ip}}
+
 **users** - User profiles
 - id: UUID - User ID
 - display_name: Nullable(String) - User's display name
 - primary_email: Nullable(String) - User's primary email
 - primary_email_verified: UInt8 - Whether email is verified (0/1)
 - signed_up_at: DateTime64(3, 'UTC') - When user signed up
-- client_metadata: JSON - Client-side metadata
-- client_read_only_metadata: JSON - Read-only client metadata
-- server_metadata: JSON - Server-side metadata
+- client_metadata: JSON - Typically empty
+- client_read_only_metadata: JSON - Typically empty
+- server_metadata: JSON - Typically empty
 - is_anonymous: UInt8 - Whether user is anonymous (0/1)

 **SQL Query Guidelines:**
 - Only SELECT queries are allowed (no INSERT, UPDATE, DELETE)
 - Project filtering is automatic - you don't need WHERE project_id = ...
+- JSON extraction REQUIRES toString(): JSONExtractString(toString(data), 'key')
+- Nested JSON uses dot notation: JSONExtractString(toString(data), 'ip_info.country_code')
 - Always use LIMIT to avoid returning too many rows (default to LIMIT 100)
-- Use appropriate date functions: toDate(), toStartOfDay(), toStartOfWeek(), etc.
-- For counting, use COUNT(*) or COUNT(DISTINCT column)
+- Use relative date ranges: now() - INTERVAL X DAY
+- Use date functions: toDate(), toStartOfDay(), toStartOfWeek(), etc.
+- For counting, use count() or count(DISTINCT column)

 **Example Queries:**
-- Count users: \`SELECT COUNT(*) FROM users\`
+- Count users: \`SELECT count() FROM users\`
 - Recent signups: \`SELECT * FROM users ORDER BY signed_up_at DESC LIMIT 10\`
-- Events today: \`SELECT COUNT(*) FROM events WHERE toDate(event_at) = today()\`
-- Event types: \`SELECT event_type, COUNT(*) as count FROM events GROUP BY event_type ORDER BY count DESC LIMIT 10\`
+- Events today: \`SELECT count() FROM events WHERE toDate(event_at) = today()\`
+- Page views by path: \`SELECT JSONExtractString(toString(data), 'path') as path, count() as views FROM events WHERE event_type = '$page-view' GROUP BY path ORDER BY views DESC LIMIT 20\`

 **Focus:**
 - Help users write efficient, correct ClickHouse SQL queries
 - Explain query results clearly
 - Suggest relevant queries based on user questions
 - Use the queryAnalytics tool to execute queries and return results
+`,
+
+  "build-analytics-query": `
+## Context: Analytics Query Builder
+
+You are a ClickHouse SQL expert helping the user build queries that drive a data grid on the Stack Auth analytics page. The user asks questions in natural language; you translate them into accurate, one-shot ClickHouse SQL. You have complete schema knowledge below — use it to generate correct queries immediately without needing to inspect the data first.
+
+**HARD RULE — how the tool works:**
+Call \`queryAnalytics\` with your SQL query. The grid runs the full query independently — you only receive a preview (first 50 rows) to confirm the query is correct. The frontend only applies the query after the agent comes to a complete stop, so avoid being too chatty in the first few turns unless the user asks for it.
+1. Do NOT paste SQL into chat text in place of a tool call — the UI will not pick it up.
+2. You only see a small preview in the tool result — the user sees the full result set in the grid.
+3. Because you only get 50 preview rows, do NOT try to analyze full result sets from the tool output. If the user asks about the data, describe the query and let them read the grid.
+4. The grid wraps your query as a subquery: \`SELECT * FROM (<your query>) LIMIT 50 OFFSET ...\` and paginates via infinite scroll. Your LIMIT sets the **maximum total rows** the user can scroll through — use generous limits (e.g. 1000 for aggregates) so the grid can paginate the full result.
+
+### DATA SCHEMA (project/branch filtering is automatic — do NOT add WHERE project_id = ...)
+
+**users** table:
+| Column | Type | Notes |
+|--------|------|-------|
+| id | UUID | Primary key |
+| display_name | Nullable(String) | Typically populated |
+| primary_email | Nullable(String) | Usually present |
+| primary_email_verified | UInt8 (0/1) | Primary user segmentation axis |
+| signed_up_at | DateTime64(3, 'UTC') | High-resolution timestamp |
+| is_anonymous | UInt8 (0/1) | Rare; mostly testing |
+| client_metadata | JSON | Typically empty {} |
+| server_metadata | JSON | Typically empty {} |
+| client_read_only_metadata | JSON | Typically empty {} |
+| restricted_by_admin | UInt8 (0/1) | Rare; administrative flag |
+
+Key insights: Metadata fields are sparse/empty — don't expect rich structures. Email verification is the primary segmentation. Anonymous users are negligible.
+
+**events** table:
+| Column | Type | Notes |
+|--------|------|-------|
+| event_type | LowCardinality(String) | ONLY: \`$page-view\`, \`$click\`, \`$token-refresh\` |
+| event_at | DateTime64(3, 'UTC') | Use for aggregation by day/week/month |
+| data | JSON | Native JSON — MUST use toString() before extracting (see rules) |
+| user_id | Nullable(String) | 100% populated (no nulls); safe for filtering/joins |
+| team_id | Nullable(String) | Always NULL — never use it |
+| created_at | DateTime64(3, 'UTC') | Processing timestamp |
+
+### JSON PAYLOAD STRUCTURES (per event_type)
+
+**\`$page-view\`** data:
+\`\`\`json
+{"is_anonymous": false, "path": "/some-page", "referrer": "http://...or-empty"}
+\`\`\`
+- path: multiple unique page paths
+- referrer: empty string (most common) or various HTTP referrers
+
+**\`$click\`** data:
+\`\`\`json
+{"is_anonymous": false, "selector": "string-value"}
+\`\`\`
+- selector: low cardinality
+
+**\`$token-refresh\`** data:
+\`\`\`json
+{
+  "is_anonymous": false,
+  "refresh_token_id": "uuid-string",
+  "ip_info": {
+    "city_name": "string",
+    "country_code": "2-letter-ISO",
+    "ip": "ip-address",
+    "is_trusted": true,
+    "latitude": 0.0,
+    "longitude": 0.0,
+    "region_code": "string",
+    "tz_identifier": "timezone-string"
+  }
+}
+\`\`\`
+- Token refresh is an excellent proxy for active authenticated sessions
+- ip_info has rich geolocation data for geo-based analysis
+
+### CRITICAL SQL RULES
+
+1. **JSON extraction REQUIRES toString() wrapper:**
+   - CORRECT: \`JSONExtractString(toString(data), 'path')\`
+   - WRONG: \`JSONExtractString(data, 'path')\` — this WILL FAIL
+2. **Nested JSON uses dot notation:**
+   - CORRECT: \`JSONExtractString(toString(data), 'ip_info.country_code')\`
+   - WRONG: \`JSONExtractString(data, 'ip_info')['country_code']\`
+3. SELECT queries only — no INSERT / UPDATE / DELETE / DDL
+4. ALWAYS include LIMIT — this caps the total rows the user can scroll through in the grid (default 100 for row samples, 1000 for aggregates)
+5. Use relative date ranges: \`now() - INTERVAL X DAY\`
+6. team_id is always NULL — never filter on it
+7. Metadata fields are almost always empty — safe to ignore
+8. Prefer aggregates (count, sum, avg, quantile, GROUP BY) when the user is asking a question
+9. Use ClickHouse date helpers: toDate(), toStartOfDay(), toStartOfWeek(), toStartOfMonth()
+
+### COMMON QUERY PATTERNS
+
+Signups by day:
+\`\`\`sql
+SELECT toDate(signed_up_at) as date, count() as signups
+FROM users WHERE signed_up_at >= now() - INTERVAL 30 DAY
+GROUP BY date ORDER BY date DESC LIMIT 100
+\`\`\`
+
+Page views by path:
+\`\`\`sql
+SELECT JSONExtractString(toString(data), 'path') as path, count() as views
+FROM events WHERE event_type = '$page-view' AND event_at >= now() - INTERVAL 7 DAY
+GROUP BY path ORDER BY views DESC LIMIT 20
+\`\`\`
+
+Token refreshes by country:
+\`\`\`sql
+SELECT JSONExtractString(toString(data), 'ip_info.country_code') as country,
+count() as refreshes, count(DISTINCT user_id) as unique_users
+FROM events WHERE event_type = '$token-refresh' AND event_at >= now() - INTERVAL 7 DAY
+GROUP BY country ORDER BY refreshes DESC LIMIT 50
+\`\`\`
+
+Email verification adoption:
+\`\`\`sql
+SELECT primary_email_verified, count() as users
+FROM users WHERE signed_up_at >= now() - INTERVAL 30 DAY
+GROUP BY primary_email_verified LIMIT 10
+\`\`\`
+
+Event volume trends by type:
+\`\`\`sql
+SELECT toDate(event_at) as date, event_type, count() as event_count
+FROM events WHERE event_at >= now() - INTERVAL 30 DAY
+GROUP BY date, event_type ORDER BY date DESC, event_count DESC LIMIT 100
+\`\`\`
+
+### INTERACTION STYLE
+
+- Generate accurate one-shot queries using the schema above. Do NOT run inspection queries unless the user asks about something genuinely ambiguous that the schema doesn't cover.
+- Keep chat messages short — the user sees the grid directly.
+- If the user refers to a previous query, modify it incrementally — don't start from scratch.
+- If \`queryAnalytics\` returns an error, adjust and retry. Do NOT invent columns or fabricate data.
+- If the user asks about event types or data that don't exist in the schema above, explain what IS available and generate the closest useful query instead.
 `,

   "rewrite-template-source": `You rewrite email template TSX source into standalone draft TSX.

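The `build-analytics-query` prompt states that the grid wraps the model's query as `SELECT * FROM (<your query>) LIMIT 50 OFFSET ...`. The following is a minimal sketch of that wrapping, assuming a zero-based page index; `buildPageQuery` and `GRID_PAGE_SIZE` are hypothetical names, and the actual QueryDataGrid implementation is not shown in this part of the diff.

```typescript
// Page size taken from the LIMIT 50 quoted in the prompt text.
const GRID_PAGE_SIZE = 50;

// Hypothetical helper: wraps the user's query as a subquery and selects one page.
function buildPageQuery(
  userQuery: string,
  pageIndex: number,
  pageSize: number = GRID_PAGE_SIZE,
): string {
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  // Drop trailing semicolons and whitespace so the query is valid inside parentheses.
  const inner = userQuery.replace(/[;\s]+$/, "");
  const offset = pageIndex * pageSize;
  return `SELECT * FROM (${inner}) LIMIT ${pageSize} OFFSET ${offset}`;
}

// Usage: third page (index 2) of an aggregate capped at 1000 rows.
const thirdPage = buildPageQuery(
  "SELECT event_type, count() AS n FROM events GROUP BY event_type LIMIT 1000;",
  2,
);
```

Because the inner `LIMIT` is evaluated before the outer one, it caps how far the user can scroll: an inner `LIMIT 1000` allows at most 20 pages of 50 rows, which is why the prompt asks for generous limits.
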
apps/backend/src/lib/ai/schema.ts

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ export const requestBodySchema = yupObject({
     "email-assistant-draft",
     "create-dashboard",
     "run-query",
+    "build-analytics-query",
     "rewrite-template-source"
   ]).defined(),
   messages: yupArray(
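
The commit adds the new prompt ID in two places: the `SystemPromptId` union in prompts.ts and the allowed-values list in this schema. One way to keep such lists in sync is to derive both from a single constant. The following is a minimal sketch using plain TypeScript rather than the project's yup helpers; `SYSTEM_PROMPT_IDS` and `isSystemPromptId` are hypothetical names, and the array lists only the IDs visible in this diff.

```typescript
// Single source of truth (abbreviated: only the IDs visible in the diff hunks).
const SYSTEM_PROMPT_IDS = [
  "email-assistant-draft",
  "create-dashboard",
  "run-query",
  "build-analytics-query",
  "rewrite-template-source",
] as const;

// The union type is derived from the array, so it cannot drift from it.
type SystemPromptId = (typeof SYSTEM_PROMPT_IDS)[number];

// Runtime check that narrows an unknown value to the union.
function isSystemPromptId(value: unknown): value is SystemPromptId {
  return typeof value === "string" && SYSTEM_PROMPT_IDS.some((id) => id === value);
}
```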

apps/backend/src/lib/ai/tools/sql-query.ts

Lines changed: 2 additions & 2 deletions
@@ -15,11 +15,11 @@ export function createSqlQueryTool(auth: SmartRequestAuth | null, targetProjectI
   const MAX_ROWS_FOR_AI = 50;

   return tool({
-    description: "Run a read-only ClickHouse SQL query against the project's analytics database for INSPECTION. Only SELECT queries are allowed. Project filtering is automatic. Results are capped at 50 rows for your context — always include a LIMIT clause and prefer aggregates (count, sum, min, max, avg, quantile, GROUP BY) over SELECT *.",
+    description: `Set and validate a ClickHouse SQL query for the analytics data grid. The grid runs the full query independently — you only receive a preview of the first ${MAX_ROWS_FOR_AI} rows to confirm correctness. Only SELECT queries are allowed. Project filtering is automatic. Always include a LIMIT clause.`,
     inputSchema: z.object({
       query: z
         .string()
-        .describe("The ClickHouse SQL query to execute. Only SELECT queries are allowed. Always include a LIMIT clause (≤20 for row samples)."),
+        .describe("The ClickHouse SQL query to execute. Only SELECT queries are allowed. Always include a LIMIT clause unless the system prompt tells you to do otherwise."),
     }),
     execute: async ({ query }: { query: string }) => {
       const client = getClickhouseExternalClient();
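
The updated tool description says the model receives only the first `MAX_ROWS_FOR_AI` rows as a preview. The rest of `execute` is not shown in this hunk, so the following is only a sketch of how such a preview could be produced; `toPreview`, `Row`, and the `QueryPreview` shape are hypothetical.

```typescript
const MAX_ROWS_FOR_AI = 50; // same constant name and value as in the diff

type Row = Record<string, unknown>;

// Hypothetical preview shape; the tool's real return value is not visible here.
interface QueryPreview {
  rows: Row[];        // at most maxRows rows
  rowCount: number;   // rows the query actually returned
  truncated: boolean; // true when rows were dropped from the preview
}

function toPreview(rows: readonly Row[], maxRows: number = MAX_ROWS_FOR_AI): QueryPreview {
  return {
    rows: rows.slice(0, maxRows),
    rowCount: rows.length,
    truncated: rows.length > maxRows,
  };
}
```

Reporting `truncated` alongside the rows would let the model tell a complete small result apart from a clipped large one, which fits the prompt's instruction not to analyze full result sets from the preview.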
