AI Copilot for pgwatch - starting implementation outside GSoC #1386
The five-stage architecture is well thought out. A few observations from working in the NL-to-SQL space:

MetricSignal abstraction (Stage 1)
This is the highest-leverage design decision. Raw counter values are meaningless to an LLM; rates, baselines, and anomaly scores give it something it can actually reason about. One thing worth considering: include the units in the struct (e.g., …).

SQL safety (Stage 2)
The …

Schema context (Stage 3)
For the schema context builder, consider caching …

NL-to-SQL connection
The copilot will inevitably need to translate natural language questions ("why is this query slow?") into diagnostic SQL against … Disclosure: I work on ai2sql.io, a natural language to SQL tool.

Standalone vs. subcommand
The maintainer's suggestion of a standalone app is the right call. A shared Go module for sink access would give you clean separation without duplicating connection logic.
I've just set up the project, added config, and made the sink connection.
Hi everyone,
I applied for GSoC this year to build an AI Copilot for pgwatch, and although the project wasn't selected, I was encouraged to continue contributing outside the program. I'm planning to start the implementation now and wanted to open this discussion to align on scope and approach before writing code.
My plan is to build this in 5 stages:
1- Metric transformation engine
A Go package that reads the two most recent rows per metric series from the sink, scoped by sys_id, and computes rate-of-change, rolling baseline, and anomaly score. Output is a []MetricSignal slice, so the LLM never sees raw counter values.
2- Safety and validation layer
AST-based SQL validation using pg_query_go, read-only session enforcement via pgx RuntimeParams, and a pgmicro in-memory sandbox so LLM-generated queries never touch the production sink.
3- Schema and context builder
Read-only connection to the monitored instance for pg_stat_user_tables, pg_stat_statements, and optional pg_stat_plans / HypoPG, with graceful degradation when absent.
4- Prompt assembler and LLM interface
Pluggable LLMProvider interface with OpenAI and Anthropic adapters (possibly other providers too), streaming via SSE, and a configurable token window policy.
5- CLI integration
A pgwatch copilot ask "" subcommand with --sys-id, --provider, and --window flags.
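To make the Stage 2 gate concrete, here is a minimal sketch of its shape. The function and error names are made up, and the naive keyword/statement-count check below is only a stand-in for the real pg_query_go AST walk, which is strictly stronger (it also catches things like SELECT ... INTO or CTEs wrapping DML):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotReadOnly is a hypothetical error for queries the copilot refuses to run.
var ErrNotReadOnly = errors.New("only a single SELECT statement is allowed")

// validateReadOnly is a naive placeholder for the AST-based check the
// plan describes: it rejects multi-statement input and anything that
// does not start with SELECT. A pg_query_go implementation would parse
// the SQL and inspect the statement nodes instead of the text.
func validateReadOnly(sql string) error {
	trimmed := strings.TrimSuffix(strings.TrimSpace(sql), ";")
	if strings.Contains(trimmed, ";") {
		return ErrNotReadOnly // more than one statement
	}
	if !strings.HasPrefix(strings.ToUpper(strings.TrimSpace(trimmed)), "SELECT") {
		return ErrNotReadOnly
	}
	return nil
}

func main() {
	for _, q := range []string{
		"SELECT * FROM pg_stat_user_tables;",
		"DROP TABLE metrics;",
		"SELECT 1; DELETE FROM t;",
	} {
		fmt.Println(q, "allowed:", validateReadOnly(q) == nil)
	}
}
```

Even with an AST gate, keeping the read-only session enforcement (and the sandbox) as independent layers is worthwhile: defense in depth means a validator bug alone cannot mutate the sink.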
Before I start, there are a few questions I'd like to confirm:
Should the Copilot live as a new subcommand inside the existing pgwatch binary, or as a separate binary that imports pgwatch as a library?
For the two-connection model (one to the sink and one to the monitored instance for schema context): is this the expected pattern for an internal pgwatch component?
If there are any other suggestions or ideas, I'll be happy to adjust the scope or approach based on feedback and guidance here before I start.
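On the two-connection question, one way to keep the two roles explicit is a small config type, with the monitored-instance DSN forced read-only at the session level. This is a sketch under assumptions: the type and helper are hypothetical, and it assumes libpq keyword/value DSNs; default_transaction_read_only is the real PostgreSQL setting involved.

```go
package main

import "fmt"

// CopilotConns is a hypothetical config holding the two DSNs described
// above: one for the metrics sink and one for the monitored instance.
type CopilotConns struct {
	SinkDSN   string // pgwatch measurements sink
	TargetDSN string // monitored instance, schema/stat views only
}

// withReadOnly appends the default_transaction_read_only GUC to a
// keyword/value DSN so every transaction on that connection starts
// read-only, independent of any SQL-level validation.
func withReadOnly(dsn string) string {
	return dsn + " options='-c default_transaction_read_only=on'"
}

func main() {
	c := CopilotConns{
		SinkDSN:   "host=sink dbname=measurements",
		TargetDSN: withReadOnly("host=prod dbname=app"),
	}
	fmt.Println(c.TargetDSN)
}
```

With pgx, setting RuntimeParams["default_transaction_read_only"] = "on" on the connection config achieves the same effect at the driver level, which lines up with the Stage 2 plan.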