Commit 65ae13e

Merge pull request #1536 from joshuawilson/spec
OLS-2882: Add spec files to the projects for AI-assisted development
2 parents 958907b + dc90da4 commit 65ae13e

18 files changed

Lines changed: 1751 additions & 0 deletions

.ai/spec/README.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# OpenShift Lightspeed Operator -- Specifications

Machine-readable behavioral and architectural specifications for the OpenShift Lightspeed Operator.

## Structure

This specification uses a two-layer structure:

| Layer | Path | Purpose |
|---|---|---|
| **what/** | `.ai/spec/what/` | Behavioral rules. Defines what the operator must do, its invariants, and its configuration surface. Implementation-agnostic. |
| **how/** | `.ai/spec/how/` | Architecture. Defines how the codebase is organized, how reconciliation is implemented, and how resources are generated. Implementation-specific. |

The separation exists so that behavioral rules remain stable across refactors. An agent fixing a reconciliation bug reads both layers; an agent answering "what happens when X" reads only `what/`.

## Scope

These specs cover the **operator** only. The following are separate projects with their own repositories and specifications:

- **lightspeed-service** -- the Python/FastAPI backend application
- **lightspeed-console** -- the OpenShift Console plugin UI code
- **RAG content pipeline** -- the retrieval-augmented generation data pipeline
- **Jira project data** -- issue tracking lives in the service repo's Jira project (OLS)

## Audience

AI agents (Claude). Content is optimized for precision and machine consumption over human readability.

## Quick Start

| Task | Start here |
|---|---|
| Understand what the operator does | `what/system-overview.md` |
| Fix a reconciliation bug | `what/reconciliation.md` + `how/reconciliation.md` |
| Add a new managed component | `what/system-overview.md` + `how/project-structure.md` |
| Understand the CRD | `what/crd-api.md` |
| Navigate the codebase | `how/project-structure.md` |
| Understand TLS configuration | `what/tls.md` |
| Understand security constraints | `what/security.md` |
| Debug external resource watching | `what/resource-lifecycle.md` + `how/reconciliation.md` |
| Add metrics or alerts | `what/observability.md` |

## Conventions

### Planned changes

Unimplemented behavior is marked with `[PLANNED: OLS-XXXX]` where `OLS-XXXX` is the Jira ticket. These markers appear inline next to the behavioral rule they affect. A summary table of all planned changes appears at the end of each `what/` spec that contains them.
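
A tool that builds the summary table could collect the markers with a regular expression. The following is a minimal sketch, not code from this repository: the function name `plannedTickets`, the sample rule text, and the ticket number `OLS-1234` are made up for illustration, and the pattern assumes ticket IDs are purely numeric.

```go
package main

import (
	"fmt"
	"regexp"
)

// plannedMarker matches the inline marker format "[PLANNED: OLS-XXXX]"
// and captures the Jira key. Assumes the ticket ID is numeric.
var plannedMarker = regexp.MustCompile(`\[PLANNED: (OLS-\d+)\]`)

// plannedTickets returns the Jira keys of all planned-change markers
// found in text, in order of appearance.
func plannedTickets(text string) []string {
	var keys []string
	for _, m := range plannedMarker.FindAllStringSubmatch(text, -1) {
		keys = append(keys, m[1])
	}
	return keys
}

func main() {
	// Hypothetical rule text; the ticket number is invented.
	rule := "3. The operator reports a Degraded condition. [PLANNED: OLS-1234]"
	fmt.Println(plannedTickets(rule))
}
```

Capturing only the key keeps the result directly usable as the first column of a summary table.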
48+
49+
### Configuration field references
50+
51+
User-configurable values are referenced by their CRD field path (e.g., `spec.ols.defaultModel`). Operator startup flags are referenced by their flag name (e.g., `--namespace`).
52+
53+
### Internal constants
54+
55+
Behavioral rules state the rule without embedding the numeric value. For example: "the finalizer cleanup waits for owned resources to be deleted before removing the finalizer" rather than "waits for 3 minutes". The actual value lives in code and may change.
56+
57+
### Rule numbering
58+
59+
Behavioral rules are numbered sequentially within each section. Numbers are stable within a spec version but may be renumbered across major revisions.

## Project History

| Phase | Period | Operator milestones |
|---|---|---|
| Prototype | Q4 2023 | Initial operator scaffold with kubebuilder. Basic OLSConfig CRD. AppServer deployment reconciliation. |
| Early Access | Q1-Q2 2024 | PostgreSQL conversation cache. Console UI plugin integration. LLM secret management. Redis replaced by PostgreSQL. |
| Tech Preview | Q3 2024 | TLS hardening (service-ca integration, custom certs). Prometheus monitoring. Status conditions. Air-gap support (image overrides). |
| GA | Q4 2024 - Q1 2025 | Finalizer-based cleanup. ResourceVersion-based change detection. External resource watcher system. OCP version detection for console plugin image selection. |
| Post-GA | 2025-2026 | MCP server integration. RAG support with vector database. Event-driven reconciliation (removed timer-based). Dataverse exporter. PatternFly 5/6 console image selection. LCore/Llama Stack backend (added then removed). |

.ai/spec/how/README.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
# Architecture Specifications

Defines how the operator is implemented. Each spec maps behavioral rules from `what/` to code locations, patterns, and structural decisions.

## Spec Index

| Spec | Description |
|---|---|
| `project-structure.md` | Codebase layout: package responsibilities, file naming conventions, import graph, key entry points. Map from concept to file path. |
| `reconciliation.md` | Reconciliation implementation: task registration pattern, error propagation, status update mechanics, watcher configuration, finalizer implementation. |
| `deployment-generation.md` | How Kubernetes resources (Deployments, Services, ConfigMaps, Secrets, PVCs) are generated: builder functions, volume/mount assembly, container spec construction, owner references. |
| `config-generation.md` | How CRD fields are transformed into operand configuration: OLS config YAML generation, PostgreSQL configuration, MCP server configuration, environment variable mapping. |

## When to Read

| Situation | Read |
|---|---|
| Need to find where something is implemented | `project-structure.md` |
| Debugging reconciliation ordering or error handling | `reconciliation.md` |
| Modifying a deployment, service, or volume | `deployment-generation.md` |
| Changing how CRD fields map to operand config | `config-generation.md` |
| Adding a new reconciliation task | `reconciliation.md` + `deployment-generation.md` |
| Understanding watcher behavior | `reconciliation.md` |

## Relationship to what/

The `how/` specs implement the behavioral rules defined in `what/`. Each `how/` spec references the `what/` rules it implements.

- `how/` specs describe code structure, function signatures, and file locations.
- `what/` specs describe invariants, ordering constraints, and expected behavior.
- When implementing a change, read the `what/` spec first to understand the required behavior, then read the `how/` spec to find the implementation location.
- If a `how/` spec contradicts a `what/` spec, the `what/` spec is authoritative and the implementation should be updated to match.

.ai/spec/how/config-generation.md

Lines changed: 218 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,218 @@
# Config Generation

## Module Map

| File | Key Functions | Responsibility |
|---|---|---|
| `internal/controller/appserver/assets.go` | `GenerateOLSConfigMap()`, `buildProviderConfigs()`, `buildOLSConfig()`, `generateMCPServerConfigs()`, `buildToolFilteringConfig()` | OLS config YAML (olsconfig.yaml) |
| `internal/controller/postgres/assets.go` | `GeneratePostgresConfigMap()`, `GeneratePostgresBootstrapSecret()`, `GeneratePostgresSecret()` | PostgreSQL config + bootstrap script + credentials |
| `internal/controller/console/assets.go` | `GenerateConsoleUIConfigMap()` | Nginx config for console plugin |
| `internal/controller/utils/mcp_server_config.go` | `GenerateOpenShiftMCPServerConfigMap()` | MCP server denied-resources config (TOML) |

## Data Flow

### OLS Config (olsconfig.yaml)
```
CR spec -> GenerateOLSConfigMap() -> ConfigMap "olsconfig"
```

Generated YAML structure (marshaled from `utils.AppSrvConfigFile`):
```yaml
llm_providers:
  - name: <provider.Name>
    type: <provider.Type>  # direct from CRD enum: openai, azure_openai, etc.
    url: <provider.URL>  # non-Azure providers
    credentials_path: /etc/apikeys/<secretName>  # mount path to secret dir
    models:
      - name: <model.Name>
        url: <model.URL>
        context_window_size: <model.ContextWindowSize>
        parameters:
          max_tokens_for_response: <model.Parameters.MaxTokensForResponse>
          tool_budget_ratio: <default 0.25 if zero>
    # Azure-specific:
    azure_openai_config:
      url: <provider.URL>
      credentials_path: /etc/apikeys/<secretName>
      azure_deployment_name: <deploymentName>
      api_version: <apiVersion>
    # Watsonx-specific:
    project_id: <projectID>
    # Fake provider:
    fake_provider_config:
      url: "http://example.com"
      response: "This is a preconfigured fake response."
      chunks: 30
      sleep: 0.1
      stream: false
      mcp_tool_call: <fakeProviderMCPToolCall>

ols_config:
  default_model: <spec.ols.defaultModel>
  default_provider: <spec.ols.defaultProvider>
  max_iterations: <spec.ols.maxIterations>
  logging:
    app_log_level: <spec.ols.logLevel>
    lib_log_level: <spec.ols.logLevel>
    uvicorn_log_level: <spec.ols.logLevel>
  conversation_cache:
    type: postgres
    postgres:
      host: lightspeed-postgres-server.<namespace>.svc
      port: 5432
      user: postgres
      db: postgres
      password_path: /etc/credentials/lightspeed-postgres-secret/password
      ssl_mode: require
      ca_cert_path: /etc/certs/postgres-ca/service-ca.crt
  tls_config:
    tls_certificate_path: /etc/certs/lightspeed-tls/tls.crt
    tls_key_path: /etc/certs/lightspeed-tls/tls.key
  reference_content:
    indexes:
      - path: /app-root/rag/rag-0  # BYOK first (one per spec.ols.rag entry)
        index_id: <rag.IndexID>
        origin: <rag.Image>
      - path: /app-root/vector_db/ocp_product_docs/<major>.<minor>  # OCP docs (unless byokRAGOnly)
        index_id: ocp-product-docs-<major>_<minor>
        origin: "Red Hat OpenShift <major>.<minor> documentation"
    embeddings_model_path: /app-root/embeddings_model
  user_data_collection:
    feedback_disabled: <computed: CRvalue || !dataCollectorEnabled>
    feedback_storage: /app-root/ols-user-data/feedback
    transcripts_disabled: <computed: CRvalue || !dataCollectorEnabled>
    transcripts_storage: /app-root/ols-user-data/transcripts
  extra_cas: [<list of cert file paths from kube-root-ca.crt + additional CA CM>]
  certificate_directory: /etc/certs/cert-bundle
  proxy_config:
    proxy_url: <proxyConfig.proxyURL>
    proxy_ca_cert_path: /etc/certs/cm-proxycacert/<certKey>
  query_filters: [{name, pattern, replace_with}]  # if spec.ols.queryFilters set
  system_prompt_path: /etc/ols/system_prompt  # if spec.ols.querySystemPrompt set
  quota_handlers_config:  # if spec.ols.quotaHandlersConfig set
    storage: <postgres cache config>
    scheduler: {period: 300}
    limiters_config: [{name, type, initial_quota, quota_increase, period}]
    enable_token_history: <bool>
  tool_filtering:  # if ToolFiltering gate + MCP servers exist
    alpha: <default 0.8>
    top_k: <default 10>
    threshold: <default 0.01>
  tools_approval:  # always present
    approval_type: <default "tool_annotations">
    approval_timeout: <default 600>

mcp_servers:  # if any MCP servers configured
  - name: openshift  # if introspectionEnabled
    url: http://localhost:<OpenShiftMCPServerPort>
    timeout: <mcpKubeServerConfig.timeout or default 60>
    headers:
      x-kube-auth: "{{KUBERNETES_TOKEN}}"
  - name: <user server>  # if MCPServer feature gate
    url: <url>
    timeout: <timeout>
    headers:
      <name>: <resolved value>  # kubernetes -> "{{KUBERNETES_TOKEN}}"
                                # client -> "{{CLIENT_TOKEN}}"
                                # secret -> /etc/mcp/headers/<secretName>/header

user_data_collector_config:  # if dataCollectorEnabled
  data_storage: /app-root/ols-user-data
  log_level: <spec.olsDataCollector.logLevel>
```
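
Several values in the structure above are computed rather than copied from the CR. The sketch below illustrates three of them in Go: the `feedback_disabled` / `transcripts_disabled` expression, the `tool_budget_ratio` default, and the MCP header value resolution. All function and constant names are hypothetical, and the string source types (`"kubernetes"`, `"client"`, `"secret"`) are taken from the comments in the YAML, not from the operator's actual type definitions.

```go
package main

import "fmt"

// defaultToolBudgetRatio is the "<default 0.25 if zero>" value.
// The constant name is illustrative only.
const defaultToolBudgetRatio = 0.25

// collectionDisabled mirrors "<computed: CRvalue || !dataCollectorEnabled>":
// collection is off if the CR disables it or the data collector is not enabled.
func collectionDisabled(crDisabled, dataCollectorEnabled bool) bool {
	return crDisabled || !dataCollectorEnabled
}

// toolBudgetRatio falls back to the default when the CR value is zero.
func toolBudgetRatio(crValue float64) float64 {
	if crValue == 0 {
		return defaultToolBudgetRatio
	}
	return crValue
}

// resolveMCPHeader maps a header source type to the value written under
// mcp_servers[].headers. secretName is only used for the "secret" type.
func resolveMCPHeader(sourceType, secretName string) (string, error) {
	switch sourceType {
	case "kubernetes":
		return "{{KUBERNETES_TOKEN}}", nil
	case "client":
		return "{{CLIENT_TOKEN}}", nil
	case "secret":
		return "/etc/mcp/headers/" + secretName + "/header", nil
	default:
		return "", fmt.Errorf("unknown header source type %q", sourceType)
	}
}

func main() {
	fmt.Println(collectionDisabled(false, false))
	fmt.Println(toolBudgetRatio(0))
	// "my-mcp-secret" is a made-up secret name.
	value, err := resolveMCPHeader("secret", "my-mcp-secret")
	fmt.Println(value, err)
}
```

Note that with this expression a CR cannot force collection on while the data collector is disabled; the collector flag always wins.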

### PostgreSQL Bootstrap Script
Content is in `utils.PostgresBootStrapScriptContent` constant. Deployed as a Secret (not ConfigMap) named `lightspeed-postgres-bootstrap`.

```bash
#!/bin/bash
cat /var/lib/pgsql/data/userdata/postgresql.conf

_psql () { psql --set ON_ERROR_STOP=1 "$@" ; }

# Create pg_trgm extension in default database (for OLS conversation cache)
echo "CREATE EXTENSION IF NOT EXISTS pg_trgm;" | _psql -d $POSTGRESQL_DATABASE

# Create schemas for isolating different components' data
echo "CREATE SCHEMA IF NOT EXISTS quota;" | _psql -d $POSTGRESQL_DATABASE
echo "CREATE SCHEMA IF NOT EXISTS conversation_cache;" | _psql -d $POSTGRESQL_DATABASE
```

### PostgreSQL Config (postgresql.conf.sample)
Content is in `utils.PostgresConfigMapContent` constant. Deployed as ConfigMap.
```
huge_pages = off
ssl = on
ssl_cert_file = '/etc/certs/tls.crt'
ssl_key_file = '/etc/certs/tls.key'
ssl_ca_file = '/etc/certs/cm-olspostgresca/service-ca.crt'
```

### PostgreSQL Password Secret
Generated via `GeneratePostgresSecret()`: 12 random bytes, base64 encoded, stored in secret key `password` (`utils.PostgresSecretKeyName`).
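
The "12 random bytes, base64 encoded" step can be sketched as follows. This is an illustration under assumptions, not the body of `GeneratePostgresSecret()`: the helper name `generatePassword` is hypothetical, and the choice of `crypto/rand` and the standard (padded) base64 alphabet is assumed rather than confirmed by the spec.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generatePassword returns n cryptographically random bytes, base64 encoded.
// Assumption: standard base64 alphabet; the operator may use another variant.
func generatePassword(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

func main() {
	password, err := generatePassword(12)
	if err != nil {
		panic(err)
	}
	// 12 bytes encode to exactly 16 base64 characters, with no padding.
	fmt.Println(len(password))
}
```

Because 12 is a multiple of 3, the encoded value never ends in `=` padding, which keeps the password free of trailing characters that some tools treat specially.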

### Nginx Config (Console UI)
Inline in `GenerateConsoleUIConfigMap()`:
- PID file: `/tmp/nginx/nginx.pid`
- Temp paths: `/tmp/nginx/{client_body,proxy,fastcgi,uwsgi,scgi}` (for read-only root filesystem)
- Serves static files from `/usr/share/nginx/html` on port 9443 with SSL
- TLS cert/key from `/var/cert/tls.crt` and `/var/cert/tls.key`

### MCP Server Config (TOML)
Inline in `utils.OpenShiftMCPServerConfigTOML` constant:
```toml
[[denied_resources]]
group = ""
version = "v1"
kind = "Secret"

[[denied_resources]]
group = "rbac.authorization.k8s.io"
version = "v1"
```

## Key Abstractions

### Credential Injection Pattern
Provider credentials are mounted as files at `/etc/apikeys/<secretName>/`. The OLS config references the directory path as `credentials_path`. The secret key used is `apitoken` by default, overridable by `credentialKey` in the CR.
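
The path arithmetic behind this pattern can be sketched in a few lines. The helper and constant names below are hypothetical; only the mount root `/etc/apikeys` and the default key `apitoken` come from the text above. The sketch relies on the standard Kubernetes behavior that each key of a mounted Secret appears as a file in the mount directory.

```go
package main

import (
	"fmt"
	"path"
)

const (
	apiKeysMountRoot     = "/etc/apikeys" // mount root named in the spec
	defaultCredentialKey = "apitoken"     // default secret key named in the spec
)

// credentialsPath returns the directory written to credentials_path.
func credentialsPath(secretName string) string {
	return path.Join(apiKeysMountRoot, secretName)
}

// credentialFile returns the file inside that directory that holds the
// token. An empty credentialKey selects the default key.
func credentialFile(secretName, credentialKey string) string {
	if credentialKey == "" {
		credentialKey = defaultCredentialKey
	}
	return path.Join(credentialsPath(secretName), credentialKey)
}

func main() {
	// "openai-creds" is a made-up secret name.
	fmt.Println(credentialsPath("openai-creds"))
	fmt.Println(credentialFile("openai-creds", ""))
}
```

Referencing the directory rather than the file means the config does not need to change when only the key name is overridden.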

### External Resource Iteration
`utils.ForEachExternalSecret(cr, callback)` and `utils.ForEachExternalConfigMap(cr, callback)` provide consistent iteration over CR-referenced external resources. Each callback receives `(name, source)` where `source` identifies the reference origin:
- `"llm-provider-<providerName>"` for LLM credential secrets
- `"mcp-<serverName>"` for MCP header secrets
- `"additional-ca"` for additional CA configmaps
- `"proxy-ca"` for proxy CA configmaps
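
The shape of this visitor pattern can be sketched with simplified stand-in types. Nothing below is the real `utils` API: the struct types, the lowercase `forEachExternalSecret`, and the error-returning callback are assumptions made for illustration. Only the `(name, source)` arguments and the source label formats come from the list above.

```go
package main

import "fmt"

// provider, mcpServer, and crSpec are simplified stand-ins for the CR types.
type provider struct {
	Name       string
	SecretName string
}

type mcpServer struct {
	Name          string
	HeaderSecrets []string
}

type crSpec struct {
	Providers  []provider
	MCPServers []mcpServer
}

// secretVisitor receives the secret name and a label for where it is referenced.
type secretVisitor func(name, source string) error

// forEachExternalSecret visits every secret referenced by cr and stops at
// the first error returned by the visitor.
func forEachExternalSecret(cr crSpec, visit secretVisitor) error {
	for _, p := range cr.Providers {
		if err := visit(p.SecretName, "llm-provider-"+p.Name); err != nil {
			return err
		}
	}
	for _, s := range cr.MCPServers {
		for _, name := range s.HeaderSecrets {
			if err := visit(name, "mcp-"+s.Name); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	// All names here are made up.
	cr := crSpec{
		Providers:  []provider{{Name: "openai", SecretName: "openai-creds"}},
		MCPServers: []mcpServer{{Name: "tools", HeaderSecrets: []string{"tools-auth"}}},
	}
	err := forEachExternalSecret(cr, func(name, source string) error {
		fmt.Printf("%s (%s)\n", name, source)
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

Centralizing the traversal this way lets watcher setup and validation share one definition of which external resources a CR references.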

### Config Building Pattern
Config is built programmatically using typed Go structs from the `utils/` package (e.g., `utils.AppSrvConfigFile`) and marshaled with `yaml.Marshal()`. No templates are used.

### PostgreSQL Schema Isolation
PostgreSQL schemas isolate data from different components within the same database:
- `conversation_cache` schema: conversation history
- `quota` schema: token quota tracking

These schemas are created by the bootstrap script.

## Integration Points

| Config Section | Source | Notes |
|---|---|---|
| Provider credentials | CR `spec.llm.providers[].credentialsSecretRef` | File mount at `/etc/apikeys/<secretName>/` |
| Default model/provider | CR `spec.ols.defaultModel`, `spec.ols.defaultProvider` | Required fields |
| Log level | CR `spec.ols.logLevel` | Enum: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: INFO |
| PostgreSQL connection | `utils/constants.go` | Host built from service name + namespace + ".svc" |
| TLS certs | Service-ca operator or user-provided secret | Path: `/etc/certs/lightspeed-tls/` |
| RAG indexes | CR `spec.ols.rag[]` | File paths in config YAML |
| OpenShift version | Reconciler options | Used for OCP docs RAG index path |
| MCP servers | CR `spec.mcpServers[]` + `spec.ols.introspectionEnabled` | Feature gated by `MCPServer` gate |
| Tool filtering | CR `spec.ols.toolFilteringConfig` | Feature gated by `ToolFiltering` gate; requires MCP servers |
| Proxy config | CR `spec.ols.proxyConfig` | Proxy URL + optional CA cert configmap |
| Query filters | CR `spec.ols.queryFilters[]` | Regex patterns for content filtering |
| Quota config | CR `spec.ols.quotaHandlersConfig` | Rate limiting with scheduler period fixed at 300s |
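
Two rows in the table describe derived strings: the PostgreSQL host and the OCP docs RAG index location. A minimal sketch of both derivations follows. The function names are hypothetical, the namespace and version passed in `main` are example inputs, and the string formats are taken from the generated YAML structure (`lightspeed-postgres-server.<namespace>.svc`, `/app-root/vector_db/ocp_product_docs/<major>.<minor>`, `ocp-product-docs-<major>_<minor>`).

```go
package main

import "fmt"

// postgresServiceName is the service name that appears in the generated config.
const postgresServiceName = "lightspeed-postgres-server"

// postgresHost builds the in-cluster host: service name + namespace + ".svc".
func postgresHost(namespace string) string {
	return postgresServiceName + "." + namespace + ".svc"
}

// ocpDocsIndex returns the index path and index ID for the OCP product
// docs RAG index of the given OpenShift version.
func ocpDocsIndex(major, minor string) (indexPath, indexID string) {
	indexPath = fmt.Sprintf("/app-root/vector_db/ocp_product_docs/%s.%s", major, minor)
	indexID = fmt.Sprintf("ocp-product-docs-%s_%s", major, minor)
	return indexPath, indexID
}

func main() {
	// Example inputs only.
	fmt.Println(postgresHost("openshift-lightspeed"))
	indexPath, indexID := ocpDocsIndex("4", "16")
	fmt.Println(indexPath, indexID)
}
```

The path uses a dot between major and minor while the index ID uses an underscore, so the two strings cannot be derived from each other by simple concatenation.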

## Implementation Notes

- Config YAML is built programmatically using Go structs and marshaled with `yaml.Marshal()`, not templates.
- The fake provider config is hardcoded with test response values (`"This is a preconfigured fake response."`).
- PostgreSQL uses `POSTGRESQL_ADMIN_PASSWORD` env var for the admin password (mapped from the generated secret in the deployment spec, not shown in config files).
- Exporter config for data collector uses a separate ConfigMap (`utils.ExporterConfigCmName`) with collection interval of 300 seconds, cleanup after send, and ingress URL to `console.redhat.com`.
- The `OLSSystemPromptFileName` is stored as a separate key in the OLS config ConfigMap when `querySystemPrompt` is set.
