
Commit 913cde8

Merge pull request #29 from jmbish04/feat/e2e-tests
2 parents f6d2b2c + d44faf1 commit 913cde8

570 files changed

Lines changed: 182798 additions & 8943 deletions


.agent/rules/AGENT_GOVERNANCE.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
trigger: always_on
---

# AGENT & WORKFLOW GOVERNANCE

## 1. The Manager-Worker Pattern
* **Rule**: Agents (`extends Agent`) **MUST NOT** perform long-running blocking tasks (>10s).
* **Rule**: Long-running tasks (Cloning, Vectorizing, Scraping) **MUST** be offloaded to Cloudflare Workflows (`extends WorkflowEntrypoint`).
* **Rule**: Agents act as "Managers" (State/Decision); Workflows act as "Workers" (Execution).

## 2. Tool Integration (MCP)
* **Constraint**: Do NOT import Node.js-exclusive packages (e.g., `fs`, `child_process`) directly into the Worker.
* **Strategy**: Adapt the *logic* of official tools into the Agents SDK `sql`-backed state or stateless `Octokit` calls.
* **Schema**: All tools must strictly define Zod schemas for the Agents SDK to generate valid MCP interfaces.

## 3. Sandbox Usage
* **Lifecycle**: Sandboxes are ephemeral. Data must be extracted (to R2, D1, or Vectorize) before the `step` completes.
* **Security**: Never pass raw user input directly to `sandbox.exec()`. Sanitize command arguments.

## 4. Vectorization & RAG
* **Chunking**: Code must be chunked (e.g., by function/class) before embedding.
* **Model**: Use `@cf/baai/bge-large-en-v1.5` for embeddings (1024 dimensions).
* **Metadata**: Upserts MUST include `{ repo, filepath, commit_sha }`.
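
The section 4 rules in `AGENT_GOVERNANCE.md` (1024-dimension embeddings, mandatory `{ repo, filepath, commit_sha }` metadata) could be checked before a vector is upserted. The following is a minimal sketch under stated assumptions: `ChunkVector`, `ChunkMetadata`, and `validateChunkVector` are hypothetical names that do not appear in the commit, and the check runs on plain objects rather than against the Vectorize binding.

```typescript
// Hypothetical shapes for illustration. Only the 1024-dimension size and the
// { repo, filepath, commit_sha } metadata keys come from the governance rules.
interface ChunkMetadata {
  repo: string;
  filepath: string;
  commit_sha: string;
}

interface ChunkVector {
  id: string;
  values: number[];
  metadata: ChunkMetadata;
}

// Output size of @cf/baai/bge-large-en-v1.5, as stated in the rules.
const EMBEDDING_DIMENSIONS = 1024;

// Returns a list of problems; an empty list means the vector satisfies the rules.
function validateChunkVector(vector: ChunkVector): string[] {
  const problems: string[] = [];
  if (vector.values.length !== EMBEDDING_DIMENSIONS) {
    problems.push(
      `expected ${EMBEDDING_DIMENSIONS} dimensions, got ${vector.values.length}`,
    );
  }
  for (const key of ["repo", "filepath", "commit_sha"] as const) {
    if (vector.metadata[key].trim() === "") {
      problems.push(`metadata.${key} is empty`);
    }
  }
  return problems;
}
```

Returning a list of problems instead of throwing lets a workflow step log every violation for a chunk in one pass and skip only the offending vectors.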

.agent/rules/HEALTH_GOVERNANCE.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
# Health Check Governance
2+
3+
## Rule: Every New Module Must Register a Health Check
4+
5+
When adding a new domain module under `backend/src/`, you **MUST**:
6+
7+
1. Create a `health.ts` file co-located with the module
8+
2. Export `checkHealth(env: Env): Promise<HealthStepResult>`
9+
3. Register the check in `backend/src/health/coordinator.ts``CODE_CHECKS` array
10+
4. Assign a `HealthCategory` from the union in `backend/src/health/types.ts`
11+
12+
## Rule: Dynamic Tests via D1
13+
14+
Runtime endpoint monitoring uses the `health_test_definitions` table. CRUD is available at:
15+
16+
- `GET /api/health/tests` — list all
17+
- `POST /api/health/tests` — create (Zod-validated)
18+
- `DELETE /api/health/tests/:id` — remove
19+
20+
## Rule: AI Remediation
21+
22+
Failed health checks automatically receive AI-powered remediation hints via `analyzeFailure()`. These are stored in the `ai_suggestion` column of `health_results`.
23+
24+
## Rule: Cron Schedule
25+
26+
Health suite runs weekly via cron `0 3 * * 0` (Sundays 3AM UTC).
27+
On-demand runs are available via `POST /api/health/run`.
28+
29+
## Rule: Frontend Sync
30+
31+
The frontend at `/health` **must** display all categories defined in `CATEGORY_META` in `Health.tsx`.
32+
When adding a new `HealthCategory` to `types.ts`, also add an entry to the frontend registry.
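
A co-located `health.ts` that follows the first rule in `HEALTH_GOVERNANCE.md` might look roughly like the sketch below. The real `HealthCategory`, `HealthStepResult`, and `Env` definitions are not shown in this part of the commit, so the types here are hypothetical stand-ins. Only the `checkHealth(env)` signature and the `category`, `name`, and `status` field names are taken from the source. In a real module the function would be exported.

```typescript
// Hypothetical stand-ins: the real union and result type live in
// backend/src/health/types.ts, which is not visible in this excerpt.
type HealthCategory = "database" | "ai" | "github";
type HealthStatus = "ok" | "fail";

interface HealthStepResult {
  category: HealthCategory;
  name: string;
  status: HealthStatus;
  durationMs: number;
  error?: string;
}

// Stand-in for the Worker Env; `ping` represents whatever binding the module probes.
interface Env {
  ping: () => Promise<void>;
}

// Matches the required signature. This sketch catches errors and reports them
// as a failed result, a design choice that is not stated in the rules.
async function checkHealth(env: Env): Promise<HealthStepResult> {
  const started = Date.now();
  const base = { category: "database" as const, name: "example-module" };
  try {
    await env.ping();
    return { ...base, status: "ok", durationMs: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ...base, status: "fail", durationMs: Date.now() - started, error: message };
  }
}
```

Converting exceptions into a `fail` result means one broken module cannot abort the remaining checks in the suite.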

.agent/rules/actions-llm.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
# GitHub Actions LLM Rules
2+
3+
## 1. Resilience
4+
5+
- Always use `response_format={"type": "json_object"}` when interacting with `gpt-oss-120b` for data extraction.
6+
- Implement `try/except` blocks around the LLM call to prevent a single bad generation from crashing the CI pipeline.
7+
8+
## 2. Dependency Management
9+
10+
- Keep scripts self-contained within the YAML (using `cat <<EOF`) for simple tasks, or use a dedicated `scripts/` folder for complex ones.
11+
- Prioritize standard libraries (`openai`, `pydantic`) over experimental ones to ensure stability in the CI runner.

.agent/rules/logging-standards.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Research Logging Standards
2+
3+
## 1. The "Glass Box" Principle
4+
5+
The user must see HOW the agent arrived at a conclusion.
6+
7+
- **BAD:** Agent returns "I found React."
8+
- **GOOD:**
9+
1. Agent logs: "User asked for frontend frameworks."
10+
2. Agent logs: "Tool 'GoogleSearch' called with query 'best frontend frameworks 2026'."
11+
3. Agent logs: "Tool returned 15 results."
12+
4. Agent logs: "Evaluating 'React' - it matches criteria."
13+
14+
## 2. Structured Metadata
15+
16+
Do not dump JSON into the `content` text field.
17+
18+
- Use the `metadata` JSON column for large payloads (e.g., full HTML body, raw search JSON).
19+
- Keep `content` human-readable (e.g., "Parsing search results...").
20+
21+
## 3. Error Visibility
22+
23+
If a tool fails (e.g., Browser Rendering timeout):
24+
25+
- Log it as `step_type: 'error'`.
26+
- Do not hide it. The user needs to see that the "Search Agent" failed to connect.
27+
28+
## 4. Async Writes
29+
30+
- Use `ctx.waitUntil()` for logging database inserts to prevent blocking the main agent execution thread.
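
Sections 2 to 4 of `logging-standards.md` can be combined into one small helper. The sketch below is illustrative only: `LogRow`, `buildLogRow`, and `logStep` are hypothetical names, the `step_type` values other than `'error'` are assumptions, and the database insert is passed in as a function so the example stays independent of any particular D1 schema. `waitUntil` is the standard Workers execution-context method, narrowed here to a minimal interface.

```typescript
// Hypothetical row shape. Only `content`, `metadata`, and step_type 'error'
// are named in the logging standards; the other step types are assumptions.
interface LogRow {
  step_type: "info" | "tool_call" | "error";
  content: string;
  metadata: string | null;
}

// Minimal slice of the Workers execution context used by this sketch.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Keeps `content` human-readable and moves any large payload into `metadata`.
function buildLogRow(
  stepType: LogRow["step_type"],
  content: string,
  payload?: unknown,
): LogRow {
  return {
    step_type: stepType,
    content,
    metadata: payload === undefined ? null : JSON.stringify(payload),
  };
}

// Schedules the insert without awaiting it, so the agent is not blocked on the write.
function logStep(
  ctx: WaitUntilContext,
  insert: (row: LogRow) => Promise<void>,
  row: LogRow,
): void {
  ctx.waitUntil(
    insert(row).catch((err: unknown) => console.error("log insert failed", err)),
  );
}
```

A tool timeout would then be recorded with something like `logStep(ctx, insertRow, buildLogRow("error", "Search Agent failed to connect", { timeoutMs: 30000 }))`, where `insertRow` is whatever function writes to the log table.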
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
---
description: Implement Agentic Research Team
---

# Workflow:

## Phase 1: Infrastructure & MCP Layer
1. **Wrangler Config**:
    * Edit `wrangler.jsonc` to add the `workflows`, `vectorize`, and `send_email` keys.
    * Run `npx wrangler types` to update `worker-configuration.d.ts`.
2. **Vectorize Setup**:
    * Run `npx wrangler vectorize create research-index --dimensions 1024 --metric cosine`.
3. **MCP Adapter**:
    * Create `src/mcp/github-official-adapter.ts`.
    * Implement standard GitHub tools (`list_files`, `read_file`, `search_repositories`) using `src/octokit` logic but matching official tool names/schemas.
    * Import and register this in `src/tools/index.ts` to combine with custom tools.

## Phase 2: The Research Workflow (The Muscle)
1. **Scaffold Workflow**:
    * Create `src/workflows/DeepResearchWorkflow.ts` extending `WorkflowEntrypoint`.
2. **Sandbox Integration**:
    * Implement `step.do('clone')`:
        ```typescript
        import { getSandbox } from '@cloudflare/sandbox';
        // ... (assumes a Sandbox binding named `Sandbox`; the sandbox ID is illustrative)
        const sandbox = getSandbox(env.Sandbox, 'deep-research');
        await sandbox.exec(`git clone ${repoUrl}`); // validate repoUrl before interpolating it
        ```
3. **Analysis & RAG**:
    * Implement `step.do('process')`:
        * Read file tree.
        * Split code files.
        * `env.AI.run('@cf/baai/bge-large-en-v1.5')`.
        * `env.RESEARCH_INDEX.upsert()`.

## Phase 3: The Orchestrator (The Brain)
1. **Create Agent**:
    * Create `src/agents/ResearchAgent.ts` extending `Agent`.
2. **Logic Implementation**:
    * **Plan**: `onMessage` -> LLM generates research plan.
    * **Execute**: Call `env.DEEP_RESEARCH_WORKFLOW.create()`.
    * **Monitor**: Expose a `reportProgress` RPC method that the Workflow calls to update the Agent.
    * **HITL**: If the plan involves "Create Issue" or "PR", pause and send `type: 'approval_request'` to WebSocket.

## Phase 4: Daily Discovery & Email
1. **Cron Handler**:
    * Update `src/index.ts` to export a `scheduled` handler.
    * Logic: Fetch "trending" -> Call `DeepResearchWorkflow` with `mode: 'discovery'` -> Aggregate Findings.
2. **Email**:
    * Install `mimetext`.
    * Generate HTML report.
    * Send via `env.EMAIL_SENDER`.

## Phase 5: Verification
1. Deploy: `npx wrangler deploy`.
2. Test MCP: Connect generic MCP client to the Agent.
3. Test Full Loop: Trigger `ResearchAgent` via Chat UI.
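
The Phase 2 clone step interpolates `repoUrl` into a shell command, while the governance rules in this commit forbid passing raw user input to the sandbox. One way to reconcile the two is to normalize the URL first. The sketch below assumes only public `https://github.com/<owner>/<repo>` URLs should be cloned; `toSafeCloneUrl` is a hypothetical helper, not something defined in the commit.

```typescript
// Hypothetical helper: accepts only https://github.com/<owner>/<repo>[.git]
// and returns a normalized clone URL built from the validated parts.
function toSafeCloneUrl(input: string): string {
  const url = new URL(input); // throws a TypeError on malformed input
  // Owner and repo are limited to characters with no special meaning in a shell.
  const match = /^\/([A-Za-z0-9-]+)\/([A-Za-z0-9._-]+?)(?:\.git)?\/?$/.exec(url.pathname);
  const trustedOrigin =
    url.protocol === "https:" &&
    url.hostname === "github.com" &&
    url.username === "" &&
    url.password === "";
  if (!trustedOrigin || match === null) {
    throw new Error(`refusing to clone untrusted URL: ${input}`);
  }
  // Rebuild the URL so query strings, fragments, and ports are dropped.
  return `https://github.com/${match[1]}/${match[2]}.git`;
}
```

The step would then run the clone with `toSafeCloneUrl(repoUrl)` instead of the raw value. Rebuilding the URL from the matched owner and repository, instead of passing the input through, keeps anything the pattern did not explicitly accept out of the command line.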
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
description: This plan deploys the self-contained "LLM-as-a-Judge" workflow to GitHub Actions.
---

# Implementation Plan: GitHub Actions Research Judge

This plan deploys the self-contained "LLM-as-a-Judge" workflow to GitHub Actions.

## User Intent

Create a robust GitHub Action that uses Cloudflare Workers AI (`@cf/openai/gpt-oss-120b`) to orchestrate, execute, and evaluate GitHub repository searches before syncing them to a Cloudflare Worker.

## Technical Context

- **Infrastructure**: GitHub Actions (`ubuntu-latest`)
- **Language**: Python 3.11
- **AI Provider**: Cloudflare AI Gateway (OpenAI Compatible Endpoint)
- **Model**: `gpt-oss-120b` (128k context)

## Execution Steps

### 1. Create Workflow File

- **Path**: `.github/workflows/research-judge.yml`
- **Content**: Copy the provided YAML exactly.
- **Key Features**:
  - Embeds `research_judge.py` directly (no extra file management).
  - Uses `pydantic` for strict JSON schema validation from the LLM.
  - Implements a `TinyAgent` class to wrap the OpenAI SDK interactions.

### 2. Configure Secrets (Manual)

You must add the following secrets to your GitHub Repository:

- `CLOUDFLARE_ACCOUNT_ID`: Your CF Account ID.
- `CLOUDFLARE_GATEWAY_ID`: The ID of your AI Gateway.
- `CLOUDFLARE_API_TOKEN`: Token with Workers AI permissions.
- `WORKER_API_KEY`: Token to authenticate with your Hono Worker.

### 3. Usage

- **Manual**: Go to "Actions" -> "Deep Research Judge" -> "Run workflow" -> Enter a prompt.
- **Automated**: Send a POST request from your Worker:
  ```typescript
  await fetch("https://api.github.com/repos/OWNER/REPO/dispatches", {
    method: "POST",
    headers: {
      // GitHub rejects dispatch requests that lack authentication or a User-Agent.
      Authorization: `Bearer ${env.GITHUB_TOKEN}`, // assumes a token with access to the repo
      Accept: "application/vnd.github+json",
      "User-Agent": "research-worker", // illustrative value
    },
    body: JSON.stringify({
      event_type: "deep-research",
      client_payload: {
        query: "Find react agents",
        callback_url: "https://your.worker/callback",
      },
    }),
  });
  ```
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
---
description: Run the full health suite and review results
---

# Run Health Suite

// turbo-all

## Steps

1. Trigger the health check via API:

```bash
curl -s -X POST https://core-github-api.126colby.workers.dev/api/health/run \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" | jq .
```

2. Check the latest results:

```bash
curl -s https://core-github-api.126colby.workers.dev/api/health/latest \
  -H "x-api-key: $API_KEY" | jq '.results[] | {category, name, status, ai_suggestion}'
```

3. View run history:

```bash
curl -s "https://core-github-api.126colby.workers.dev/api/health/history?limit=5" \
  -H "x-api-key: $API_KEY" | jq '.runs[] | {id: .run.id, status: .run.status, created: .run.created_at}'
```

4. List dynamic test definitions:

```bash
curl -s https://core-github-api.126colby.workers.dev/api/health/tests \
  -H "x-api-key: $API_KEY" | jq .
```
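
When the `/api/health/latest` response from step 2 is consumed from code instead of `jq`, the same fields can be grouped by category. This sketch infers the response shape from the `jq` filter above; the exact types, any additional fields, and the `"pass"` status string are assumptions, and `failuresByCategory` is a hypothetical helper.

```typescript
// Shape inferred from the jq filter `.results[] | {category, name, status, ai_suggestion}`.
// Field types and the set of status values are assumptions.
interface HealthResult {
  category: string;
  name: string;
  status: string;
  ai_suggestion: string | null;
}

interface LatestHealthResponse {
  results: HealthResult[];
}

// Groups every non-passing result under its category, preserving response order.
function failuresByCategory(
  response: LatestHealthResponse,
  passingStatus: string = "pass", // assumed value; adjust to the real status string
): Map<string, HealthResult[]> {
  const grouped = new Map<string, HealthResult[]>();
  for (const result of response.results) {
    if (result.status === passingStatus) continue;
    const bucket = grouped.get(result.category) ?? [];
    bucket.push(result);
    grouped.set(result.category, bucket);
  }
  return grouped;
}
```

Grouping by category mirrors how the rules in this commit describe the `/health` page, which lists checks per `HealthCategory`.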

.colbyignore

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
# Dependencies
2+
node_modules/
3+
.pnpm-store/
4+
5+
# Cloudflare Wrangler
6+
.wrangler/
7+
.dev.vars
8+
dist/
9+
10+
# Logs
11+
*.log
12+
npm-debug.log*
13+
pnpm-debug.log*
14+
yarn-debug.log*
15+
yarn-error.log*
16+
17+
# IDEs and editors
18+
.idea/
19+
.vscode/
20+
*.swp
21+
*.swo
22+
23+
# OS-specific
24+
.DS_Store
25+
Thumbs.db

.dev.vars.example

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
GOOGLE_API_KEY=
2+
GEMINI_API_KEY=
3+
WORKER_API_KEY=
4+
GITHUB_TOKEN=
5+
AI_GATEWAY_URL=
6+
AI_GATEWAY_TOKEN=
7+
CLOUDFLARE_API_TOKEN=
8+
GITHUB_ACTION_CLOUDFLARE_ACCOUNT_ID=
9+
GITHUB_APP_ID=
10+
GITHUB_APP_PRIVATE_KEY=
11+
