Playground Testing Guide

This document describes how to test every feature in the Agents SDK Playground. Each section covers a demo page with specific test steps and expected results.

Prerequisites

  1. Start the dev server: npm run start
  2. Open http://localhost:5173 in your browser
  3. Verify the home page loads with the feature grid

Core Demos

State Management (/core/state)

Tests real-time state synchronization between server and clients.

Test 1: Connection Status

  • Action: Navigate to /core/state
  • Expected: Connection status shows "Connected" with a green dot

Test 2: Counter Increment

  • Action: Click the +1 button
  • Expected:
    • Counter value increases by 1
    • Event log shows call → increment() followed by result ←
    • "Current State" JSON updates with new counter value

Test 3: Counter Decrement

  • Action: Click the -1 button
  • Expected: Counter value decreases by 1

Test 4: Set Counter (Server)

  • Action: Enter 42 in the number input, click Set (Server)
  • Expected:
    • Counter changes to 42
    • Log shows call → setCounter(42) and result ←

Test 5: Set Counter (Client)

  • Action: Enter 100 in the number input, click Set (Client)
  • Expected:
    • Counter changes to 100
    • Log shows setState → (client-side update, no server call)

Test 6: Add Item

  • Action: Type "Test Item" in the New Item input, click Add
  • Expected:
    • Item appears in the Items list
    • Items count increments

Test 7: Remove Item

  • Action: Click Remove next to an item
  • Expected: Item disappears from the list

Test 8: Reset State

  • Action: Click the red Reset button
  • Expected: Counter returns to 0, items list clears

Test 9: Multi-Tab Sync

  • Action: Open the same URL in a new tab, modify state in one tab
  • Expected: Both tabs show the same state (real-time sync)
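
The multi-tab behaviour in Test 9 follows from every client subscribing to a single authoritative state object. The sketch below models that idea in memory only. `StateHub`, `subscribe`, and the `seen` record are hypothetical names for illustration and are not the Agents SDK API.

```typescript
// Hypothetical in-memory model of state sync; not the Agents SDK API.
type Listener<S> = (state: S) => void;

class StateHub<S> {
  private state: S;
  private listeners = new Set<Listener<S>>();

  constructor(initial: S) {
    this.state = initial;
  }

  // A new client receives the current state as soon as it subscribes.
  subscribe(listener: Listener<S>): () => void {
    this.listeners.add(listener);
    listener(this.state);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Every update replaces the state and is pushed to all subscribers.
  setState(next: S): void {
    this.state = next;
    this.listeners.forEach((listener) => listener(next));
  }
}

type DemoState = { counter: number; items: string[] };

const hub = new StateHub<DemoState>({ counter: 0, items: [] });
const seen: Record<string, DemoState> = {};
hub.subscribe((state) => { seen.tabA = state; });
hub.subscribe((state) => { seen.tabB = state; });

hub.setState({ counter: 42, items: ["Test Item"] });
console.log(seen.tabA.counter, seen.tabB.counter); // 42 42
```

In this model it does not matter which side triggers `setState`: the two "tabs" converge because both only ever render what the hub pushes to them.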

Callable Methods (/core/callable)

Tests the @callable decorator and RPC functionality.

Test 1: Math Operations

  • Action: Enter 5 and 3, click add(5, 3)
  • Expected:
    • Log shows call → { method: "add", args: [5, 3] }
    • Result shows 8

Test 2: Multiply

  • Action: Click multiply(5, 3)
  • Expected: Result shows 15

Test 3: Echo

  • Action: Type "Hello World", click Echo
  • Expected: Log shows result "Hello World"

Test 4: Async Operation

  • Action: Set delay to 2000, click slowOperation(2000)
  • Expected:
    • Takes ~2 seconds to complete
    • Result shows "Completed after 2000ms"

Test 5: Error Handling

  • Action: Type "Something broke", click Throw Error
  • Expected:
    • Log shows error entry in red
    • Error message contains "Something broke"

Test 6: Get Timestamp

  • Action: Click getTimestamp()
  • Expected: Returns current ISO timestamp string

Test 7: List Methods

  • Action: Click listMethods()
  • Expected:
    • "Available Methods" card appears
    • Shows all callable methods with descriptions

Streaming RPC (/core/streaming)

Tests streaming responses from agent to client.

Test 1: Stream Numbers

  • Action: Set count to 10, click Stream 10 numbers
  • Expected:
    • Chunks appear one by one: {"number":1}, {"number":2}, etc.
    • Log shows multiple chunk ← entries
    • Final result shows {"total":10}

Test 2: Countdown with Delay

  • Action: Set countdown to 5, click Countdown from 5
  • Expected:
    • Numbers stream in with ~500ms delay between each
    • Shows "5...", "4...", "3...", "2...", "1...", "Liftoff!"
    • Takes approximately 2.5 seconds total
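
The 2.5 second figure is consistent with the delay sitting between chunks rather than before the first one: six chunks, five gaps of roughly 500 ms. A minimal sketch of that timing, assuming the first chunk is sent immediately; `countdown`, `collect`, and `sleep` are hypothetical helpers, not the demo's actual implementation.

```typescript
// Hypothetical helpers illustrating the countdown timing; not the demo's code.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Yields "5...", "4...", ..., "1...", pausing after each number, then "Liftoff!".
async function* countdown(from: number, delayMs: number): AsyncGenerator<string> {
  for (let n = from; n >= 1; n--) {
    yield `${n}...`;
    await sleep(delayMs);
  }
  yield "Liftoff!";
}

// Drains the stream into an array, the way a client might accumulate chunks.
async function collect(from: number, delayMs: number): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of countdown(from, delayMs)) {
    chunks.push(chunk);
  }
  return chunks;
}
```

Calling `collect(5, 500)` resolves with the six chunks listed in Test 2 after five sleeps of 500 ms each.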

Test 3: Stream with Error

  • Action: Set "Error after" to 3, click Error after 3 chunks
  • Expected:
    • First 3 chunks arrive successfully
    • Error entry appears in log
    • Stream terminates

Test 4: Button State During Stream

  • Action: Start a stream
  • Expected: Button shows "Streaming..." and is disabled until complete

Scheduling (/core/schedule)

Tests delayed and recurring task scheduling.

Test 1: One-Time Task

  • Action: Set delay to 5 seconds, enter a message, click Schedule Task
  • Expected:
    • "Active Schedules" shows the new schedule
    • After 5 seconds, log shows schedule_executed ← with the message

Test 2: Recurring Task

  • Action: Set interval to 10 seconds, enter a label, click Schedule Recurring
  • Expected:
    • Schedule appears in Active Schedules
    • Every 10 seconds, log shows recurring_executed ←

Test 3: Cancel Task

  • Action: Click Cancel next to an active schedule
  • Expected:
    • Schedule disappears from Active Schedules
    • No more executions occur

Test 4: Refresh Schedules

  • Action: Click the Refresh link
  • Expected: Active Schedules list updates

Connections (/core/connections)

Tests WebSocket connection management and broadcasting.

Test 1: Connection Count

  • Action: Navigate to /core/connections
  • Expected: Connected Clients shows 1

Test 2: Multi-Tab Count

  • Action: Click Open New Tab (or manually open another tab at the same URL)
  • Expected:
    • Both tabs update to show 2 connected clients
    • Log shows connection_count ← 2

Test 3: Broadcast Message

  • Action: Type a message, click Broadcast
  • Expected:
    • Message appears in "Received Broadcasts" on ALL connected tabs
    • Includes timestamp

Test 4: Tab Close

  • Action: Close one of the tabs
  • Expected: Remaining tab updates to show 1 connected client

SQL Queries (/core/sql)

Tests direct SQL interaction with the agent's SQLite database.

Test 1: List Tables

  • Action: Navigate to /core/sql
  • Expected: Tables list shows internal tables (e.g., cf_agents_state, cf_agents_schedules)

Test 2: View Table Schema

  • Action: Click on a table name (e.g., cf_agents_state)
  • Expected:
    • Schema card shows columns, types, and nullability
    • Query input updates to SELECT * FROM cf_agents_state LIMIT 10

Test 3: Execute Query

  • Action: Click Execute
  • Expected: Results card shows query results as JSON

Test 4: Insert Custom Data

  • Action: Enter a key (e.g., "test-key") and value (e.g., "test-value"), click Insert
  • Expected:
    • Record appears in the Custom Data list below
    • Tables list now includes playground_data

Routing Strategies (/core/routing)

Tests different agent naming patterns.

Test 1: Per-User Strategy

  • Action:
    1. Note your User ID (e.g., "user-abc123")
    2. Open a new tab with the same User ID
  • Expected:
    • Both tabs connect to the same agent instance
    • Connected Clients shows 2

Test 2: Change User ID

  • Action: Change User ID to something different
  • Expected:
    • Agent Instance name changes
    • Connection count resets to 1 (you're now on a different agent)

Test 3: Shared Strategy

  • Action: Select Shared strategy
  • Expected:
    • Agent instance changes to routing-shared
    • All tabs with Shared strategy connect to the same agent

Test 4: Per-Session Strategy

  • Action: Select Per-Session strategy
  • Expected:
    • Each tab connects to a different agent (based on session ID)
    • Opening a new tab creates a new session, and therefore a new agent
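
The three strategies differ only in which value is used to derive the agent instance name. A minimal sketch of that mapping: `routing-shared` comes from Test 3, while the `routing-user-` and `routing-session-` prefixes and the function name are assumptions made for illustration.

```typescript
type Strategy = "per-user" | "shared" | "per-session";

// Only "routing-shared" is taken from this guide; the other two name
// formats are assumed for the sake of the example.
function agentInstanceName(
  strategy: Strategy,
  userId: string,
  sessionId: string
): string {
  switch (strategy) {
    case "per-user":
      return `routing-user-${userId}`;
    case "per-session":
      return `routing-session-${sessionId}`;
    case "shared":
      return "routing-shared";
  }
}

// Two tabs, same user, different sessions.
const userTab1 = agentInstanceName("per-user", "user-abc123", "session-1");
const userTab2 = agentInstanceName("per-user", "user-abc123", "session-2");
console.log(userTab1 === userTab2); // true: same agent instance

const sessionTab1 = agentInstanceName("per-session", "user-abc123", "session-1");
const sessionTab2 = agentInstanceName("per-session", "user-abc123", "session-2");
console.log(sessionTab1 === sessionTab2); // false: one agent per tab
```

Because the name is the routing key, two clients land on the same agent exactly when their derived names match, which is what the connection counts in Tests 1 to 4 check.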

Test 5: Strategy Persistence

  • Action: Change User ID, refresh the page
  • Expected: User ID persists (stored in localStorage)

Multi-Agent Demos

Supervisor Pattern (/multi-agent/supervisor)

Tests the manager-child agent pattern using getAgentByName().

Test 1: Connection

  • Action: Navigate to /multi-agent/supervisor
  • Expected: Connection status shows "Connected", stats show 0 children

Test 2: Create Child

  • Action: Click + Create Child
  • Expected:
    • New child card appears with ID like child-abc123
    • Counter shows 0
    • Stats update: Children = 1, Total Counter = 0
    • Log shows call → createChild("child-abc123") and result

Test 3: Increment Single Child

  • Action: Click +1 on a child card
  • Expected:
    • That child's counter increments
    • Total Counter in stats updates
    • Log shows call → incrementChild("child-abc123")

Test 4: Increment All

  • Action: Create multiple children, click +1 to All
  • Expected:
    • All children increment by 1
    • Total Counter updates to sum of all counters
    • Log shows call → incrementAll()

Test 5: Remove Child

  • Action: Click the × button on a child card
  • Expected:
    • Child disappears from the grid
    • Stats update accordingly

Test 6: Clear All

  • Action: Click the Clear All link
  • Expected:
    • All children removed
    • Stats reset to 0

Test 7: Persistence

  • Action: Create children, refresh the page
  • Expected:
    • Children are preserved (supervisor tracks IDs in state)
    • Stats match previous values

Chat Rooms (/multi-agent/rooms)

Tests multi-agent chat with Lobby and Room agents.

Test 1: Lobby Connection

  • Action: Navigate to /multi-agent/rooms
  • Expected: Lobby shows "Connected", room list is empty or shows existing rooms

Test 2: Create Room

  • Action: Type "General" in room name, click Create
  • Expected:
    • Room appears in the list with 0 online
    • Log shows call → createRoom("General")

Test 3: Join Room

  • Action: Click on a room in the list
  • Expected:
    • Chat area shows room name
    • Room header shows "0 members" initially, then "1 members"
    • Log shows join_room → General

Test 4: Send Message

  • Action: Type a message, press Enter or click Send
  • Expected:
    • Message appears in chat area
    • Your messages appear on the right with dark background
    • Log shows send → <message>

Test 5: Multi-User Chat

  • Action:
    1. Open a new tab (click the link, or manually open the same URL)
    2. In the new tab, set a different username
    3. Join the same room
    4. Send messages from both tabs
  • Expected:
    • Both users see each other's messages in real-time
    • Members list shows both usernames
    • Member count updates in lobby room list

Test 6: Leave Room

  • Action: Click Leave button
  • Expected:
    • Chat area returns to "Select a room to start chatting"
    • Member count decreases for that room

Test 7: Room Persistence

  • Action: Refresh page
  • Expected:
    • Rooms persist (tracked in LobbyAgent state)
    • Messages persist (stored in RoomAgent)

Workers Pattern (/multi-agent/workers)

Documentation-only demo explaining fan-out parallel processing.

Test 1: Page Load

  • Action: Navigate to /multi-agent/workers
  • Expected:
    • Architecture diagram with ManagerAgent → Workers
    • "How It Works" explanation
    • Example code snippet
    • Use cases list

Pipeline Pattern (/multi-agent/pipeline)

Documentation-only demo explaining chain of responsibility.

Test 1: Page Load

  • Action: Navigate to /multi-agent/pipeline
  • Expected:
    • Architecture diagram with linear agent chain
    • "How It Works" explanation
    • Example code snippet
    • Variations section (Linear, Branching, Saga, Async)
    • Considerations notes

Workflow Demos

Workflow Simulation (/workflow/basic)

Interactive demo that simulates multi-step workflow execution with automatic step progression.

Test 1: Connection

  • Action: Navigate to /workflow/basic
  • Expected: Connection status shows "Connected", no workflows running

Test 2: Start Workflow

  • Action: Enter a workflow name (e.g., "Data Processing"), set step count to 4, click Start Workflow
  • Expected:
    • Workflow appears in "Running" section
    • Visual step pipeline shows Step 1 as running (spinner icon)
    • Event log shows startWorkflow →, workflow_started ←

Test 3: Watch Step Progression

  • Action: Wait and observe the workflow
  • Expected:
    • Steps complete one by one (1-2 seconds each)
    • Completed steps show checkmark, current step shows spinner
    • Connection lines turn solid as steps complete
    • Event log shows workflow_step_complete ← for each step
    • When all steps complete, workflow_complete ← appears

Test 4: Start Multiple Workflows

  • Action: Start 2-3 workflows with different names
  • Expected:
    • All workflows appear in Running section
    • Each progresses independently
    • Completed workflows move to History section

Test 5: Cancel Workflow

  • Action: Start a workflow, then click the X button before it completes
  • Expected:
    • Workflow status changes to "cancelled"
    • Workflow moves to History section
    • Event log shows cancelWorkflow →, workflow_cancelled ←

Test 6: Clear History

  • Action: After some workflows complete, click Clear in the History section
  • Expected:
    • Resolved workflows are removed
    • Running workflows remain
    • Event log shows clearWorkflows →, cleared ←

Approval Workflow (/workflow/approval)

Interactive demo that simulates human-in-the-loop approval patterns.

Test 1: Connection

  • Action: Navigate to /workflow/approval
  • Expected: Connection status shows "Connected", no pending approvals

Test 2: Submit Approval Request

  • Action: Enter a title and description, click Submit Request
  • Expected:
    • Request appears in "Pending Approval" section with yellow indicator
    • Shows Approve and Reject buttons
    • Event log shows requestApproval →, approval_requested ←

Test 3: Use Quick Presets

  • Action: Click one of the preset request buttons (e.g., "Deploy v2.0 to Production")
  • Expected:
    • Title and description fields are populated
    • Can submit the preset request

Test 4: Approve Request

  • Action: Click Approve on a pending request
  • Expected:
    • Request moves to History with green indicator
    • Shows "Approved at [time]"
    • Event log shows approve →, approval_approved ←

Test 5: Reject Request

  • Action: Click Reject on a pending request
  • Expected:
    • Reject reason input appears
    • After clicking "Confirm Reject", request moves to History with red indicator
    • Shows "Rejected at [time]" with reason
    • Event log shows reject →, approval_rejected ←

Test 6: Multiple Pending Requests

  • Action: Submit 3-4 requests without approving
  • Expected:
    • All appear in Pending Approval section
    • Can approve/reject each independently

Test 7: Clear History

  • Action: Resolve several requests, then click Clear in History
  • Expected:
    • Resolved requests are removed from History
    • Pending requests remain
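
The approve, reject, and clear behaviour in Tests 4 to 7 amounts to a small state machine over a list of requests. The sketch below is one way to model it; the `ApprovalRequest` shape, the function names, and the sample data are hypothetical and not taken from the demo's source.

```typescript
type Status = "pending" | "approved" | "rejected";

// Hypothetical request shape for illustration.
interface ApprovalRequest {
  id: string;
  title: string;
  status: Status;
  reason?: string;
}

// Only pending requests can be resolved; already resolved ones are left untouched.
function resolveRequest(
  requests: ApprovalRequest[],
  id: string,
  decision: "approved" | "rejected",
  reason?: string
): ApprovalRequest[] {
  return requests.map((request) =>
    request.id === id && request.status === "pending"
      ? { ...request, status: decision, reason }
      : request
  );
}

// Clearing history drops resolved requests and keeps pending ones.
function clearHistory(requests: ApprovalRequest[]): ApprovalRequest[] {
  return requests.filter((request) => request.status === "pending");
}

let requests: ApprovalRequest[] = [
  { id: "a", title: "Deploy v2.0 to Production", status: "pending" },
  { id: "b", title: "Sample request", status: "pending" },
];
requests = resolveRequest(requests, "a", "rejected", "Needs review");
requests = clearHistory(requests);
console.log(requests.map((request) => request.id)); // only "b" remains
```

Each request is resolved by its own `id`, which is why multiple pending requests can be approved or rejected independently.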

Email Demos

Receive Emails (/email/receive)

Tests receiving emails via Cloudflare Email Routing. Requires deployment for real email testing.

Test 1: Connection

  • Action: Navigate to /email/receive
  • Expected: Connection status shows "Connected", empty inbox, stats show 0

Test 2: Local Dev Banner

  • Action: Observe the page when running locally
  • Expected: Warning banner indicates email features require deployment

Test 3: Stats Display

  • Action: Observe the Stats panel
  • Expected: Shows Inbox count and Total received count

Test 4: Receive Email (Deployed Only)

  • Action: Send an email to receive+demo@yourdomain.com
  • Expected:
    • Email appears in Inbox list
    • Stats update (Inbox +1, Total +1)
    • Log shows state_update ←

Test 5: View Email Detail

  • Action: Click on an email in the Inbox
  • Expected:
    • Detail panel shows subject, from, to, date
    • Email body displayed below
    • Headers expandable via details toggle

Test 6: Close Email Detail

  • Action: Click the × button on the detail panel
  • Expected: Detail panel closes

Secure Email Replies (/email/secure)

Tests HMAC-signed email replies for secure routing.

Test 1: Connection

  • Action: Navigate to /email/secure
  • Expected: Connection status shows "Connected", Inbox/Outbox tabs visible, stats show 0

Test 2: Inbox/Outbox Tabs

  • Action: Click between Inbox and Outbox tabs
  • Expected: Tab content switches, counts shown in tab labels

Test 3: Toggle Auto-Reply

  • Action: Toggle the "Auto-reply with signed headers" switch
  • Expected:
    • Log shows toggleAutoReply →
    • Setting persists in agent state

Test 4: Receive Email with Auto-Reply (Deployed Only)

  • Action: Send an email to secure+demo@yourdomain.com with auto-reply enabled
  • Expected:
    • Email appears in Inbox
    • Signed reply appears in Outbox with green checkmark
    • Reply has "Re:" prefix in subject

Test 5: View Signed Reply

  • Action: Switch to Outbox tab, click on a reply
  • Expected:
    • Detail shows the reply body
    • Green "Signed" badge displayed
    • Note about X-Agent-* headers shown

Test 6: Secure Reply Routing (Deployed Only)

  • Action: Reply to a signed email from your email client
  • Expected:
    • Reply is routed back to the same agent instance
    • Email shows lock icon indicating "Secure Reply"
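
Secure reply routing depends on the agent being able to verify that the routing information in a reply was issued by the agent itself. A minimal sketch of HMAC signing and verification using the Web Crypto API follows. The payload format (`agentName:instanceId`), the hex encoding, and all function names are assumptions for illustration; the playground's actual header layout and signing scheme may differ.

```typescript
// Hypothetical helpers: payload format and names are assumptions,
// not the playground's or the Agents SDK's actual signing scheme.
const encoder = new TextEncoder();

async function hmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false, // the key is not extractable
    ["sign", "verify"]
  );
}

function toHex(buffer: ArrayBuffer): string {
  return Array.from(new Uint8Array(buffer), (b) =>
    b.toString(16).padStart(2, "0")
  ).join("");
}

function fromHex(hex: string) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Sign the routing target so a later reply can prove where it belongs.
async function signRoute(
  secret: string,
  agentName: string,
  instanceId: string
): Promise<string> {
  const key = await hmacKey(secret);
  const data = encoder.encode(`${agentName}:${instanceId}`);
  return toHex(await crypto.subtle.sign("HMAC", key, data));
}

// Check a signature against the claimed routing target.
async function verifyRoute(
  secret: string,
  agentName: string,
  instanceId: string,
  signatureHex: string
): Promise<boolean> {
  // An HMAC-SHA-256 tag is 32 bytes, so 64 hex characters.
  if (!/^[0-9a-f]{64}$/.test(signatureHex)) return false;
  const key = await hmacKey(secret);
  const data = encoder.encode(`${agentName}:${instanceId}`);
  return crypto.subtle.verify("HMAC", key, fromHex(signatureHex), data);
}
```

With a scheme like this, a reply whose instance ID has been altered fails verification, because the signature was computed over the original agent name and instance ID with a secret only the Worker holds.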

Test 7: Clear Emails

  • Action: Click Clear all emails
  • Expected:
    • Both inbox and outbox are cleared
    • Log shows clearEmails →

Email Setup (Deployment)

To test with real emails:

  1. Deploy: npm run deploy
  2. Set secret: wrangler secret put EMAIL_SECRET
  3. Configure Cloudflare Dashboard → Email → Email Routing
  4. Add routing rule for your domain to this Worker
  5. Send emails to:
    • receive+instanceId@yourdomain.com for ReceiveEmailAgent
    • secure+instanceId@yourdomain.com for SecureEmailAgent

Readonly Connections (/core/readonly)

Tests read-only WebSocket connections that can observe but not modify state.

Test 1: Dual Panel Layout

  • Action: Navigate to /core/readonly
  • Expected: Two side-by-side panels — "Editor (read-write)" on the left, "Viewer (readonly)" on the right

Test 2: Editor Increment

  • Action: Click +1 on the Editor panel
  • Expected:
    • Counter increases on BOTH panels (state syncs to viewer)
    • "Last updated by" shows the update source

Test 3: Viewer Blocked (Callable)

  • Action: Click +1 on the Viewer panel
  • Expected: Error toast appears — readonly connections cannot call methods that write state

Test 4: Viewer Blocked (Client setState)

  • Action: Click +10 on the Viewer panel
  • Expected: Error toast appears — client-side setState is also blocked for readonly connections

Test 5: Check Permissions (Always Allowed)

  • Action: Click Check Permissions on the Viewer panel
  • Expected: Info toast shows canEdit = false — non-mutating RPCs work on readonly connections

Test 6: Toggle Readonly

  • Action: Uncheck the Lock checkbox on the Viewer panel
  • Expected:
    • Badge changes to "Viewer (read-write)"
    • Viewer can now increment and modify state

Retries (/core/retry)

Tests retry operations with exponential backoff and selective retry.

Test 1: Flaky Operation (Succeeds)

  • Action: Set "Succeed on attempt" to 3, click Run Flaky Operation
  • Expected:
    • Log shows attempts 1 and 2 failing
    • Attempt 3 succeeds
    • Result appears in log

Test 2: Flaky Operation (Exhausted)

  • Action: Set "Succeed on attempt" to 10, click Run Flaky Operation
  • Expected:
    • Log shows attempts failing (class default is 4 max attempts)
    • Final error after all retries exhausted

Test 3: Selective Retry (Transient)

  • Action: Set "Failures before success" to 2, leave "Permanent error" unchecked, click Run Filtered Retry
  • Expected:
    • Transient errors are retried
    • Succeeds after 2 failures

Test 4: Selective Retry (Permanent)

  • Action: Check Permanent error, click Run Filtered Retry
  • Expected:
    • shouldRetry returns false immediately
    • No retries — error appears after first attempt
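
Tests 1 to 4 exercise two independent stopping conditions: running out of attempts, and a `shouldRetry` predicate that classifies an error as permanent. The sketch below shows one generic way to combine them with exponential backoff. `withRetry`, `RetryOptions`, and `baseDelayMs` are hypothetical names; this is not the Agents SDK's retry API, and its actual delays and options may differ.

```typescript
// Hypothetical retry helper for illustration; not the Agents SDK API.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  shouldRetry?: (error: unknown) => boolean;
}

const wait = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const { maxAttempts, baseDelayMs, shouldRetry = () => true } = options;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      // Stop when attempts are exhausted or the error is classed as permanent.
      if (attempt >= maxAttempts || !shouldRetry(error)) throw error;
      // Exponential backoff: base, 2x base, 4x base, ...
      await wait(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Mirrors Test 1: fails on attempts 1 and 2, succeeds on attempt 3.
withRetry(
  async (attempt) => {
    if (attempt < 3) throw new Error(`attempt ${attempt} failed`);
    return `succeeded on attempt ${attempt}`;
  },
  { maxAttempts: 4, baseDelayMs: 10 }
).then((result) => console.log(result)); // succeeded on attempt 3
```

Passing `shouldRetry: () => false` reproduces Test 4: the first error is rethrown immediately and no further attempts are made.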

Test 5: Queue with Retry

  • Action: Set "Max attempts" to 3, click Queue Task
  • Expected:
    • Task is queued (log shows queued ID)
    • Retry attempts stream in via log messages
    • Succeeds on last attempt

Test 6: Clear Logs

  • Action: Click Clear Logs
  • Expected: All log entries clear

AI Demos

AI Chat (/ai/chat)

This is a documentation-focused demo explaining AIChatAgent.

Test 1: Page Load

  • Action: Navigate to /ai/chat
  • Expected:
    • Feature cards display (Message Persistence, Stream Resumption, etc.)
    • Setup requirements listed
    • useAgentChat hook properties documented

Client-Side Tools (/ai/tools)

Documentation demo for client-side tool execution.

Test 1: Page Load

  • Action: Navigate to /ai/tools
  • Expected:
    • Explanation of server-side vs client-side tools
    • Example flow with numbered steps
    • Confirm/Cancel button mockup visible

Codemode (/ai/codemode)

Tests AI code generation and execution using the CodeAct pattern.

Test 1: Connection

  • Action: Navigate to /ai/codemode
  • Expected: Connection status shows "Connected", empty state with "Try Codemode" prompt suggestions

Test 2: Send Message

  • Action: Type "What is 17 + 25?" and press Enter
  • Expected:
    • User message appears on the right
    • Assistant responds with a tool card showing code execution
    • Expanding the tool card shows the generated code and result
    • Text response includes the answer

Test 3: Tool Card Expansion

  • Action: Click on a collapsed tool card (e.g., "Ran code")
  • Expected:
    • Card expands to show Code, Result, and Console sections
    • Code section shows the generated JavaScript
    • Result shows the output

Test 4: Streaming

  • Action: Send a message and observe
  • Expected:
    • Send button shows loading spinner during streaming
    • Text streams in progressively
    • Input is disabled while streaming

Test 5: Clear History

  • Action: Click the trash icon
  • Expected: All messages clear, returns to empty state

MCP Demos

MCP Server (/mcp/server)

Documentation for creating MCP servers.

Test 1: Page Load

  • Action: Navigate to /mcp/server
  • Expected:
    • "What is MCP" explanation
    • Tools/Resources/Prompts feature cards
    • "How It Works" steps

MCP Client (/mcp/client)

Documentation for connecting to MCP servers.

Test 1: Page Load

  • Action: Navigate to /mcp/client
  • Expected:
    • API method cards (addMcpServer, mcp.listTools, etc.)
    • Connection options code snippet

MCP OAuth (/mcp/oauth)

Documentation for OAuth authentication with MCP.

Test 1: Page Load

  • Action: Navigate to /mcp/oauth
  • Expected:
    • OAuth flow steps listed
    • Server states table (not-connected, authenticating, etc.)
    • Client-side handling code snippet

Global UI Tests

Dark Mode Toggle

Test 1: Toggle Dark Mode

  • Action: Click the theme toggle in the sidebar footer
  • Expected:
    • Cycles through: System → Light → Dark → System
    • UI immediately updates colors
    • Background, cards, inputs, buttons all change

Test 2: Persistence

  • Action: Set to Dark mode, refresh the page
  • Expected: Dark mode persists (stored in localStorage)

Test 3: System Preference

  • Action: Set to System, change OS dark mode setting
  • Expected: App follows system preference
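
The toggle behaviour can be summarised as a three-state cycle plus a rule for resolving "System" to a concrete theme. A minimal sketch, with hypothetical function names; how the playground actually stores and applies the theme is not shown here.

```typescript
type Theme = "system" | "light" | "dark";

// Order taken from this guide: System → Light → Dark → System.
const NEXT_THEME: Record<Theme, Theme> = {
  system: "light",
  light: "dark",
  dark: "system",
};

function nextTheme(current: Theme): Theme {
  return NEXT_THEME[current];
}

// "system" defers to the OS preference; an explicit choice always wins.
function resolveTheme(theme: Theme, systemPrefersDark: boolean): "light" | "dark" {
  if (theme === "system") return systemPrefersDark ? "dark" : "light";
  return theme;
}

console.log(nextTheme("dark")); // system
console.log(resolveTheme("system", true)); // dark
console.log(resolveTheme("light", true)); // light
```

In a browser, the `systemPrefersDark` argument would typically come from `window.matchMedia("(prefers-color-scheme: dark)").matches`.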

Sidebar Navigation

Test 1: Category Collapse

  • Action: Click on a category header (e.g., "CORE")
  • Expected: Category collapses/expands

Test 2: Active State

  • Action: Click on a demo link
  • Expected: Link highlights with active styling

Test 3: External Links

  • Action: Click GitHub or Docs links in footer
  • Expected: Opens in new tab

Event Log Panel

Present on the interactive demos, including State, Callable, Streaming, Schedule, Connections, SQL, Routing, Readonly, Retry, Email Receive, and Email Secure.

Test 1: Auto-Scroll

  • Action: Trigger many events rapidly
  • Expected: Log panel auto-scrolls to bottom

Test 2: Clear Logs

  • Action: Click the trash icon
  • Expected: All log entries clear, shows "No events yet"

Test 3: Log Entry Types

  • Expected Colors:
    • Outgoing (→): Blue background
    • Incoming (←): Green background
    • Error: Red background
    • Info: No background

Error Scenarios

Connection Failure

Test 1: Server Not Running

  • Action: Stop the dev server, refresh the page
  • Expected: Connection status shows "Connecting..." indefinitely

Invalid Input

Test 1: Empty Item

  • Action: Click Add with empty input
  • Expected: Nothing happens (validation prevents empty items)

Test 2: Non-Numeric Counter

  • Action: Enter "abc" in counter input, click Set
  • Expected: Counter becomes NaN, or 0 if the demo falls back to a default (parseInt("abc") returns NaN)
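
The two possible outcomes come from what the code does with the parse result, not from `parseInt` itself. The sketch below shows the raw behaviour and one possible guard; `parseCounter` and its default of 0 are hypothetical, not the demo's implementation.

```typescript
// Hypothetical guard: fall back to a default when the input is not numeric.
function parseCounter(input: string, fallback = 0): number {
  const value = Number.parseInt(input, 10);
  return Number.isNaN(value) ? fallback : value;
}

console.log(Number.parseInt("abc", 10)); // NaN
console.log(parseCounter("abc")); // 0
console.log(parseCounter("42")); // 42
console.log(parseCounter("12abc")); // 12 (parseInt stops at the first non-digit)
```

If the counter shows NaN, the raw parse result reached the state unguarded; if it shows 0, some fallback like the one above is in place.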

Performance Checks

Large State

  • Action: Add 100+ items via the State demo
  • Expected:
    • No UI lag
    • State syncs correctly
    • JSON display remains responsive

Rapid Operations

  • Action: Click increment button rapidly (20+ times)
  • Expected:
    • All operations complete
    • Final count is accurate
    • Log shows all calls