Skip to content

Latest commit

 

History

History
273 lines (222 loc) · 16.1 KB

File metadata and controls

273 lines (222 loc) · 16.1 KB

AGENTS.md

Project Overview

Interview Coder CN (编码面试解题助手) is a desktop application that captures screenshots of coding problems and uses AI (vision models) to generate solutions in real-time. The window is invisible to screen-sharing software, making it suitable for use during coding interviews and online assessments.

Key capabilities:

  • Global shortcuts trigger screenshot capture → AI analysis → streamed solution display
  • Frameless, transparent, always-on-top overlay window invisible to screen-sharing
  • Mouse passthrough mode (window ignores mouse events)
  • Multi-screenshot conversation continuity (append screenshots to existing context)
  • Follow-up questions within the same conversation
  • Real-time speech transcription (DashScope Fun-ASR) — transcribed text is attached to screenshots when sent to AI
  • Configurable AI provider (OpenAI, SiliconFlow, OpenRouter, or any OpenAI-compatible API)

Tech Stack

Layer Technology
Framework Electron 37 (electron-vite 4)
Frontend React 19, TypeScript 5.8
Styling Tailwind CSS v4, shadcn/ui (New York style), Radix primitives
State Zustand 5 (5 stores, 2 with localStorage persistence)
Routing react-router v7 (HashRouter, 3 routes)
AI Vercel AI SDK (ai + @ai-sdk/openai), streaming via streamText()
Build electron-vite (Vite 7), electron-builder 25
Linting ESLint 9 (flat config), Prettier

Directory Structure

src/
├── main/                    # Electron main process
│   ├── index.ts             # App entry: lifecycle, error handling, app.whenReady()
│   ├── main-window.ts       # BrowserWindow creation (frameless, transparent, always-on-top)
│   ├── shortcuts.ts         # Global shortcuts registration + AI streaming orchestration (largest file)
│   ├── ai.ts                # Vercel AI SDK integration, 3 streaming functions
│   ├── settings.ts          # App settings object + IPC handlers
│   ├── state.ts             # App state object + IPC handlers
│   ├── take-screenshot.ts   # desktopCapturer → base64 PNG
│   ├── transcription.ts     # DashScope WebSocket real-time speech-to-text
│   ├── auto-updater.ts      # electron-updater (non-macOS only)
│   ├── prompts.md           # System prompt for AI (copied to build output via vite-plugin-static-copy)
│   └── index.d.ts           # global.mainWindow type declaration
├── preload/
│   ├── index.ts             # contextBridge API: exposes window.api to renderer
│   └── index.d.ts           # Type declarations for window.electron and window.api
└── renderer/
    ├── index.html            # SPA entry
    └── src/
        ├── main.tsx          # React root render
        ├── App.tsx           # Router + settings sync + shortcut init + Toaster
        ├── coder/            # Main page: screenshot display + AI solution stream
        │   ├── index.tsx     # CoderPage layout + state sync + transcription lifecycle
        │   ├── AppHeader.tsx # Draggable title bar with nav buttons
        │   ├── AppContent.tsx# Screenshots gallery + markdown solution + error banner
        │   ├── AppStatusBar.tsx    # Loading indicator, follow-up dialog, shortcut hints
        │   ├── TranscriptionBar.tsx # Absolute-positioned real-time transcription overlay
        │   └── PrerequisitesChecker.tsx  # Modal for API key setup
        ├── settings/         # Settings page
        │   ├── index.tsx     # AI config, coding, appearance, shortcuts, privacy
        │   ├── SelectLanguage.tsx  # Combobox with custom language input
        │   ├── SelectModel.tsx     # Combobox with custom model input
        │   └── CustomShortcuts.tsx # Shortcut key recorder
        ├── help/             # Help page
        │   ├── index.tsx     # Quick start guide, shortcuts, FAQ
        │   ├── Shortcuts.tsx
        │   ├── FAQ.tsx
        │   └── components/index.tsx  # HelpSection wrapper
        ├── components/
        │   ├── MarkdownRenderer.tsx   # react-markdown + remark-gfm + rehype-highlight
        │   ├── ShortcutRenderer.tsx   # Platform-aware shortcut key badges
        │   └── ui/           # shadcn/ui primitives (button, dialog, select, etc.)
        ├── lib/
        │   ├── store/        # Zustand stores
        │   │   ├── app.ts       # ignoreMouse state, synced from main process
        │   │   ├── settings.ts  # API config, model, language, opacity (persisted v4)
        │   │   ├── shortcuts.ts # Shortcut bindings (persisted v5, with migration)
        │   │   ├── solution.ts  # Loading state, solution chunks, screenshots, errors
        │   │   └── transcription.ts # Transcription state: isTranscribing, text, error
        │   ├── utils/
        │   │   ├── index.ts     # cn() helper, getCloneableFields()
        │   │   ├── env.ts       # isMac, platformAlt
        │   │   └── keyboard.ts  # Accelerator string conversion
        │   └── audio-capture.ts # System audio capture via getDisplayMedia for transcription
        └── assets/
            ├── base.css      # Tailwind @import, CSS variables, app layout styles
            └── main.css      # Tailwind + typography plugin + theme variables (oklch)

Architecture

Process Model

┌─────────────────────────────────────────────────────┐
│  Main Process (src/main/)                           │
│  ┌──────────┐  ┌──────────┐  ┌───────────────────┐ │
│  │ settings │  │  state   │  │    shortcuts.ts    │ │
│  │   .ts    │  │   .ts    │  │  (orchestrator)   │ │
│  └────┬─────┘  └────┬─────┘  │  - global hotkeys │ │
│       │              │        │  - AI streaming   │ │
│       │              │        │  - conversation   │ │
│       │              │        │    management     │ │
│       │              │        └──┬───────────┬────┘ │
│       │              │           │           │      │
│       │              │     ┌─────┴──┐  ┌─────┴────┐ │
│       │              │     │ ai.ts  │  │take-     │ │
│       │              │     │        │  │screenshot│ │
│       │              │     └────────┘  └──────────┘ │
│       └──────────────┼───────────┘                  │
│              IPC (ipcMain.handle)                    │
├─────────────────────────────────────────────────────┤
│  Preload (src/preload/)                             │
│  contextBridge → window.api                         │
├─────────────────────────────────────────────────────┤
│  Renderer (src/renderer/)                           │
│  React app with Zustand stores                      │
│  window.api.on*() for events from main              │
│  window.api.*() for invoke calls to main            │
└─────────────────────────────────────────────────────┘

Data Flow: Screenshot → Solution

  1. User presses global shortcut (e.g., Alt+Enter on macOS)
  2. shortcuts.ts callback triggers takeScreenshot()desktopCapturer → base64 PNG
  3. Main sends screenshot-taken and ai-loading-start to renderer
  4. Main calls getSolutionStream(base64Image) → Vercel AI SDK streamText()
  5. Stream chunks sent to renderer via solution-chunk IPC events
  6. Renderer accumulates chunks in useSolutionStore and renders via MarkdownRenderer
  7. On completion: solution-complete; on error: solution-error; on abort: solution-stopped

IPC Channels

Renderer → Main (invoke):

  • getAppSettings / updateAppSettings — settings CRUD
  • updateAppState — sync inCoderPage, ignoreMouse
  • initShortcuts / getShortcuts / updateShortcuts — shortcut management
  • stopSolutionStream — abort current AI stream
  • sendFollowUpQuestion — follow-up within conversation
  • start-transcription / stop-transcription — speech transcription lifecycle
  • get-transcription-text / clear-transcription-text — read/clear accumulated text

Main → Renderer (send):

  • sync-app-state — push state changes (e.g., mouse ignore toggle)
  • screenshot-taken / screenshots-updated — screenshot data
  • solution-clear / solution-chunk / solution-complete / solution-stopped / solution-error — AI streaming lifecycle
  • ai-loading-start / ai-loading-end — loading state
  • scroll-page-up / scroll-page-down — keyboard-driven scroll
  • toggle-transcription — trigger start/stop transcription from shortcut
  • transcription-text / transcription-error / transcription-stopped / transcription-cleared — transcription events

Zustand Stores

Store File Persisted Key State
useSettingsStore lib/store/settings.ts Yes (v4) apiBaseURL, apiKey, model, customModels, codeLanguage, opacity, customPrompt, dashscopeApiKey
useShortcutsStore lib/store/shortcuts.ts Yes (v5) shortcuts (action → key mapping with categories)
useSolutionStore lib/store/solution.ts No isLoading, solutionChunks, screenshotData, errorMessage
useTranscriptionStore lib/store/transcription.ts No isTranscribing, transcriptionText, errorMessage
useAppStore lib/store/app.ts No ignoreMouse

Settings are bidirectionally synced: renderer persists to localStorage, and on mount syncs to main process via updateAppSettings(). Main process .env values serve as initial defaults only.

Key Patterns & Conventions

Window Stealth

The app is designed to be invisible to screen-sharing software:

  • BrowserWindow options: transparent: true, frame: false, skipTaskbar: true
  • setContentProtection(true) prevents screen capture of the window itself
  • setVisibleOnAllWorkspaces(true, { visibleOnFullScreen: true, skipTransformProcessType: true })
  • keepWindowInFront() repeatedly reasserts always-on-top for 5 seconds after show
  • showInactive() on macOS/Windows to avoid stealing focus

AI Integration

  • All AI calls go through src/main/ai.ts using Vercel AI SDK's streamText()
  • Provider: @ai-sdk/openai with custom baseURL (works with any OpenAI-compatible API)
  • Model fallback: Qwen/Qwen3-VL-32B-Instruct for SiliconFlow, gpt-5-mini otherwise
  • System prompt is loaded from prompts.md at build time (bundled via vite-plugin-static-copy)
  • Three streaming functions: getSolutionStream (first screenshot), getFollowUpStream (follow-up), getGeneralStream (multi-screenshot)
  • Conversation history (conversationMessages) is maintained in shortcuts.ts as ModelMessage[]

Stream Abort Pattern

  • StreamContext with AbortController and reason ('user' | 'new-request')
  • New requests automatically abort previous streams
  • User can manually stop via shortcut or UI button
  • Abort reason determines which IPC event to send (solution-stopped for user, silent for new-request)

Real-time Speech Transcription

  • Uses DashScope (Alibaba Cloud) Fun-ASR real-time ASR via WebSocket (src/main/transcription.ts)
  • Requires a separate dashscopeApiKey configured in settings
  • Audio is captured in the renderer via getDisplayMedia() (system audio), downsampled to 16kHz PCM, and streamed to main process via IPC
  • TranscriptionBar is absolute-positioned at the top of the coder page, shows up to 3 lines with auto-scroll
  • On screenshot (takeScreenshot / appendScreenshot), accumulated transcription text is automatically attached to the AI prompt, then cleared
  • clearTranscription shortcut clears text without submitting to AI
  • Transcription shortcuts are disabled in settings UI when dashscopeApiKey is not configured

Shortcut System

  • Global shortcuts registered via Electron's globalShortcut API
  • Renderer stores shortcut config in Zustand (persisted); sends to main on init
  • On Windows, Alt-based shortcuts also register Ctrl+Alt variant for compatibility
  • Shortcut actions are string-keyed callbacks in shortcuts.ts
  • Default shortcuts use platformAlt (Alt on macOS, CommandOrControl on Windows/Linux)

UI Component Patterns

  • shadcn/ui components in src/renderer/src/components/ui/ — do NOT edit these directly, use the shadcn CLI to add/update
  • cn() utility (clsx + tailwind-merge) for conditional class merging
  • getCloneableFields() strips functions from store state before sending over IPC
  • Platform-aware shortcut display via ShortcutRenderer (⌘, ⌥, ⇧ on Mac; Ctrl, Alt, Shift on Windows)

Development

Commands

npm install          # Install dependencies
npm run dev          # Start in dev mode (electron-vite dev)
npm run build        # Typecheck + build (electron-vite build)
npm run build:mac    # Build macOS distributable
npm run build:win    # Build Windows distributable
npm run typecheck    # Run TypeScript type checking (node + web)
npm run lint         # Run ESLint
npm run format       # Run Prettier

Configuration

The .env file at project root configures the AI provider:

API_BASE_URL="https://openrouter.ai/api/v1"  # OpenAI-compatible API endpoint
API_KEY="sk-..."                               # API key
MODEL="gpt-5-mini"                             # Optional: override default model

These are read by dotenv in the main process and merged with renderer-side settings (renderer settings take priority when set).

Path Aliases

  • @renderer/* and @/* both resolve to src/renderer/src/*
  • Configured in tsconfig.web.json and electron.vite.config.ts

Code Style

  • Prettier: single quotes, no semicolons, 100 char print width, no trailing commas
  • ESLint: TypeScript + React + React Hooks + React Refresh rules
  • UI text and user-facing strings are in Chinese (中文)
  • Code comments and variable names are in English

Important Notes for AI Agents

  1. Three TypeScript configs: tsconfig.node.json (main + preload), tsconfig.web.json (renderer). The root tsconfig.json is a project references file only.

  2. prompts.md is bundled: It lives in src/main/ but gets copied to the build output via vite-plugin-static-copy. Loaded at runtime with readFileSync(join(import.meta.dirname, 'prompts.md')).

  3. global.mainWindow: The main window reference is stored as a global variable, declared in src/main/index.d.ts.

  4. Settings flow: .env → main process settings object → renderer reads on mount via IPC → renderer persists to localStorage via Zustand. Renderer-side changes are sent back to main via updateAppSettings.

  5. No shared types directory: Main process types (AppSettings, AppState) are imported directly by the preload script from ../main/settings and ../main/state. This works because preload shares the Node.js tsconfig.

  6. Streaming orchestration is in shortcuts.ts: Despite the filename, this 580+ line file is the central orchestrator for both global shortcuts AND AI streaming logic. It manages conversation history, abort controllers, and IPC communication for the entire AI workflow.

  7. Window movement: The window can be moved via keyboard shortcuts in 200px steps (up/down/left/right).

  8. macOS auto-update is disabled: publish: null in electron-builder.yml for mac target. Auto-update only works on Windows.