feat(chat): tiered local AI model strategy for guest users#100

Merged: fcxmarquez merged 24 commits into main from franciscomarquezsolt/fcx-107-implement-tiered-ai-model-strategy-for-guest-users on Apr 20, 2026

fcxmarquez (Owner) commented Apr 19, 2026

Summary

  • Adds a Local provider with local-auto model that runs entirely in the browser — no API key required
  • Detects device capabilities (WebGPU, RAM, CPU cores) and selects the best model tier automatically: Gemma 4 E4B (WebGPU high), Qwen 3.5 0.8B (WebGPU low), or SmolLM2 135M (WASM CPU fallback)
  • Runs inference in a dedicated Web Worker via @huggingface/transformers (ONNX), keeping the UI thread unblocked
  • Model weights are downloaded once and cached in the browser's Cache Storage for subsequent sessions
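
The tier selection described in the summary can be sketched as a pure function over a capability snapshot. This is a minimal illustration, not the code in `lib/local/capabilities.ts`: the `DeviceCapabilities` shape, the function name `selectLocalModelSpec`, and both thresholds are assumptions, since the PR description does not state its cut-offs.

```typescript
// Hypothetical capability snapshot; the PR's own module may use different fields.
interface DeviceCapabilities {
  hasWebGPU: boolean;
  memoryGB: number | undefined; // e.g. navigator.deviceMemory, missing in some browsers
  cpuCores: number | undefined; // e.g. navigator.hardwareConcurrency
}

type Tier = "webgpu-high" | "webgpu-low" | "wasm-cpu";

interface TierChoice {
  tier: Tier;
  label: string; // model names as listed in the PR summary
  device: "webgpu" | "wasm";
}

// Assumed thresholds, for illustration only.
const HIGH_TIER_MIN_MEMORY_GB = 8;
const HIGH_TIER_MIN_CORES = 8;

function selectLocalModelSpec(caps: DeviceCapabilities): TierChoice {
  // Without WebGPU, fall back to the smallest model on the WASM CPU backend.
  if (!caps.hasWebGPU) {
    return { tier: "wasm-cpu", label: "SmolLM2 135M", device: "wasm" };
  }
  // Unknown values are treated as 0 so the low tier is chosen conservatively.
  const memory = caps.memoryGB ?? 0;
  const cores = caps.cpuCores ?? 0;
  if (memory >= HIGH_TIER_MIN_MEMORY_GB && cores >= HIGH_TIER_MIN_CORES) {
    return { tier: "webgpu-high", label: "Gemma 4 E4B", device: "webgpu" };
  }
  return { tier: "webgpu-low", label: "Qwen 3.5 0.8B", device: "webgpu" };
}
```

Keeping the decision separate from the browser probing makes it easy to unit-test each tier without a real GPU.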

Test plan

  • Select "Auto (Local)" model and send a message — verify download toast appears on first run
  • Verify second message uses cached model (no download toast)
  • Test stop generation button mid-stream
  • Verify cloud models (OpenAI, Anthropic, Google) still work normally
  • Test on a device without WebGPU — verify CPU fallback (SmolLM2) is selected

Summary by CodeRabbit

  • New Features

    • Added support for running AI models locally in the browser without requiring API keys.
    • Added consent dialog for downloading and managing local models with size information.
    • Added ability to clear API keys in settings.
  • Improvements

    • Updated chat input placeholder text to "Ask anything."
    • Settings UI now clearly indicates that local models don't require API keys, while cloud models do.

vercel Bot (Contributor) commented Apr 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| circle-ai | Ready | Preview, Comment, Open in v0 | Apr 20, 2026 2:09am |

coderabbitai Bot (Contributor) commented Apr 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4e725f89-b0c9-4b51-9c02-eb6680507b6c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR adds local model inference capabilities to the chat application. It introduces server-side rejection of local model requests, client-side consent management, device capability detection, Web Worker-based inference, and updated model configuration logic to differentiate between local and cloud providers.
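
A minimal sketch of the server-side rejection mentioned above, assuming the guard is a plain check performed before any provider is built. `isLocalModel()` is named in the PR, but its signature here, the `rejectLocalModel` helper, and the error wording are assumptions.

```typescript
// Only "local-auto" is named in the PR; the list form is an assumption.
const LOCAL_MODEL_VALUES: readonly string[] = ["local-auto"];

function isLocalModel(model: string): boolean {
  return LOCAL_MODEL_VALUES.includes(model);
}

// Hypothetical result shape; a route handler would turn this into an HTTP response.
interface GuardRejection {
  status: 400;
  body: { error: string };
}

// Returns a rejection for local models, or null when the request may proceed.
function rejectLocalModel(selectedModel: string): GuardRejection | null {
  if (!isLocalModel(selectedModel)) {
    return null;
  }
  return {
    status: 400,
    body: {
      error: `Model "${selectedModel}" runs in the browser and cannot be called server-side.`,
    },
  };
}
```

Running the check first keeps the server from ever attempting to construct a provider for a model that has no server-side implementation.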

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| API Endpoint Guard | app/api/chat/route.ts | Added early validation to reject local model requests with a 400 error, preventing local models from being called server-side. |
| UI Components – Chat & Input | components/Inputs/InputChat/index.tsx, components/chat/ChatArea/index.tsx | Updated chat input placeholder text and integrated local model consent workflow into ChatArea with toast notifications for model status (downloading/ready/idle) and new consent dialog. |
| UI Components – Modals | components/modals/local-model-consent/index.tsx, components/modals/settings/index.tsx | Added new LocalModelConsentDialog component for user approval and updated settings modal to remove API key gating, add clear buttons for API key fields, and reflect that local models don't require keys. |
| Model Type & Availability Logic | constants/models.ts, lib/chat/config.ts, lib/chat/config.test.ts | Extended ModelProvider to include "Local", added local-auto model entry, updated DEFAULT_MODEL and availability filtering to suppress local models when cloud API keys are present, added isLocalModel() helper. |
| Chat Hooks & State | hooks/useCircleChat.ts, hooks/useLocalModelConsent.ts | Added useLocalModelConsent() hook for consent state management and extended useCircleChat() to support local model streaming with status tracking and conditional message routing. |
| Local Model Runtime Infrastructure | lib/local/capabilities.ts, lib/local/consent.ts, lib/local/inference.worker.ts, lib/local/localTransport.ts | New modules providing device capability detection, consent persistence, Web Worker-based inference pipeline management, and browser-to-worker communication for local chat generation. |
| Chat Service & Shared Prompts | lib/langchain/chatService.ts, lib/chat/prompt.ts | Extracted DEFAULT_SYSTEM_PROMPT to shared module, added CloudProvider type, and implemented server-side guard to reject local models in buildChatModel(). |
| Build & Store Configuration | next.config.js, package.json, store/index.ts, store/slices/configSlice.ts | Added @huggingface/transformers dependency, configured external packages and webpack aliases for server-side isolation, bumped store schema to v5 with migration prepending local-auto to enabled models, and added cache-clearing logic on API key changes. |
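
The v5 store migration in the last row could look roughly like the following. This is a sketch under assumptions: the `PersistedConfig` shape and the `migrateToV5` name are hypothetical, and the real migration in `store/index.ts` may handle more fields.

```typescript
// Hypothetical slice of the persisted state; only enabledModels matters here.
interface PersistedConfig {
  enabledModels: string[];
  selectedModel?: string;
}

const LOCAL_AUTO = "local-auto";

// Prepends local-auto for states saved before schema v5, leaving newer states untouched.
function migrateToV5(state: PersistedConfig, fromVersion: number): PersistedConfig {
  if (fromVersion >= 5) {
    return state;
  }
  // Guard against duplicates in case the entry was already added by hand.
  if (state.enabledModels.includes(LOCAL_AUTO)) {
    return state;
  }
  return { ...state, enabledModels: [LOCAL_AUTO, ...state.enabledModels] };
}
```

Returning a new object rather than mutating the input keeps the migration safe to run on frozen or shared state.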

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI as ChatArea<br/>(React)
    participant LocalConsent as useLocalModelConsent<br/>(Hook)
    participant LocalTransport as localTransport<br/>(Browser)
    participant Worker as inference.worker<br/>(Web Worker)
    participant HF as HuggingFace<br/>Transformers
    
    User->>UI: Send message (local model)
    UI->>LocalConsent: requestConsent()
    LocalConsent->>LocalConsent: resolveLocalModelSpec()
    LocalConsent->>UI: Show consent dialog
    User->>UI: Confirm download
    LocalConsent->>LocalConsent: setLocalModelConsent()
    UI->>LocalTransport: streamLocalChatRequest(spec, messages)
    LocalTransport->>Worker: POST generate message<br/>(modelId, device, dtype, messages)
    Worker->>HF: Load/cache pipeline
    HF-->>Worker: TextGenerationPipeline ready
    Worker->>HF: Stream generation + callbacks
    HF-->>Worker: progress events (download %)
    Worker-->>LocalTransport: progress message
    LocalTransport->>UI: onProgress callback
    HF-->>Worker: text chunks
    Worker-->>LocalTransport: chunk message
    LocalTransport->>UI: onChunk callback + accumulate
    HF-->>Worker: completion
    Worker-->>LocalTransport: complete message
    LocalTransport-->>UI: Resolved string
    UI->>User: Display response
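
The worker-to-transport messages in the diagram above can be modelled as a discriminated union with a small accumulator on the main-thread side. The message names (`progress`, `chunk`, `complete`) follow the diagram; the field names, the `error` variant, and `createStreamAccumulator` are assumptions for illustration.

```typescript
// Assumed message shapes; the real protocol in localTransport.ts may differ.
type WorkerMessage =
  | { type: "progress"; requestId: string; percent: number }
  | { type: "chunk"; requestId: string; text: string }
  | { type: "complete"; requestId: string }
  | { type: "error"; requestId: string; message: string };

interface StreamCallbacks {
  onProgress?: (percent: number) => void;
  onChunk?: (text: string) => void;
}

function createStreamAccumulator(requestId: string, callbacks: StreamCallbacks = {}) {
  let buffer = "";
  let finished = false;
  return {
    // Feed every message from the worker; messages for other requests are ignored.
    handle(msg: WorkerMessage): void {
      if (msg.requestId !== requestId || finished) return;
      switch (msg.type) {
        case "progress":
          callbacks.onProgress?.(msg.percent);
          break;
        case "chunk":
          buffer += msg.text;
          callbacks.onChunk?.(msg.text);
          break;
        case "complete":
          finished = true;
          break;
        case "error":
          finished = true;
          throw new Error(msg.message);
      }
    },
    result(): string {
      return buffer;
    },
    isFinished(): boolean {
      return finished;
    },
  };
}
```

Tagging every message with a `requestId` lets one long-lived worker serve several requests without their chunks being mixed up.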
sequenceDiagram
    participant User
    participant UI as ChatArea<br/>(React)
    participant API as /api/chat<br/>(Route Handler)
    participant Service as chatService<br/>(LangChain)
    participant Provider as Cloud Provider<br/>(OpenAI/Anthropic/Google)
    
    User->>UI: Send message (cloud model)
    UI->>API: POST /api/chat<br/>(selectedModel, messages)
    API->>API: Check isLocalModel()<br/>➜ false, continue
    API->>Service: buildChatModel(config)
    Service->>Service: Validate API key present
    Service->>Service: Create LLM instance
    API->>Provider: Stream chat completion
    Provider-->>API: Response chunks
    API-->>UI: SSE stream
    UI->>User: Display response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop hop, the workers dance,
Local models get their chance,
Consent dialogs bloom so bright,
Browser brains now run at night,
No keys needed, all looks right! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat(chat): tiered local AI model strategy for guest users' accurately captures the main feature: introducing tiered local models for guest users who lack API keys. |


@fcxmarquez fcxmarquez self-assigned this Apr 19, 2026
When a local model is selected, resolveLocalModelSpec() was called once
in requestConsent() and again inside runLocalStream. Now the spec
resolved during consent is passed directly to sendMessage and forwarded
to runLocalStream, which only falls back to re-resolving if no spec is
provided.
@fcxmarquez fcxmarquez marked this pull request as ready for review April 20, 2026 01:52
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/modals/settings/index.tsx (1)

208-250: ⚠️ Potential issue | 🟠 Major

Apply the same availability filtering when saving settings.

onSubmit only removes models missing provider keys, so local-auto survives when any API key is present. Example: with only local-auto enabled, adding an Anthropic key saves selectedModel: "local-auto" even though getAvailableModels() suppresses Local models in that state. Use getAvailableModels(data) as the saved enabled list, and disable/hide Local options while keys are present.

🛠️ Proposed fix direction
 import {
   getAvailableModels,
   getConfigIssues,
+  hasAnyApiKey,
   hasRequiredKeyForModel,
 } from "@/lib/chat/config";
-    const enabledModels =
-      issues.enabledModelsMissingKeys.length > 0
-        ? data.enabledModels.filter(
-            (m) => !issues.enabledModelsMissingKeys.includes(m as ModelValue)
-          )
-        : data.enabledModels;
+    const enabledModels = getAvailableModels(data);
 
     if (enabledModels.length === 0) {
       toast.error("Please enable at least one model.");
       return;
     }
     const hasRequiredKey = hasRequiredKeyForModel(option.value, {
       openAIKey: watchedValues.openAIKey,
       anthropicKey: watchedValues.anthropicKey,
       googleKey: watchedValues.googleKey,
     });
+    const suppressLocal =
+      hasAnyApiKey(watchedValues) && option.provider === "Local";
     return {
       value: option.value,
       label: option.label,
       badge: option.provider,
-      disabled: !hasRequiredKey,
+      disabled: !hasRequiredKey || suppressLocal,
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/modals/settings/index.tsx` around lines 208 - 250, The form submit
currently filters only models missing keys via getConfigIssues, allowing
local-auto to remain enabled when provider keys exist; update onSubmit to
compute the saved enabledModels using getAvailableModels(data) (or equivalent
availability function) so models suppressed by getAvailableModels are removed
before setConfig, and ensure selectedModel is validated against that filtered
list; reference onSubmit, getConfigIssues, getAvailableModels,
hasRequiredKeyForModel, MODEL_OPTIONS and setConfig when making the change and
keep the existing toast/handleClose behavior.
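
To make the mismatch concrete, here is a self-contained sketch of the availability rule the comment relies on: Local models are suppressed as soon as any API key is present. The function names come from the review comment, but their signatures, the `ChatConfig` shape, and the placeholder cloud model values are assumptions.

```typescript
type ModelProvider = "OpenAI" | "Anthropic" | "Google" | "Local";

interface ModelOption {
  value: string;
  provider: ModelProvider;
}

interface ApiKeys {
  openAIKey?: string;
  anthropicKey?: string;
  googleKey?: string;
}

interface ChatConfig extends ApiKeys {
  enabledModels: string[];
}

// Placeholder entries; only "local-auto" is a value named in the PR.
const MODEL_OPTIONS: ModelOption[] = [
  { value: "local-auto", provider: "Local" },
  { value: "example-openai", provider: "OpenAI" },
  { value: "example-anthropic", provider: "Anthropic" },
  { value: "example-google", provider: "Google" },
];

function hasAnyApiKey(keys: ApiKeys): boolean {
  return Boolean(keys.openAIKey || keys.anthropicKey || keys.googleKey);
}

function hasRequiredKeyForModel(value: string, keys: ApiKeys): boolean {
  const option = MODEL_OPTIONS.find((o) => o.value === value);
  if (!option) return false;
  switch (option.provider) {
    case "Local":
      return true; // local models never need a key
    case "OpenAI":
      return Boolean(keys.openAIKey);
    case "Anthropic":
      return Boolean(keys.anthropicKey);
    case "Google":
      return Boolean(keys.googleKey);
  }
}

// Enabled models that are actually usable: cloud models need their key,
// and Local models are hidden once any cloud key exists.
function getAvailableModels(config: ChatConfig): string[] {
  const suppressLocal = hasAnyApiKey(config);
  return config.enabledModels.filter((value) => {
    const option = MODEL_OPTIONS.find((o) => o.value === value);
    if (!option) return false;
    if (option.provider === "Local") return !suppressLocal;
    return hasRequiredKeyForModel(value, config);
  });
}
```

Under these assumptions, a config with only `local-auto` enabled plus an Anthropic key yields an empty list, which is the state the review comment says `onSubmit` fails to catch when it filters on missing keys alone.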
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hooks/useCircleChat.ts`:
- Around line 38-49: The localStorage marker
`enki-model-downloaded-${spec.modelId}` used in useCircleChat.ts inside the
onProgress handler can go out of sync with actual Cache Storage (so downloads
may be misreported as "loading-cache"); update the logic so the marker is kept
in sync with cache operations: either clear that localStorage key whenever
`clearLocalModelCache` (or any cache-clearing function) runs, or replace the
marker check in `onProgress` with an actual existence check against the Cache
Storage (query the relevant cache for the model URL by `spec.modelId`) and call
`options.onModelStatus` based on the real cache result instead of the stale
localStorage flag.
- Around line 16-24: runLocalStream currently auto-resolves a local model spec
(via resolveLocalModelSpec) when options.spec is omitted, which allows callers
(e.g., sendMessage) to trigger a local model download without user consent;
change runLocalStream to require options.spec (do not call
resolveLocalModelSpec) and immediately throw/return an error if spec is
undefined, and update any callers to only call runLocalStream with an explicit,
consent-approved spec (the same guard should be applied to the other local-path
streaming logic in the 88-140 region), ensuring the local-download path cannot
run unless a spec was explicitly provided by the consent flow.
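
One way to enforce the second point is a small guard that refuses to proceed without an explicit spec. This is only a sketch: `requireConsentedSpec` is a hypothetical helper, and the `LocalModelSpec` fields are inferred from the `modelId`, `device`, and `dtype` values shown in the sequence diagram.

```typescript
// Assumed spec shape, based on the fields sent to the worker in the diagram.
interface LocalModelSpec {
  modelId: string;
  device: "webgpu" | "wasm";
  dtype: string;
}

// Throws instead of silently resolving a spec, so a download can only start
// after the consent flow has produced one.
function requireConsentedSpec(spec: LocalModelSpec | undefined): LocalModelSpec {
  if (!spec) {
    throw new Error(
      "Local model spec is required; obtain it from the consent flow before streaming."
    );
  }
  return spec;
}
```

A caller such as `runLocalStream` would invoke this at the top and pass the returned value on, which removes the fallback path to `resolveLocalModelSpec()` entirely.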

In `@hooks/useLocalModelConsent.ts`:
- Around line 13-21: Resolve the model first, then scope consent and pending
prompts to that modelId: call resolveLocalModelSpec() and extract a stable model
identifier/version, then replace hasLocalModelConsent() and
setLocalModelConsent() with per-model variants (e.g.,
hasLocalModelConsent(modelId), setLocalModelConsent(modelId, value)). Instead of
a single resolverRef.current, store pending resolvers per modelId (e.g.,
Map<modelId, resolver[]>) so concurrent requestConsent() calls for the same
model enqueue their promises and do not overwrite each other; if consent already
exists for that model return resolved immediately, otherwise setSpec(resolved),
setOpen(true) if not already open for that model, push the resolver into the
model's resolver list, and when the user confirms/denies resolve all queued
resolvers and persist the per-model consent via setLocalModelConsent(modelId,
confirmed).
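
The per-model queueing described above can be sketched outside React as a small coordinator class. Everything here is hypothetical (`ConsentCoordinator` and its methods do not appear in the PR), and consent is held in memory only; persistence through `setLocalModelConsent(modelId, value)` is left out.

```typescript
type Resolver = (granted: boolean) => void;

class ConsentCoordinator {
  private granted = new Set<string>();
  private pending = new Map<string, Resolver[]>();

  hasConsent(modelId: string): boolean {
    return this.granted.has(modelId);
  }

  pendingCount(modelId: string): number {
    return this.pending.get(modelId)?.length ?? 0;
  }

  // Resolves immediately if consent exists; otherwise queues behind any
  // request already waiting on the same model instead of overwriting it.
  request(modelId: string): Promise<boolean> {
    if (this.granted.has(modelId)) {
      return Promise.resolve(true);
    }
    return new Promise<boolean>((resolve) => {
      const queue = this.pending.get(modelId) ?? [];
      queue.push(resolve);
      this.pending.set(modelId, queue);
    });
  }

  // Called when the user confirms or dismisses the dialog for this model.
  settle(modelId: string, confirmed: boolean): void {
    if (confirmed) {
      this.granted.add(modelId);
    }
    const queue = this.pending.get(modelId) ?? [];
    this.pending.delete(modelId);
    for (const resolve of queue) {
      resolve(confirmed);
    }
  }
}
```

A hook could hold one coordinator in a ref and open the dialog whenever `pendingCount(modelId)` goes from 0 to 1.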

In `@lib/local/consent.ts`:
- Around line 3-10: The consent functions should defensively handle missing or
blocked localStorage: update hasLocalModelConsent() to first guard for SSR
(typeof window === "undefined") and wrap localStorage.getItem(CONSENT_KEY) in a
try/catch returning false on error, and update setLocalModelConsent() to include
the same SSR guard (return early if window is undefined) and wrap
localStorage.setItem(CONSENT_KEY, "true") in a try/catch that safely
ignores/storage-errors; reference the existing functions hasLocalModelConsent,
setLocalModelConsent and the CONSENT_KEY constant when making these changes.
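
A sketch of the defensive variant, written so it can be exercised outside a browser. The function names and `CONSENT_KEY` come from the comment; the key's value, the `StorageLike` interface, and the injectable `storage` parameter are assumptions added for testability.

```typescript
// Minimal subset of the Web Storage interface used here.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CONSENT_KEY = "local-model-consent"; // hypothetical key value

// Looks up localStorage without assuming a browser: returns undefined during
// SSR and when merely accessing the property throws (blocked storage).
function resolveStorage(): StorageLike | undefined {
  try {
    return Reflect.get(globalThis, "localStorage") as StorageLike | undefined;
  } catch {
    return undefined;
  }
}

function hasLocalModelConsent(storage: StorageLike | undefined = resolveStorage()): boolean {
  if (!storage) return false;
  try {
    return storage.getItem(CONSENT_KEY) === "true";
  } catch {
    return false; // treat unreadable storage as "no consent"
  }
}

function setLocalModelConsent(storage: StorageLike | undefined = resolveStorage()): void {
  if (!storage) return;
  try {
    storage.setItem(CONSENT_KEY, "true");
  } catch {
    // Quota or privacy-mode errors are ignored; the user is simply asked again.
  }
}
```

Failing closed on reads means a storage error can cause an extra consent prompt but never a download without one.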

In `@lib/local/inference.worker.ts`:
- Around line 96-106: The pipeline() call that awaits model loading
(TextGenerationPipeline) must be made abort-aware: while awaiting
pipeline(modelId...) monitor the abort signal/message for this request (the same
requestId used in scope.postMessage) and if an abort is received before pipeline
resolves, immediately stop the worker (e.g., post an "aborted" message via
scope.postMessage with requestId and then call self.close()/terminate) so the
long-running model download doesn't continue; ensure the parent knows to
recreate the worker for future requests. In practice: start pipeline(...) and
concurrently listen for an abort (AbortSignal or scope.onmessage "abort" for
requestId), and if aborted before pipeline resolves, skip using the returned
TextGenerationPipeline and terminate this worker instance instead of waiting for
pipeline to finish.
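
The "race the load against an abort" idea can be sketched generically with an `AbortSignal`. This is an illustration, not the worker's code: `loadWithAbort` is a hypothetical helper, and `load` stands in for the `pipeline(...)` call. Note that rejecting early does not cancel the underlying download, which is why the comment also suggests terminating the worker.

```typescript
function makeAbortError(): Error {
  const error = new Error("Model load aborted");
  error.name = "AbortError";
  return error;
}

// Settles with the load result, or rejects as soon as the signal aborts,
// whichever happens first.
function loadWithAbort<T>(load: () => Promise<T>, signal: AbortSignal): Promise<T> {
  if (signal.aborted) {
    return Promise.reject(makeAbortError());
  }
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(makeAbortError());
    signal.addEventListener("abort", onAbort, { once: true });
    load().then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (error: unknown) => {
        signal.removeEventListener("abort", onAbort);
        reject(error);
      }
    );
  });
}
```

In a worker, the rejection handler would be the place to post an "aborted" message for the request and then close the worker so the download stops.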

In `@lib/local/localTransport.ts`:
- Around line 126-137: The promise handling worker communication currently only
listens for "message" and the abort signal, so if the worker errors or emits
"messageerror" the request never settles; add listeners for "error" and
"messageerror" on the Worker instance (the same object referenced as w) that
call the same cleanup logic as handleAbort and reject the pending stream/promise
with a clear Error (include requestId and spec.modelId in the message), and
ensure these error listeners are removed alongside handleMessage and the abort
listener when the request completes; reference the existing handleMessage and
handleAbort handlers, the Worker variable w, and the requestId/spec identifiers
when implementing the rejection and cleanup.
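
A minimal sketch of the extra listeners, assuming the worker exposes the standard `addEventListener`/`removeEventListener` pair. `watchWorkerFailures` and the `WorkerLike` interface are hypothetical; in `localTransport.ts` the `onFailure` callback would reject the pending promise and run the same cleanup as `handleAbort`.

```typescript
// Structural subset of Worker, so the sketch also runs against a plain EventTarget.
interface WorkerLike {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}

interface RequestContext {
  requestId: string;
  modelId: string;
}

// Registers "error" and "messageerror" listeners and returns a cleanup function
// to call when the request completes normally.
function watchWorkerFailures(
  worker: WorkerLike,
  context: RequestContext,
  onFailure: (error: Error) => void
): () => void {
  function cleanup(): void {
    worker.removeEventListener("error", handleError);
    worker.removeEventListener("messageerror", handleMessageError);
  }
  function fail(kind: string): void {
    cleanup(); // detach first so a second event cannot settle the request twice
    onFailure(
      new Error(
        `Local worker ${kind} for request ${context.requestId} (model ${context.modelId})`
      )
    );
  }
  function handleError(): void {
    fail("error");
  }
  function handleMessageError(): void {
    fail("messageerror");
  }
  worker.addEventListener("error", handleError);
  worker.addEventListener("messageerror", handleMessageError);
  return cleanup;
}
```

Returning the cleanup function lets the success path remove these listeners together with the existing message and abort handlers.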

In `@next.config.js`:
- Around line 11-17: The Turbopack aliases in next.config.js are incorrect
because turbopack: {} plus webpack config.resolve.alias entries like sharp$ and
"onnxruntime-node$": false won't disable native modules in Turbopack and
resolveAlias doesn't accept false values; remove the empty turbopack: {} block
(or replace with a validated resolveAlias mapping) and delete the false aliases
from the webpack config.resolve.alias, and instead rely on
serverExternalPackages (if present) or add proper turbopack.resolveAlias
mappings that map the module names to safe stubs; update references in the file
to the turbopack, webpack, config.resolve.alias, sharp$, onnxruntime-node$,
resolveAlias, and serverExternalPackages symbols so the change is located and
validated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 66e46d8e-3be8-4009-a2cb-a0c6176a4cd7

📥 Commits

Reviewing files that changed from the base of the PR and between b2dc69a and dac8be1.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (20)
  • app/api/chat/route.ts
  • components/Inputs/InputChat/index.tsx
  • components/chat/ChatArea/index.tsx
  • components/modals/local-model-consent/index.tsx
  • components/modals/settings/index.tsx
  • constants/models.ts
  • hooks/useCircleChat.ts
  • hooks/useLocalModelConsent.ts
  • lib/chat/config.test.ts
  • lib/chat/config.ts
  • lib/chat/prompt.ts
  • lib/langchain/chatService.ts
  • lib/local/capabilities.ts
  • lib/local/consent.ts
  • lib/local/inference.worker.ts
  • lib/local/localTransport.ts
  • next.config.js
  • package.json
  • store/index.ts
  • store/slices/configSlice.ts

Comment thread hooks/useCircleChat.ts Outdated
Comment thread hooks/useCircleChat.ts
Comment thread hooks/useLocalModelConsent.ts
Comment thread lib/local/consent.ts
Comment thread lib/local/inference.worker.ts
Comment thread lib/local/localTransport.ts
Comment thread next.config.js
@fcxmarquez fcxmarquez merged commit 968f660 into main Apr 20, 2026
4 checks passed
@fcxmarquez fcxmarquez deleted the franciscomarquezsolt/fcx-107-implement-tiered-ai-model-strategy-for-guest-users branch April 20, 2026 02:10