AI-powered coding assistant integrated with enterprise GenAI Gateway. Provides code completion, chat, and code editing capabilities using locally deployed language models.
- Project Overview
- Features
- Architecture
- Prerequisites
- Quick Start
- Configuration
- Usage
- Advanced Features
- Troubleshooting
The Continue VS Code extension lets developers use an enterprise-deployed Llama 3.2 3B model for code assistance through the GenAI Gateway. It provides autocomplete, chat, and editing capabilities with Keycloak authentication.
Autocomplete
- Real-time code completion
- Multiline code generation
- Context-aware suggestions
- Configurable debounce and timeout
Chat Mode
- Interactive Q&A about code
- Code explanations
- Problem-solving assistance
- Keyboard shortcut: Ctrl+L
Edit Mode
- Targeted code modifications
- Inline transformations
- Context-aware refactoring
- Keyboard shortcut: Ctrl+I
graph TB
subgraph "Developer Workstation"
A[VS Code IDE]
B[Continue Extension]
end
subgraph "Enterprise Infrastructure"
C[GenAI Gateway<br/>LiteLLM]
D[Keycloak<br/>OAuth2 Auth]
end
subgraph "Kubernetes Cluster"
E[AI Model Pods<br/>Llama 3.2 3B]
end
A -->|Code Context| B
B -->|HTTPS Request<br/>API Key| C
C -->|Validate Token| D
D -->|Auth Success| C
C -->|Inference Request| E
E -->|Model Response| C
C -->|Response| B
B -->|Display Results| A
style A fill:#e1f5ff
style B fill:#fff4e1
style C fill:#f0e1ff
style D fill:#ffe1e1
style E fill:#e1ffe1
A developer types code in VS Code. The Continue extension sends an authenticated request to the GenAI Gateway. The gateway validates the credentials with Keycloak and routes the request to the model. The model generates a response, and the result is displayed in VS Code.
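The request step in this flow can be sketched in Python. This is only an illustration of the kind of request an OpenAI-compatible gateway accepts, not the extension's own code; the URL and key are the placeholders from the sample configuration, and `build_chat_request` is a hypothetical helper name. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder values from the sample configuration; replace with real ones.
API_BASE = "https://api.example.com/v1"
API_KEY = "your-api-key-here"
MODEL = "meta-llama/Llama-3.2-3B-Instruct"


def build_chat_request(prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # API key sent as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("What is Python?")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to the gateway, assuming the host is reachable and its certificate is trusted.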
- VS Code (latest stable version)
- GenAI Gateway access with Keycloak authentication
- API key from Gateway administrator
code --version
- Open VS Code
- Press Ctrl+Shift+X to open Extensions
- Search for "Continue"
- Install: Continue - open-source AI code agent
- Publisher: Continue
Command line installation:
code --install-extension Continue.continue
- Press Ctrl+Shift+P
- Type "Continue: Open config.yaml"
- Replace contents with configuration below
- Update apiBase and apiKey with your credentials
- Reload VS Code: Ctrl+Shift+P → "Developer: Reload Window"
Configuration file location:
Windows:
C:\Users\<username>\.continue\config.yaml
macOS/Linux:
~/.continue/config.yaml
name: GenAI Gateway Config
version: 1.0.0
schema: v1
tabAutocompleteOptions:
  multilineCompletions: "always"
  debounceDelay: 2500
  maxPromptTokens: 100
  prefixPercentage: 1.0
  suffixPercentage: 0.0
  maxSuffixPercentage: 0.0
  modelTimeout: 5000
  showWhateverWeHaveAtXMs: 2000
  useCache: true
  onlyMyCode: true
  useRecentlyEdited: true
  useRecentlyOpened: true
  useImports: true
  transform: true
  experimental_includeClipboard: false
  experimental_includeRecentlyVisitedRanges: true
  experimental_includeRecentlyEditedRanges: true
  experimental_includeDiff: true
  disableInFiles:
    - "*.md"
models:
  - name: "Llama 3.2 3B"
    provider: openai
    model: "meta-llama/Llama-3.2-3B-Instruct"
    apiBase: "https://api.example.com/v1"
    apiKey: "your-api-key-here"
    ignoreSSL: true
    contextLength: 8192
    completionOptions:
      maxTokens: 1024
      temperature: 0.3
      stop:
        - "\n\n"
        - "def "
        - "class "
    requestOptions:
      maxTokens: 1024
      temperature: 0.3
    autocompleteOptions:
      maxTokens: 256
      temperature: 0.2
      stop:
        - "\n\n\n"
        - "# "
    roles:
      - chat
      - edit
      - apply
      - autocomplete
    promptTemplates:
      autocomplete: "{{{prefix}}}"
    useLegacyCompletionsEndpoint: true
experimental:
  inlineEditing: true
allowAnonymousTelemetry: false
- apiBase: Your GenAI Gateway URL with /v1 suffix
- apiKey: API key from Gateway administrator
- model: Exact model name meta-llama/Llama-3.2-3B-Instruct
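The three fields above are the ones most likely to be left at their placeholder values. As a minimal sketch, assuming the `models` entry has already been loaded into a Python dict, a sanity check could look like the following. `check_model_entry` is a hypothetical helper, not part of Continue.

```python
def check_model_entry(entry: dict) -> list[str]:
    """Return human-readable problems found in one `models` entry."""
    problems = []
    api_base = entry.get("apiBase", "")
    # The gateway URL is expected to carry the /v1 suffix.
    if not api_base.rstrip("/").endswith("/v1"):
        problems.append("apiBase should end with /v1")
    # An empty key or the sample placeholder means the key was never set.
    if entry.get("apiKey", "") in ("", "your-api-key-here"):
        problems.append("apiKey is missing or still the placeholder")
    if not entry.get("model"):
        problems.append("model name is missing")
    return problems


entry = {
    "apiBase": "https://api.example.com",
    "apiKey": "your-api-key-here",
    "model": "meta-llama/Llama-3.2-3B-Instruct",
}
for problem in check_model_entry(entry):
    print(problem)
```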
For detailed configuration options and advanced setup, refer to SETUP_GUIDE.md.
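The `stop` lists in the sample configuration end generation as soon as one of the listed strings would appear, and the stop string itself is left out of the result. The model server applies this during generation; the sketch below only illustrates the effect on a finished string, and `truncate_at_stop` is a hypothetical name.

```python
def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    positions = [text.find(s) for s in stop if s in text]
    return text[: min(positions)] if positions else text


generated = "return a + b\n\ndef unused():\n    pass"
# Stop sequences from completionOptions in the sample configuration.
print(repr(truncate_at_stop(generated, ["\n\n", "def ", "class "])))
# prints 'return a + b'
```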
export API_KEY="your-api-key-here"
export API_BASE="https://api.example.com/v1"

curl -k "$API_BASE/models" \
  -H "Authorization: Bearer $API_KEY"

curl -k "$API_BASE/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "messages": [{"role": "user", "content": "What is Python?"}],
    "max_tokens": 50
  }'

How to Use:
- Open Continue sidebar
- Switch to Agent mode
- Give task instruction
- Review and approve file operations
- Verify results
Preview: Requested "Create a FastAPI application with two routes". The model generated the code and created a new file with the complete implementation including imports, app initialization, and route definitions.
How to Use:
- Start typing code
- Pause 3 seconds
- Accept with Tab or reject by continuing to type
Enable/disable via "Continue" button in status bar.
Preview: Started typing to create an endpoint for the sample FastAPI application and paused. The model generated the code for the endpoint and provided a prompt to accept or reject the code.
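The pause before a suggestion appears comes from debouncing: a request is only sent once typing has stopped for the configured delay (`debounceDelay: 2500` in the sample configuration). The following is an illustrative model of that behavior, not Continue's implementation; `debounced_triggers` is a hypothetical name and the keystroke times are made up.

```python
def debounced_triggers(keystroke_times_ms: list[int], debounce_ms: int) -> list[int]:
    """Return the times at which a completion request would fire.

    A request fires `debounce_ms` after a keystroke only if no other
    keystroke arrives before that deadline. Times must be sorted.
    """
    triggers = []
    for i, t in enumerate(keystroke_times_ms):
        deadline = t + debounce_ms
        is_last = i == len(keystroke_times_ms) - 1
        if is_last or keystroke_times_ms[i + 1] >= deadline:
            triggers.append(deadline)
    return triggers


# Two bursts of typing separated by a long pause.
print(debounced_triggers([0, 300, 600, 4000, 4200], 2500))
# prints [3100, 6700]
```

Each burst produces a single request, which is why continuous typing never triggers a suggestion.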
How to Use:
- Press Ctrl+L
- Type question
- Press Enter
- Review response
Context providers:
- Highlight code for automatic inclusion
- @Files - Reference specific files
- @Terminal - Include terminal output
Preview: Asked how FastAPI handles request validation in the current file and received a response with a suggestion, which can be viewed in the screenshots below.
How to Use:
- Highlight code
- Press Ctrl+I
- Type instruction
- Review diff
- Accept or reject
Preview: Highlighted the code in the file and provided the prompt "convert every endpoint to async". The model generated a diff showing the original code and the proposed changes, with a prompt to accept or reject them.
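To give a sense of what is reviewed in the diff step, here is a small sketch using Python's standard `difflib`. The extension renders its own diff view; the endpoint below is a hypothetical example, not code from the preview.

```python
import difflib

# Hypothetical endpoint before and after a "convert to async" instruction.
original = [
    "@app.get('/items')",
    "def list_items():",
    "    return []",
]
proposed = [
    "@app.get('/items')",
    "async def list_items():",
    "    return []",
]

diff = list(
    difflib.unified_diff(
        original, proposed, fromfile="original", tofile="proposed", lineterm=""
    )
)
print("\n".join(diff))
```

Lines starting with `-` are removed and lines starting with `+` are added, so the only change here is the `async` keyword on the function definition.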
Custom Rules
- Define custom system prompts and context for specific project needs
- Control AI behavior with project-specific guidelines
MCP Servers
- Extend functionality with Model Context Protocol servers
- Add custom tools and external integrations
For detailed setup instructions on creating custom rules and MCP servers, refer to SETUP_GUIDE.md.
For comprehensive troubleshooting guidance, common issues, and solutions, refer to:








