
Commit 7200889

Author: alpsla
feat: Add custom DeepWiki Docker image with dynamic tokenizer configuration
- Create Dockerfile that patches hardcoded text-embedding-3-small in data_pipeline.py
- Add build and deployment scripts for DigitalOcean container registry
- Include GitHub Actions workflow for automated builds
- Implement dynamic configuration reading from embedder.json
- Add comprehensive README with deployment instructions

This fixes the tokenizer/embedding model mismatch issue where DeepWiki was using text-embedding-3-small for tokenization but text-embedding-3-large for embeddings.
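The lookup the commit message describes can be sketched in a few lines. This is a minimal illustration, assuming the embedder config is a dictionary with a `model_kwargs.model` entry; the function name `tokenizer_model_from_config` and the sample config are hypothetical and not part of DeepWiki.

```python
import json

# The value that was previously hardcoded; kept as the fallback.
DEFAULT_TOKENIZER_MODEL = "text-embedding-3-small"


def tokenizer_model_from_config(embedder_config: dict) -> str:
    """Return the model name the tokenizer should use.

    Falls back to the old hardcoded model when the config omits it.
    """
    return embedder_config.get("model_kwargs", {}).get("model", DEFAULT_TOKENIZER_MODEL)


# Hypothetical embedder config; the real embedder.json may carry more keys.
sample = json.loads('{"model_kwargs": {"model": "text-embedding-3-large"}}')

print(tokenizer_model_from_config(sample))
print(tokenizer_model_from_config({}))
```

With the fallback in place, an image built from this commit should behave like the upstream image whenever the config does not name a model.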
1 parent 2857007 commit 7200889

5 files changed

Lines changed: 291 additions & 0 deletions

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
name: Build Custom DeepWiki Image

on:
  workflow_dispatch:
  push:
    paths:
      - 'kubernetes/deepwiki-custom/**'
    branches:
      - main
      - develop

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}

      - name: Log in to DigitalOcean Container Registry
        run: doctl registry login --expiry-seconds 1200

      - name: Build image
        run: |
          cd kubernetes/deepwiki-custom/
          docker build -t registry.digitalocean.com/codequal/deepwiki-custom:latest .
          docker tag registry.digitalocean.com/codequal/deepwiki-custom:latest \
            registry.digitalocean.com/codequal/deepwiki-custom:${{ github.sha }}

      - name: Push image to DigitalOcean Container Registry
        run: |
          docker push registry.digitalocean.com/codequal/deepwiki-custom:latest
          docker push registry.digitalocean.com/codequal/deepwiki-custom:${{ github.sha }}

      - name: Update deployment
        if: github.ref == 'refs/heads/main'
        run: |
          kubectl set image deployment/deepwiki \
            deepwiki=registry.digitalocean.com/codequal/deepwiki-custom:${{ github.sha }} \
            -n codequal-dev

          kubectl rollout status deployment/deepwiki -n codequal-dev
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
FROM ghcr.io/asyncfuncai/deepwiki-open:latest

# Create a patched version of data_pipeline.py that reads model from config
RUN cat > /tmp/patch_data_pipeline.py << 'EOF'
import re
import sys

# Read the original file
with open('/app/api/data_pipeline.py', 'r') as f:
    content = f.read()

# Replace hardcoded model with dynamic config; \1 re-applies the original line's indentation
old_pattern = r'(?m)^([ \t]*)encoding = tiktoken\.encoding_for_model\("text-embedding-3-small"\)'
new_code = r'''\1from api.config import get_embedder_config
\1embedder_config = get_embedder_config()
\1model_name = embedder_config.get('model_kwargs', {}).get('model', 'text-embedding-3-small')
\1encoding = tiktoken.encoding_for_model(model_name)'''

content, count = re.subn(old_pattern, new_code, content)
assert count, "hardcoded tokenizer line not found; nothing was patched"
# Write the patched file
with open('/app/api/data_pipeline.py', 'w') as f:
    f.write(content)

print("Patched data_pipeline.py to use dynamic model configuration")
EOF

# Apply the patch
RUN python /tmp/patch_data_pipeline.py && rm /tmp/patch_data_pipeline.py

# Verify the patch was applied (the build fails if the marker is missing)
RUN grep -A 3 "get_embedder_config" /app/api/data_pipeline.py

# Add a startup message to confirm custom image is being used
RUN printf '#!/bin/sh\necho "[CUSTOM] DeepWiki with dynamic embedding configuration started"\nexec "$@"\n' > /docker-entrypoint-custom.sh && \
    chmod +x /docker-entrypoint-custom.sh

ENTRYPOINT ["/docker-entrypoint-custom.sh"]
CMD ["sh", "-c", "cd /app && python api/main.py & npm start"]
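A regex patch that expands one line into several has to carry the original line's indentation onto every inserted line; otherwise the patched module may fail to parse when the target sits inside a function body. The following is a minimal sketch of that technique on a hypothetical stand-in for `data_pipeline.py`. The sample source is an assumption for illustration, not the real file.

```python
import re

# Hypothetical stand-in for the relevant part of data_pipeline.py.
SAMPLE_SOURCE = '''import tiktoken

def count_tokens(text):
    encoding = tiktoken.encoding_for_model("text-embedding-3-small")
    return len(encoding.encode(text))
'''

# Group 1 captures the leading whitespace of the matched line.
OLD_PATTERN = r'(?m)^([ \t]*)encoding = tiktoken\.encoding_for_model\("text-embedding-3-small"\)'

# \1 re-inserts the captured indentation in front of every new line.
NEW_CODE = r'''\1from api.config import get_embedder_config
\1embedder_config = get_embedder_config()
\1model_name = embedder_config.get('model_kwargs', {}).get('model', 'text-embedding-3-small')
\1encoding = tiktoken.encoding_for_model(model_name)'''

patched, count = re.subn(OLD_PATTERN, NEW_CODE, SAMPLE_SOURCE)

# compile() only parses the text; it does not import tiktoken or api.config.
compile(patched, "data_pipeline.py", "exec")
print(f"replacements: {count}")
```

Using `re.subn` rather than `re.sub` also returns the replacement count, which makes it easy to fail loudly when an upstream change means the pattern no longer matches.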
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
FROM ghcr.io/asyncfuncai/deepwiki-open:latest

# Simple patch: just replace the hardcoded model name
RUN sed -i 's/text-embedding-3-small/text-embedding-3-large/g' /app/api/data_pipeline.py

# Verify the change
RUN grep "text-embedding-3-large" /app/api/data_pipeline.py

# Add a marker to identify this is our custom image
RUN echo "Custom DeepWiki with text-embedding-3-large" > /app/CUSTOM_BUILD
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# Custom DeepWiki Docker Image

This directory contains the files needed to build a custom DeepWiki image that fixes the hardcoded embedding model configuration.

## What This Fixes

1. **Tokenizer Mismatch**: The original DeepWiki hardcodes `text-embedding-3-small` as the tokenizer model in `data_pipeline.py`, even when embeddings use a different model
2. **Configuration Consistency**: Our patch makes the tokenizer read its model from the same config as the embeddings

## Prerequisites

1. Docker Desktop installed and running
2. Access to your container registry (DigitalOcean)
3. `doctl` CLI configured (for DigitalOcean registry)

## Quick Build & Deploy

```bash
# 1. Start Docker Desktop
open -a Docker  # On macOS

# 2. Log in to the DigitalOcean registry
doctl registry login

# 3. Build and deploy
./build-and-deploy.sh
```

## Manual Steps

If the script doesn't work, run the steps by hand:

### 1. Build the Image
```bash
cd kubernetes/deepwiki-custom/
docker build -t registry.digitalocean.com/codequal/deepwiki-custom:latest .
```

### 2. Push to Registry
```bash
doctl registry login
docker push registry.digitalocean.com/codequal/deepwiki-custom:latest
```

### 3. Update Deployment
```bash
# Point the existing deployment at the custom image
kubectl set image deployment/deepwiki deepwiki=registry.digitalocean.com/codequal/deepwiki-custom:latest -n codequal-dev

# Or apply the full deployment
kubectl apply -f ../deepwiki-deployment-custom.yaml
```

## Verification

1. **Check Pod Status**
   ```bash
   kubectl get pods -n codequal-dev -l app=deepwiki
   ```

2. **Verify Custom Image**
   ```bash
   kubectl logs -n codequal-dev -l app=deepwiki | grep "CUSTOM"
   ```

3. **Test Functionality**
   ```bash
   cd ../../packages/agents
   npx ts-node test-deepwiki-simple.ts
   ```

## What the Dockerfile Does

1. **Base Image**: Uses the official DeepWiki image
2. **Patches data_pipeline.py**: Replaces the hardcoded model with a lookup in the embedder config
3. **Adds Startup Message**: Confirms the custom image is running
4. **Maintains Compatibility**: Falls back to `text-embedding-3-small` when the config does not name a model

## Rollback

If needed, roll back to the original image:
```bash
kubectl set image deployment/deepwiki deepwiki=ghcr.io/asyncfuncai/deepwiki-open:latest -n codequal-dev
```
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
#!/bin/bash
# Build and Deploy Custom DeepWiki Image

set -euo pipefail

# Configuration
REGISTRY="registry.digitalocean.com/codequal"
IMAGE_NAME="deepwiki-custom"
TAG="latest"
FULL_IMAGE="${REGISTRY}/${IMAGE_NAME}:${TAG}"

# Get the directory of this script
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

echo "🔨 Building custom DeepWiki image..."
cd "${SCRIPT_DIR}"
docker build -t "${FULL_IMAGE}" .

echo "📤 Pushing to registry..."
docker push "${FULL_IMAGE}"

echo "📦 Creating updated deployment..."
cat > ../deepwiki-deployment-custom.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepwiki
  namespace: codequal-dev
  labels:
    app: deepwiki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepwiki
  template:
    metadata:
      labels:
        app: deepwiki
    spec:
      containers:
      - name: deepwiki
        image: ${FULL_IMAGE}
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
          name: frontend
        - containerPort: 8001
          name: api
        env:
        - name: SERVER_BASE_URL
          value: http://deepwiki-api:8001
        - name: NEXT_PUBLIC_SERVER_BASE_URL
          value: http://deepwiki-api:8001

        # API Keys
        - name: OPENROUTER_API_KEY
          valueFrom:
            secretKeyRef:
              name: deepwiki-api-keys
              key: OPENROUTER_API_KEY
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: deepwiki-api-keys
              key: OPENAI_API_KEY
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: deepwiki-api-keys
              key: GITHUB_TOKEN
              optional: true

        # Model configuration via environment
        - name: OPENAI_BASE_URL
          value: https://api.openai.com/v1
        - name: LLM_MODEL
          value: openai/gpt-4-turbo-preview
        - name: EMBEDDING_MODEL
          value: text-embedding-3-large
        - name: EMBEDDING_API_BASE
          value: https://api.openai.com/v1
        - name: EMBEDDING_DIMENSIONS
          value: "3072"

        resources:
          requests:
            memory: "1Gi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1"
        volumeMounts:
        - mountPath: /root/.adalflow
          name: deepwiki-data
      volumes:
      - name: deepwiki-data
        persistentVolumeClaim:
          claimName: deepwiki-data
      imagePullSecrets:
      - name: registry-codequal
EOF

echo "🚀 Deploying custom DeepWiki..."
kubectl apply -f ../deepwiki-deployment-custom.yaml

echo "✅ Done! Custom DeepWiki image deployed."
echo ""
echo "Next steps:"
echo "1. Wait for pod to be ready: kubectl wait --for=condition=Ready pod -l app=deepwiki -n codequal-dev"
echo "2. Check logs: kubectl logs -n codequal-dev -l app=deepwiki"
echo "3. Test with: cd ../../packages/agents && npx ts-node test-deepwiki-simple.ts"
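The deployment sets `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` as two independent values, so they can drift apart. A small pre-deploy check could catch that. The sketch below is an assumption-laden illustration: the helper `check_embedding_env` is hypothetical, and the table lists only the native output sizes of the two OpenAI models named in this commit.

```python
# Native (maximum) output sizes of the OpenAI text-embedding-3 models.
NATIVE_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_embedding_env(env: dict) -> None:
    """Raise ValueError if the requested dimensions exceed what the model produces."""
    model = env["EMBEDDING_MODEL"]
    dims = int(env["EMBEDDING_DIMENSIONS"])
    native = NATIVE_DIMENSIONS.get(model)
    if native is None:
        raise ValueError(f"unknown embedding model: {model}")
    if dims > native:
        raise ValueError(f"{model} produces at most {native} dimensions, got {dims}")


# Values taken from the generated deployment manifest.
check_embedding_env({
    "EMBEDDING_MODEL": "text-embedding-3-large",
    "EMBEDDING_DIMENSIONS": "3072",
})
print("embedding settings are consistent")
```

The check only rejects values above the native size, because both models accept a smaller `dimensions` request.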
