Commit 7497e58

test
1 parent 4af87c7 commit 7497e58

8 files changed

Lines changed: 311 additions & 138 deletions

package-lock.json

Lines changed: 11 additions & 1 deletion

package.json

Lines changed: 2 additions & 1 deletion
@@ -64,7 +64,8 @@
 "rimraf": "^5.0.1",
 "rxjs": "^7.8.2",
 "swagger-ui-express": "^4.6.3",
-"uuid-apikey": "^1.5.3"
+"uuid-apikey": "^1.5.3",
+"zod": "^4.2.1"
 },
 "devDependencies": {
 "@darraghor/eslint-plugin-nestjs-typed": "^6.9.3",

src/compare/libs/vlm/README.md

Lines changed: 37 additions & 30 deletions
@@ -1,6 +1,30 @@
 # VLM (Vision Language Model) Image Comparison

-AI-powered semantic image comparison using Vision Language Models via Ollama.
+Hybrid image comparison combining pixelmatch for objective difference detection and Vision Language Models (via Ollama) for human-noticeability analysis.
+
+## Architecture Flow
+
+```text
+VLM Comparison Request
+
+
+Run Pixelmatch Comparison
+
+├─→ No Differences Found → Return OK Status
+
+└─→ Differences Found
+
+
+Save Diff Image
+
+
+Run VLM with 3 Images:
+(Baseline, Comparison, Diff)
+
+├─→ Not Noticeable → Override: Return OK Status
+
+└─→ Noticeable → Return Unresolved with VLM Description
+```

 ## Quick Start
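The architecture flow added to the README in this hunk can be read as a two-stage decision. The following is a minimal TypeScript sketch of that decision only, not the code from this commit: the type names, `hybridCompare`, and the callback shape are all hypothetical, the "Save Diff Image" step is omitted, and the VLM call is modelled as a synchronous callback for brevity (a real Ollama call would be asynchronous).

```typescript
// Hypothetical names throughout; nothing here is taken from the repository.
type Status = 'ok' | 'unresolved';

interface VlmVerdict {
  noticeable: boolean; // would a human notice the highlighted difference?
  description: string; // the model's description of the difference
}

interface CompareResult {
  status: Status;
  description?: string;
}

// diffPixels: count of differing pixels from the pixel-level comparison.
// askVlm: invoked lazily, only when the pixel comparison found differences.
function hybridCompare(diffPixels: number, askVlm: () => VlmVerdict): CompareResult {
  // No pixel differences: return OK without consulting the VLM.
  if (diffPixels === 0) {
    return { status: 'ok' };
  }

  // Differences found: let the VLM judge whether they are noticeable.
  const verdict = askVlm();
  if (!verdict.noticeable) {
    // The VLM overrides the pixel-level result.
    return { status: 'ok' };
  }
  return { status: 'unresolved', description: verdict.description };
}
```

Passing the VLM as a callback keeps the expensive model call off the no-difference path, which matches the first branch of the diagram.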

@@ -18,10 +42,9 @@ ollama serve

 ```bash
 # Recommended for accuracy
-ollama pull llava:7b
+ollama pull gemma3:12b

-# Or for speed (smaller, less accurate)
-ollama pull moondream
+# Note: Smaller models do not show proper results - use gemma3:12b only
 ```

 ### 3. Configure Backend
@@ -36,60 +59,44 @@ OLLAMA_BASE_URL=http://localhost:11434
 Set project's image comparison to `vlm` with config:
 ```json
 {
-"model": "llava:7b",
+"model": "gemma3:12b",
 "temperature": 0.1
 }
 ```

 Optional custom prompt (replaces default system prompt):
 ```json
 {
-"model": "llava:7b",
+"model": "gemma3:12b",
 "prompt": "Focus on button colors and text changes",
 "temperature": 0.1
 }
 ```

-**Note:** The `prompt` field replaces the entire system prompt. If omitted, a default system prompt is used that focuses on semantic differences while ignoring rendering artifacts.
+**Note:** The `prompt` field replaces the entire system prompt. If omitted, a default system prompt is used that analyzes the diff image to determine if highlighted differences are noticeable to humans.

 ## Recommended Models

-| Model | Size | Speed | Accuracy | Best For |
-|-------|------|-------|----------|----------|
-| `llava:7b` | 4.7GB | ⚡⚡ | ⭐⭐⭐ | **Recommended** - best balance (minimal) |
-| `gemma3:latest` | ~ | ⚡⚡ | ⭐⭐⭐ | Minimal model option |
-| `llava:13b` | 8GB || ⭐⭐⭐⭐ | Best accuracy |
-| `moondream` | 1.7GB | ⚡⚡⚡ | ⭐⭐ | Fast, may hallucinate |
-| `minicpm-v` | 5.5GB | ⚡⚡ | ⭐⭐⭐ | Good alternative |
+| Model | Size |
+|-------|------|
+| `gemma3:12b` | ~12GB - **Recommended** |
+
+**Note:** Models smaller than the default (`gemma3:12b`) have been tested and do not show proper results. They fail to follow structured output formats reliably and may produce incorrect or inconsistent responses. For production use, only use `gemma3:12b` or `llava:13b`.

 ## Configuration

 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `model` | string | `llava:7b` | Ollama vision model name |
+| `model` | string | `gemma3:12b` | Ollama vision model name |
 | `prompt` | string | System prompt (see below) | Custom prompt for image comparison |
 | `temperature` | number | `0.1` | Lower = more consistent results (0.0-1.0) |

-## How It Works
-
-1. VLM analyzes both images semantically
-2. Returns JSON with `{"identical": true/false, "description": "..."}`
-3. `identical: true` = images match (pass), `identical: false` = differences found (fail)
-4. Ignores technical differences (anti-aliasing, shadows, 1-2px shifts)
-5. Provides description of differences found
-
-### Default System Prompt
-
-The default prompt instructs the model to:
-- **CHECK** for: data changes, missing/added elements, state changes, structural differences
-- **IGNORE**: rendering artifacts, anti-aliasing, shadows, minor pixel shifts
-
 ## API Endpoints

 ```bash
 # List available models
 GET /ollama/models

 # Compare two images (for testing)
-POST /ollama/compare?model=llava:7b&prompt=<prompt>&temperature=0.1
+POST /ollama/compare?model=gemma3:12b&prompt=<prompt>&temperature=0.1
 ```
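As an illustration of how the Configuration table and the test endpoint in this README fit together, here is a minimal TypeScript sketch that applies the documented defaults (`gemma3:12b`, `0.1`) and builds the query string for `POST /ollama/compare`. The helper names `resolveConfig` and `buildCompareQuery` are hypothetical, rejecting temperatures outside 0.0-1.0 is an assumption based on the range shown in the table, and the README does not say how the images are attached to the request, so that part is left out.

```typescript
// Hypothetical helpers; only the option names and defaults come from the README.
interface VlmConfig {
  model?: string;
  prompt?: string;
  temperature?: number;
}

const DEFAULT_MODEL = 'gemma3:12b';
const DEFAULT_TEMPERATURE = 0.1;

function resolveConfig(config: VlmConfig): { model: string; temperature: number; prompt?: string } {
  const temperature = config.temperature ?? DEFAULT_TEMPERATURE;
  // Assumption: treat the documented 0.0-1.0 range as a hard limit (also rejects NaN).
  if (!(temperature >= 0 && temperature <= 1)) {
    throw new RangeError(`temperature must be between 0.0 and 1.0, got ${temperature}`);
  }
  return { model: config.model ?? DEFAULT_MODEL, temperature, prompt: config.prompt };
}

function buildCompareQuery(config: VlmConfig): string {
  const { model, temperature, prompt } = resolveConfig(config);
  const parts = [
    `model=${encodeURIComponent(model)}`,
    `temperature=${encodeURIComponent(String(temperature))}`,
  ];
  // The prompt is optional; when omitted the backend's default system prompt applies.
  if (prompt !== undefined) {
    parts.push(`prompt=${encodeURIComponent(prompt)}`);
  }
  return `/ollama/compare?${parts.join('&')}`;
}

// With no options set, the defaults from the table are used:
// buildCompareQuery({}) returns "/ollama/compare?model=gemma3%3A12b&temperature=0.1"
```

Percent-encoding matters here because the model tag contains a colon and a custom prompt will usually contain spaces.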

src/compare/libs/vlm/ollama.service.spec.ts

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,6 @@ jest.mock('ollama', () => {
 };
 });

-
 describe('OllamaService', () => {
 let service: OllamaService;

@@ -102,7 +101,8 @@ describe('OllamaService', () => {
 mockChat.mockResolvedValue(mockResponse);

 // Use a longer base64 string
-const longBase64 = 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==';
+const longBase64 =
+'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==';
 const result = await service.generate({
 model: 'llava',
 messages: [
