Commit 594c26c

Wire CI for end-to-end multimodal regression (closes #103 / #34)

Adds a vision-capable model, its matching mmproj, and a CC0/PD test image to all four Java test jobs (Linux x86_64, macOS arm64 with/without Metal, Windows x86_64), plus a model-gated MultimodalIntegrationTest that proves the typed ChatMessage(role, List<ContentPart>) surface from PR #189 round-trips through the upstream mtmd pipeline end-to-end.

CI changes (.github/workflows/publish.yml)
- New env vars: VISION_MODEL_URL / VISION_MODEL_NAME pointing at ggml-org/SmolVLM-500M-Instruct-Q8_0.gguf (smallest reliable vision GGUF on community ggml-org), VISION_MMPROJ_URL / _NAME for the matching mmproj, VISION_IMAGE_URL / _NAME for a small PD red-apple image from Wikimedia Commons.
- Each of the four Java test jobs gains three download steps and three -D system properties on the mvn test invocation: -Dnet.ladenthin.llama.vision.model / .mmproj / .image.

Validation scripts
- validate-models.sh refactored into validate_gguf() + validate_image() helpers with a 'required' vs 'optional' mode. Required models still fail fast; the new vision GGUFs and PD image are validated only when present, so jobs that skip them keep passing.
- validate-models.bat extended with a parallel OPTIONAL_MODELS loop.

Test (src/test/java/.../MultimodalIntegrationTest.java)
- Self-skips via Assume when any of the three -D paths is unset or its file is missing, so local mvn test stays green without the artifacts.
- multimodalRequestProducesNonEmptyReply: builds a ChatMessage.userMultimodal with ContentPart.text(...) + ContentPart.imageFile(Paths.get(image)), calls chatCompleteText, asserts a non-empty reply. Does NOT assert reply semantics: a 500M model can caption inaccurately and CI must not flap on model quality.
- multimodalThenTextOnSameModel: sanity check that a multimodal call followed by a text-only call on the same model both succeed (catches any parts/legacy split poisoning the inference context).

TestConstants gains PROP_VISION_MODEL_PATH / PROP_VISION_MMPROJ_PATH / PROP_VISION_IMAGE_PATH so the test reads the system properties via the same naming pattern as PROP_NOMIC_MODEL_PATH.

Docs
- docs/history/49be664_open_issues.md: #103 and #34 PARTIALLY FIXED -> FIXED in the per-issue blocks, the verdict guide, the status overview table, the deep-dive table, the cannot-be-closed-by-unit-tests-alone table, and the recommended-sequencing list. Bottom-line summary updated to reflect that 0 of the original LIKELY/PARTIALLY FIXED items remain partially fixed.
- (docs/feature-investigation-llama-stack-client-kotlin.md §2.1 was already updated in the PR-189 typed-multimodal-surface commit.)

Verified locally
- mvn test-compile: clean.
- mvn test -Dtest=MultimodalIntegrationTest: SKIPPED (no -D properties set; expected self-skip path).
- mvn javadoc:jar: BUILD SUCCESS.

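
The self-skip rule described in the message can be mirrored when running the test by hand. The following is a minimal sketch and not part of the commit: the helper name `vision_mvn_args` is hypothetical, and the default file names are assumed to be the ones the workflow downloads into `models/`. It emits the three `-D` flags only when all three artifacts exist, so a partial download results in the same skip path as passing no properties at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the three vision -D flags
# only when the model, mmproj, and image are all present in the given directory.
vision_mvn_args() {
    local dir="${1:-models}"
    local model="$dir/SmolVLM-500M-Instruct-Q8_0.gguf"
    local mmproj="$dir/mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
    local image="$dir/Red_Apple.jpg"
    local f
    for f in "$model" "$mmproj" "$image"; do
        # A missing file means the test would self-skip, so emit no flags.
        [ -f "$f" ] || return 0
    done
    printf '%s\n' \
        "-Dnet.ladenthin.llama.vision.model=$model" \
        "-Dnet.ladenthin.llama.vision.mmproj=$mmproj" \
        "-Dnet.ladenthin.llama.vision.image=$image"
}
```

A local run could then look like `mvn test -Dtest=MultimodalIntegrationTest $(vision_mvn_args models)`. The command substitution is left unquoted on purpose so each flag becomes its own argument, which assumes the paths contain no spaces.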
1 parent 871a700 commit 594c26c

6 files changed

Lines changed: 333 additions & 30 deletions

.github/validate-models.bat

Lines changed: 25 additions & 1 deletion
@@ -11,7 +11,12 @@ setlocal enabledelayedexpansion
 
 set "MODELS=models\codellama-7b.Q2_K.gguf" "models\jina-reranker-v1-tiny-en-Q4_0.gguf" "models\AMD-Llama-135m-code.Q2_K.gguf" "models\Qwen3-0.6B-Q4_K_M.gguf"
 
-echo Validating model files...
+REM Vision GGUFs are validated only when present (the Windows job downloads
+REM them too, but the validation step must not fail when a future job opts
+REM out of the vision matrix).
+set "OPTIONAL_MODELS=models\SmolVLM-500M-Instruct-Q8_0.gguf" "models\mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
+
+echo Validating required model files...
 for %%M in (%MODELS%) do (
     if not exist "%%M" (
         echo ERROR: Model not found: %%M
@@ -37,5 +42,24 @@ for %%M in (%MODELS%) do (
     echo OK: %%M ^(!size! bytes^)
 )
 
+echo Validating optional vision model files...
+for %%M in (%OPTIONAL_MODELS%) do (
+    if not exist "%%M" (
+        echo SKIP: %%M not present
+    ) else (
+        for /f %%S in ('powershell -Command "(Get-Item '%%M').Length"') do set "size=%%S"
+        if !size! lss 4 (
+            echo ERROR: Model file too small (likely corrupted^): %%M (size: !size! bytes^)
+            exit /b 1
+        )
+        for /f %%H in ('powershell -Command "[System.BitConverter]::ToString((Get-Content '%%M' -Encoding Byte -ReadCount 4)[0]) -replace '-',''"') do set "magic=%%H"
+        if not "!magic!"=="47475546" (
+            echo ERROR: Invalid GGUF magic bytes in %%M (got: !magic!, expected: 47475546^)
+            exit /b 1
+        )
+        echo OK: %%M ^(!size! bytes^)
+    )
+)
+
 echo All models validated successfully!
 exit /b 0

.github/validate-models.sh

Lines changed: 61 additions & 9 deletions
@@ -17,28 +17,80 @@ MODELS=(
     "models/Qwen3-0.6B-Q4_K_M.gguf"
 )
 
-echo "Validating model files..."
-for model in "${MODELS[@]}"; do
+# Optional GGUFs and image, validated only when present so jobs that do not
+# download them (e.g. cross-compile smoke runs) still pass.
+OPTIONAL_MODELS=(
+    "models/nomic-embed-text-v1.5.f16.gguf"
+    "models/SmolVLM-500M-Instruct-Q8_0.gguf"
+    "models/mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
+)
+
+OPTIONAL_IMAGES=(
+    "models/Red_Apple.jpg"
+)
+
+validate_gguf() {
+    local model="$1"
+    local required="$2"
     if [[ ! -f "$model" ]]; then
-        echo "ERROR: Model not found: $model"
-        exit 1
+        if [[ "$required" == "required" ]]; then
+            echo "ERROR: Model not found: $model"
+            exit 1
+        else
+            echo "- $model (optional, skipped: not present)"
+            return
+        fi
     fi
-
-    # Check file size (must be > 4 bytes for magic header)
+    local size
     size=$(stat -f%z "$model" 2>/dev/null || stat -c%s "$model" 2>/dev/null)
     if [[ $size -lt 4 ]]; then
         echo "ERROR: Model file too small (likely corrupted): $model (size: $size bytes)"
         exit 1
     fi
-
-    # Check GGUF magic bytes: 0x47 0x47 0x55 0x46
+    local magic
     magic=$(xxd -p -l 4 "$model")
     if [[ "$magic" != "47475546" ]]; then
         echo "ERROR: Invalid GGUF magic bytes in $model (got: $magic, expected: 47475546)"
         exit 1
     fi
-
     echo "$model ($(numfmt --to=iec-i --suffix=B $size 2>/dev/null || echo $size bytes))"
+}
+
+validate_image() {
+    local img="$1"
+    if [[ ! -f "$img" ]]; then
+        echo "- $img (optional, skipped: not present)"
+        return
+    fi
+    local size
+    size=$(stat -f%z "$img" 2>/dev/null || stat -c%s "$img" 2>/dev/null)
+    if [[ $size -lt 100 ]]; then
+        echo "ERROR: Image file too small (likely an HTML error page): $img (size: $size bytes)"
+        exit 1
+    fi
+    # Accept JPEG (FF D8 FF), PNG (89 50 4E 47), WebP RIFF (52 49 46 46), GIF (47 49 46 38)
+    local magic
+    magic=$(xxd -p -l 4 "$img")
+    case "$magic" in
+        ffd8ff*|89504e47|52494646|47494638)
+            echo "$img ($(numfmt --to=iec-i --suffix=B $size 2>/dev/null || echo $size bytes))"
+            ;;
+        *)
+            echo "ERROR: Unrecognised image magic in $img (got: $magic)"
+            exit 1
+            ;;
+    esac
+}
+
+echo "Validating model files..."
+for model in "${MODELS[@]}"; do
+    validate_gguf "$model" required
+done
+for model in "${OPTIONAL_MODELS[@]}"; do
+    validate_gguf "$model" optional
+done
+for img in "${OPTIONAL_IMAGES[@]}"; do
+    validate_image "$img"
 done
 
 echo "All models validated successfully!"
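
The header checks in this script can be tried without downloading a model. The sketch below is illustrative and not part of the commit: `file_magic` and `classify_magic` are hypothetical names, and it reads the bytes with POSIX `od` instead of the script's `xxd`, on the assumption that `od` is more widely available. The four ASCII bytes `GGUF` are 47 47 55 46, which is the value `validate_gguf` compares against.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a file's first four bytes as lowercase hex.
# Uses POSIX od (an assumption; the script in the diff uses xxd -p -l 4).
file_magic() {
    od -An -tx1 -N4 "$1" | tr -d ' \t\n'
}

# Hypothetical helper: map a four-byte hex prefix to the formats the
# validation script accepts (GGUF models plus JPEG/PNG/RIFF/GIF images).
classify_magic() {
    case "$1" in
        47475546) echo gguf ;;
        ffd8ff*)  echo jpeg ;;
        89504e47) echo png ;;
        52494646) echo riff ;;
        47494638) echo gif ;;
        *)        echo unknown ;;
    esac
}
```

For example, `classify_magic "$(file_magic models/Red_Apple.jpg)"` reports the detected container for a downloaded file. An HTML error page saved under an image name falls through to `unknown`, which is the case the image validation is meant to catch.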

.github/workflows/publish.yml

Lines changed: 59 additions & 4 deletions
@@ -22,6 +22,20 @@ env:
   REASONING_MODEL_NAME: "Qwen3-0.6B-Q4_K_M.gguf"
   NOMIC_EMBED_MODEL_URL: "https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.f16.gguf"
   NOMIC_EMBED_MODEL_NAME: "nomic-embed-text-v1.5.f16.gguf"
+  # Vision model + mmproj for MultimodalIntegrationTest (issues #103 / #34).
+  # SmolVLM-500M is the smallest community vision GGUF that loads reliably
+  # under the upstream mtmd pipeline. Total download ~600 MB across model
+  # plus mmproj; matches the existing per-test-job download budget.
+  VISION_MODEL_URL: "https://huggingface.co/ggml-org/SmolVLM-500M-Instruct-GGUF/resolve/main/SmolVLM-500M-Instruct-Q8_0.gguf"
+  VISION_MODEL_NAME: "SmolVLM-500M-Instruct-Q8_0.gguf"
+  VISION_MMPROJ_URL: "https://huggingface.co/ggml-org/SmolVLM-500M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
+  VISION_MMPROJ_NAME: "mmproj-SmolVLM-500M-Instruct-Q8_0.gguf"
+  # Small CC0 / public-domain test image from Wikimedia Commons. A simple
+  # subject (red apple, ~12 KB) so a 500M vision model has a fair chance of
+  # producing recognisable output; the test only asserts a non-empty reply
+  # so model accuracy is not the gating signal.
+  VISION_IMAGE_URL: "https://upload.wikimedia.org/wikipedia/commons/1/15/Red_Apple.jpg"
+  VISION_IMAGE_NAME: "Red_Apple.jpg"
 permissions:
   contents: read
 jobs:
@@ -393,6 +407,12 @@ jobs:
         run: curl -L --fail ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}
       - name: Download nomic embedding model (issue #98 regression)
         run: curl -L --fail ${NOMIC_EMBED_MODEL_URL} --create-dirs -o models/${NOMIC_EMBED_MODEL_NAME}
+      - name: Download vision model (issues #103 / #34)
+        run: curl -L --fail ${VISION_MODEL_URL} --create-dirs -o models/${VISION_MODEL_NAME}
+      - name: Download vision mmproj
+        run: curl -L --fail ${VISION_MMPROJ_URL} --create-dirs -o models/${VISION_MMPROJ_NAME}
+      - name: Download CC0 / public-domain test image
+        run: curl -L --fail -A "java-llama.cpp-ci/1.0" "${VISION_IMAGE_URL}" --create-dirs -o models/${VISION_IMAGE_NAME}
       - name: List files in models directory
         run: ls -l models/
       - name: Validate model files
@@ -408,7 +428,12 @@ jobs:
           ulimit -c unlimited
           echo "${{ github.workspace }}/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
       - name: Run tests
-        run: mvn --no-transfer-progress test -Dnet.ladenthin.llama.nomic.path=models/${NOMIC_EMBED_MODEL_NAME}
+        run: |
+          mvn --no-transfer-progress test \
+            -Dnet.ladenthin.llama.nomic.path=models/${NOMIC_EMBED_MODEL_NAME} \
+            -Dnet.ladenthin.llama.vision.model=models/${VISION_MODEL_NAME} \
+            -Dnet.ladenthin.llama.vision.mmproj=models/${VISION_MMPROJ_NAME} \
+            -Dnet.ladenthin.llama.vision.image=models/${VISION_IMAGE_NAME}
       - uses: actions/upload-artifact@v7
         if: success()
         with:
@@ -455,6 +480,12 @@ jobs:
         run: curl -L --fail ${DRAFT_MODEL_URL} --create-dirs -o models/${DRAFT_MODEL_NAME}
       - name: Download reasoning model
         run: curl -L --fail ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}
+      - name: Download vision model (issues #103 / #34)
+        run: curl -L --fail ${VISION_MODEL_URL} --create-dirs -o models/${VISION_MODEL_NAME}
+      - name: Download vision mmproj
+        run: curl -L --fail ${VISION_MMPROJ_URL} --create-dirs -o models/${VISION_MMPROJ_NAME}
+      - name: Download CC0 / public-domain test image
+        run: curl -L --fail -A "java-llama.cpp-ci/1.0" "${VISION_IMAGE_URL}" --create-dirs -o models/${VISION_IMAGE_NAME}
       - name: List files in models directory
         run: ls -l models/
       - name: Validate model files
@@ -468,7 +499,11 @@ jobs:
       - name: Enable core dumps
         run: ulimit -c unlimited
       - name: Run tests
-        run: mvn --no-transfer-progress -Dnet.ladenthin.llama.test.ngl=0 test
+        run: |
+          mvn --no-transfer-progress -Dnet.ladenthin.llama.test.ngl=0 test \
+            -Dnet.ladenthin.llama.vision.model=models/${VISION_MODEL_NAME} \
+            -Dnet.ladenthin.llama.vision.mmproj=models/${VISION_MMPROJ_NAME} \
+            -Dnet.ladenthin.llama.vision.image=models/${VISION_IMAGE_NAME}
       - name: Memory after tests
         if: always()
         run: vm_stat && sysctl hw.memsize hw.physmem
@@ -508,6 +543,12 @@ jobs:
         run: curl -L --fail ${DRAFT_MODEL_URL} --create-dirs -o models/${DRAFT_MODEL_NAME}
       - name: Download reasoning model
         run: curl -L --fail ${REASONING_MODEL_URL} --create-dirs -o models/${REASONING_MODEL_NAME}
+      - name: Download vision model (issues #103 / #34)
+        run: curl -L --fail ${VISION_MODEL_URL} --create-dirs -o models/${VISION_MODEL_NAME}
+      - name: Download vision mmproj
+        run: curl -L --fail ${VISION_MMPROJ_URL} --create-dirs -o models/${VISION_MMPROJ_NAME}
+      - name: Download CC0 / public-domain test image
+        run: curl -L --fail -A "java-llama.cpp-ci/1.0" "${VISION_IMAGE_URL}" --create-dirs -o models/${VISION_IMAGE_NAME}
       - name: List files in models directory
         run: ls -l models/
       - name: Validate model files
@@ -521,7 +562,11 @@ jobs:
       - name: Enable core dumps
         run: ulimit -c unlimited
       - name: Run tests
-        run: mvn --no-transfer-progress test
+        run: |
+          mvn --no-transfer-progress test \
+            -Dnet.ladenthin.llama.vision.model=models/${VISION_MODEL_NAME} \
+            -Dnet.ladenthin.llama.vision.mmproj=models/${VISION_MMPROJ_NAME} \
+            -Dnet.ladenthin.llama.vision.image=models/${VISION_IMAGE_NAME}
       - name: Memory after tests
         if: always()
         run: vm_stat && sysctl hw.memsize hw.physmem
@@ -564,6 +609,12 @@ jobs:
         run: curl -L --fail $env:DRAFT_MODEL_URL --create-dirs -o models/$env:DRAFT_MODEL_NAME
       - name: Download reasoning model
         run: curl -L --fail $env:REASONING_MODEL_URL --create-dirs -o models/$env:REASONING_MODEL_NAME
+      - name: Download vision model (issues #103 / #34)
+        run: curl -L --fail $env:VISION_MODEL_URL --create-dirs -o models/$env:VISION_MODEL_NAME
+      - name: Download vision mmproj
+        run: curl -L --fail $env:VISION_MMPROJ_URL --create-dirs -o models/$env:VISION_MMPROJ_NAME
+      - name: Download CC0 / public-domain test image
+        run: curl -L --fail -A "java-llama.cpp-ci/1.0" $env:VISION_IMAGE_URL --create-dirs -o models/$env:VISION_IMAGE_NAME
       - name: List files in models directory
         run: ls -l models/
       - name: Validate model files
@@ -576,7 +627,11 @@ jobs:
         run: Get-CimInstance Win32_OperatingSystem | Select-Object FreePhysicalMemory,TotalVisibleMemorySize | Format-List
         shell: pwsh
       - name: Run tests
-        run: mvn --no-transfer-progress test
+        run: |
+          mvn --no-transfer-progress test `
+            "-Dnet.ladenthin.llama.vision.model=models/$env:VISION_MODEL_NAME" `
+            "-Dnet.ladenthin.llama.vision.mmproj=models/$env:VISION_MMPROJ_NAME" `
+            "-Dnet.ladenthin.llama.vision.image=models/$env:VISION_IMAGE_NAME"
       - name: Memory after tests
         if: always()
         run: Get-CimInstance Win32_OperatingSystem | Select-Object FreePhysicalMemory,TotalVisibleMemorySize | Format-List
