hotfix/0.22.2: BOM coordinates + unsigned GGUF metadata#586

Merged
michalharakal merged 3 commits into develop from hotfix/0.22.2
May 2, 2026

@michalharakal
Contributor

Summary

Two independent fixes targeted for 0.22.2, plus the version bump.

1. fix(bom): publish skainet-bom at sk.ainet:skainet-bom (351f727)

Already on the branch from the prior commit. Out of scope for this description.

2. fix(gguf): handle unsigned numeric metadata fields (86cc067) — closes #585

GgufModelMetadata.from() and any consumer using (value as? Number)?.toInt() on reader.fields silently dropped UInt/ULong-typed values. Modern GGUFs store dimensions and counts as uint32, but Kotlin's unsigned types do not extend kotlin.Number, so the safe cast yielded null. As a result, contextLength, embeddingLength, layerCount, headCount, vocabSize (fallback), bosTokenId, and eosTokenId all came back null on real-world GGUFs, and the loader fell back to defaults (e.g. blockCount=0 → zero-layer transformer).
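
The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, not code from the repository: `naiveToInt` is a hypothetical stand-in for the old cast pattern, and the map keys are illustrative GGUF-style names.

```kotlin
// Hypothetical stand-in for the old pattern: (value as? Number)?.toInt()
fun naiveToInt(value: Any?): Int? = (value as? Number)?.toInt()

fun main() {
    // Metadata as a reader might surface it: values boxed into Any?.
    val fields: Map<String, Any?> = mapOf(
        "llama.block_count" to 32u,      // UInt (uint32 in the file)
        "llama.context_length" to 4096,  // Int
    )

    // UInt does not extend kotlin.Number, so the safe cast yields null.
    val blockCount = naiveToInt(fields["llama.block_count"])
    val contextLength = naiveToInt(fields["llama.context_length"])

    println(blockCount)       // null
    println(contextLength)    // 4096

    // A default applied afterwards hides the dropped value.
    println(blockCount ?: 0)  // 0
}
```

No exception is thrown at any point, which is why the bug only showed up as wrong defaults further down the loading path.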

Fix:

  • New public file GgufFieldAccessors.kt exposing Map<String, Any?> extensions: getInt / getLong / getString / getIntList / getStringList. The numeric ones handle every signed and unsigned integer type the reader can emit (Int / UInt / Long / ULong / Short / UShort / Byte / UByte) plus the matching primitive arrays for the list variant, plus the string-encoded numeric variant some GGUF metadata uses.
  • GgufModelMetadata.from() now routes through these public accessors; the buggy private helpers are deleted.
  • New regression test GgufModelMetadataUnsignedTest covering uint32 / uint64 scalars, uint-typed lists, every-numeric-type, and key-priority order.
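
To illustrate the idea behind the accessors, here is a rough sketch of a scalar accessor. It is not the contents of GgufFieldAccessors.kt: the names `anyToLongOrNull` and `getIntSketch`, the vararg key-priority signature, and the choice to skip values that do not fit in an Int are all assumptions made for the example.

```kotlin
// Hypothetical helper: widen any integer-like value to Long, or return null.
fun anyToLongOrNull(value: Any?): Long? = when (value) {
    is Byte -> value.toLong()
    is Short -> value.toLong()
    is Int -> value.toLong()
    is Long -> value
    is UByte -> value.toLong()
    is UShort -> value.toLong()
    is UInt -> value.toLong()              // zero-extended, never negative
    is ULong -> value.toLong()             // values above Long.MAX_VALUE wrap negative
    is String -> value.trim().toLongOrNull() // string-encoded numeric variant
    else -> null
}

// Hypothetical accessor: the first key holding a usable value wins.
fun Map<String, Any?>.getIntSketch(vararg keys: String): Int? {
    for (key in keys) {
        val wide = anyToLongOrNull(this[key]) ?: continue
        // Assumption of this sketch: out-of-range values are skipped, not truncated.
        if (wide >= Int.MIN_VALUE && wide <= Int.MAX_VALUE) return wide.toInt()
    }
    return null
}

fun main() {
    val fields = mapOf<String, Any?>(
        "llama.block_count" to 32u,
        "llama.context_length" to 4096uL,
        "tokenizer.ggml.bos_token_id" to "1",
    )
    println(fields.getIntSketch("llama.block_count"))                   // 32
    println(fields.getIntSketch("missing.key", "llama.context_length")) // 4096
    println(fields.getIntSketch("tokenizer.ggml.bos_token_id"))         // 1
}
```

Matching on the concrete types instead of casting to Number is what lets the unsigned values through; how the real accessors treat overflow is not stated in this PR.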

Non-breaking — only adds new public API and fixes existing methods to return correct values.

Test plan

  • :skainet-io:skainet-io-gguf:jvmTest green (existing tokenizer suite + 5 new unsigned cases)
  • Composite-build downstream check: SKaiNET-transformers :llm-core:jvmTest :llm-inference:apertus:jvmTest — 96 tests, 0 failures
  • Reviewer to confirm BOM fix still validated by validate-published-poms.sh
  • Verify VERSION_NAME=0.22.2 propagates correctly to all published artifact coordinates

🤖 Generated with Claude Code

michalharakal and others added 2 commits May 2, 2026 14:30
fix(bom): publish skainet-bom at sk.ainet:skainet-bom (351f727)

The umbrella BOM was being emitted as `sk.ainet.core:skainet-bom`
because vanniktech's auto-coordinates feature picks up `GROUP=sk.ainet.core`
from the root `gradle.properties`, clobbering the per-module
`group = "sk.ainet"` override. Downstream BOMs (e.g.
`sk.ainet.transformers:skainet-transformers-bom`) import this with
`<groupId>sk.ainet</groupId>`, so they were unresolvable from a
fresh `mavenCentral()`-only project.

- Use vanniktech's explicit `mavenPublishing { coordinates(...) }`
  so the BOM lands at `sk.ainet:skainet-bom:0.22.2` regardless of
  the engine-wide GROUP.
- Extend `validate-published-poms.sh` to assert the BOM exists at
  `~/.m2/repository/sk/ainet/skainet-bom/` so the regression cannot
  ship again silently.
- Bump VERSION_NAME to 0.22.2; update README, CHANGELOG, and Antora
  docs samples (java-getting-started, java-model-training,
  io-readers, architecture) to the new version and BOM coordinates.
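
For orientation, the explicit-coordinates change might look roughly like the following in the BOM module's `build.gradle.kts`. This is a sketch that assumes the vanniktech maven-publish plugin is already applied to the module; the hardcoded version string is an assumption for readability, and the real build may derive it from VERSION_NAME.

```kotlin
// build.gradle.kts of the BOM module (sketch, plugin assumed to be applied)
mavenPublishing {
    // Explicit coordinates take precedence over GROUP from the root gradle.properties.
    coordinates("sk.ainet", "skainet-bom", "0.22.2")
}
```

A downstream project would then import the BOM under the corrected group:

```kotlin
// build.gradle.kts of a consumer (sketch)
dependencies {
    implementation(platform("sk.ainet:skainet-bom:0.22.2"))
}
```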

Fixes #584

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(gguf): handle unsigned numeric metadata fields (86cc067)

GgufModelMetadata.from() — and any consumer using `(value as? Number)?.toInt()`
on `reader.fields` — silently dropped UInt/ULong-typed values. Modern GGUFs
store dimensions and counts as uint32, but Kotlin's unsigned types do not
extend kotlin.Number, so the cast yielded null. As a result contextLength,
embeddingLength, layerCount, headCount, vocabSize fallback, bosTokenId,
eosTokenId all came back null on real-world GGUFs and the loader fell back
to defaults (e.g. blockCount=0 → zero-layer transformer).

Add public Map<String, Any?> extensions in GgufFieldAccessors.kt:
getInt / getLong / getString / getIntList / getStringList. The numeric
accessors handle Int / UInt / Long / ULong / Short / UShort / Byte / UByte
plus the matching primitive arrays for the list variant, and the string-
encoded numeric variant some GGUF metadata uses.

Route GgufModelMetadata.from() through the new public accessors and remove
the buggy private helpers. Add a regression test covering uint32/uint64
scalars, uint-typed lists, and every numeric type the accessor accepts.
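
The list variant follows the same pattern as the scalar one. The sketch below is illustrative only: `scalarToInt` and `anyToIntListOrNull` are hypothetical names, only a subset of the primitive array types is shown, and the rule that one unconvertible element makes the whole list null is an assumption of the example.

```kotlin
// Hypothetical helper: convert one integer-like element to Int, or null.
fun scalarToInt(value: Any?): Int? = when (value) {
    is Int -> value
    is UInt -> value.toInt()
    is Long -> value.toInt()
    is ULong -> value.toInt()
    is Short -> value.toInt()
    is UShort -> value.toInt()
    is Byte -> value.toInt()
    is UByte -> value.toInt()
    else -> null
}

// Hypothetical list accessor core: accepts primitive arrays and generic lists.
@OptIn(ExperimentalUnsignedTypes::class)
fun anyToIntListOrNull(value: Any?): List<Int>? {
    return when (value) {
        is IntArray -> value.toList()
        is UIntArray -> value.map { it.toInt() }
        is LongArray -> value.map { it.toInt() }
        is ULongArray -> value.map { it.toInt() }
        is List<*> -> {
            val out = ArrayList<Int>(value.size)
            for (element in value) {
                // Assumption: a single non-integer element invalidates the list.
                out += scalarToInt(element) ?: return null
            }
            out
        }
        else -> null
    }
}

@OptIn(ExperimentalUnsignedTypes::class)
fun main() {
    println(anyToIntListOrNull(uintArrayOf(8u, 8u, 4u))) // [8, 8, 4]
    println(anyToIntListOrNull(listOf(1u, 2uL, 3)))      // [1, 2, 3]
    println(anyToIntListOrNull(listOf(1u, "x")))         // null
}
```

Unsigned arrays such as `UIntArray` are still marked experimental in the Kotlin standard library, hence the `@OptIn` annotations.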

Closes #585

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 2, 2026

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-586 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit a4b38ed into develop May 2, 2026
16 checks passed
@michalharakal michalharakal deleted the hotfix/0.22.2 branch May 2, 2026 16:36
Development

Successfully merging this pull request may close these issues.

GgufModelMetadata silently drops UInt/ULong-typed numeric fields
