Commit 1ebd21b

michalharakal and claude committed
Annotate TokenizerFactory for Java call sites (#400)
Adds `@JvmStatic` to both factory entry points on the `TokenizerFactory` object (`fromGguf(Map)`, `fromTokenizerJson(String)`). Same motivation as StableHloConverterFactory in the previous commit: without the annotation, Java consumers had to navigate through the Kotlin object's `INSTANCE` marker:

    var tokenizer = TokenizerFactory.INSTANCE.fromGguf(ggufFields);

With the annotation they get the idiomatic static form:

    var tokenizer = TokenizerFactory.fromGguf(ggufFields);

The factory is the canonical entry point for the new Qwen byte-level BPE + SentencePiece tokenizers that landed in #463 and #464, so this is a meaningful win for Java consumers of the upcoming 0.19.0 release: they get Qwen / Llama / Gemma / TinyLlama tokenization without any Kotlin-specific interop glue.

Verified across jvmTest, compileKotlinWasmJs, and macosArm64Test for skainet-io-core, with no regressions.

Third of five commits polishing the Java / JVM consumption story for the upcoming 0.19.0 release. See #400.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
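To make the difference concrete, here is a minimal, self-contained Java sketch of roughly how a Kotlin `object` looks to Java callers with and without `@JvmStatic`. It is a hand-written approximation, not the compiler's actual output and not the real skainet API: the class names `PlainObjectFactory` and `AnnotatedObjectFactory`, the `String` return type (standing in for `Tokenizer`), and the `"gpt2"` field value are all placeholders chosen for illustration.

```java
import java.util.Map;

public class JvmStaticSketch {

    /** Hypothetical stand-in for a Kotlin `object` whose function lacks @JvmStatic. */
    static final class PlainObjectFactory {
        // A Kotlin object exposes its singleton to Java as a static INSTANCE field.
        public static final PlainObjectFactory INSTANCE = new PlainObjectFactory();

        private PlainObjectFactory() {}

        // Instance method: Java callers have to go through INSTANCE to reach it.
        public String fromGguf(Map<String, Object> fields) {
            return "tokenizer for " + fields.get("tokenizer.ggml.model");
        }
    }

    /** Hypothetical stand-in for the same object with @JvmStatic on the function. */
    static final class AnnotatedObjectFactory {
        public static final AnnotatedObjectFactory INSTANCE = new AnnotatedObjectFactory();

        private AnnotatedObjectFactory() {}

        // Static method: callable directly on the class name from Java.
        public static String fromGguf(Map<String, Object> fields) {
            return "tokenizer for " + fields.get("tokenizer.ggml.model");
        }
    }

    public static void main(String[] args) {
        // Placeholder metadata; the key matches the one read in the diff, the value is assumed.
        Map<String, Object> ggufFields = Map.of("tokenizer.ggml.model", "gpt2");

        // Without the annotation: the call is routed through the singleton.
        String viaInstance = PlainObjectFactory.INSTANCE.fromGguf(ggufFields);

        // With the annotation: the idiomatic static form.
        String viaStatic = AnnotatedObjectFactory.fromGguf(ggufFields);

        System.out.println(viaInstance);
        System.out.println(viaStatic);
    }
}
```

Both calls do the same work; only the call-site shape changes, which is why the annotation is a source-level convenience for Java consumers rather than a behavioral change.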
1 parent 25be9dc commit 1ebd21b

1 file changed

Lines changed: 3 additions & 0 deletions

skainet-io/skainet-io-core/src/commonMain/kotlin/sk/ainet/io/tokenizer/TokenizerFactory.kt

@@ -3,6 +3,7 @@ package sk.ainet.io.tokenizer
 import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.jsonObject
 import kotlinx.serialization.json.jsonPrimitive
+import kotlin.jvm.JvmStatic
 
 /**
  * Selects the right [Tokenizer] implementation for a model.
@@ -33,6 +34,7 @@ public object TokenizerFactory {
      * `ggufModelMetadata.rawFields` — this keeps `skainet-io-core` free of a
      * dependency on `skainet-io-gguf`.
      */
+    @JvmStatic
     public fun fromGguf(fields: Map<String, Any?>): Tokenizer {
         val model = (fields["tokenizer.ggml.model"] as? String)?.lowercase()
             ?: throw UnsupportedTokenizerException(
@@ -57,6 +59,7 @@ public object TokenizerFactory {
      * to [QwenByteLevelBpeTokenizer]; `"Unigram"` (SentencePiece) and
      * `"WordPiece"` currently throw.
      */
+    @JvmStatic
     public fun fromTokenizerJson(json: String): Tokenizer {
         val root = JSON.parseToJsonElement(json).jsonObject
         val modelType = root["model"]?.jsonObject?.get("type")?.jsonPrimitive?.content