Answer Artifact Gate

Purpose

Before any CPU, CUDA, Apple, NPU, SLM, or server lane can claim a usable local-answer path, it needs a model artifact that is known to produce coherent short answers.

A GGUF can be structurally valid and still fail answer readiness. Structural loading proves only that the bytes can be parsed. Answer readiness additionally requires tokenizer and pre-tokenizer authority, prompt-template authority, a passing reference runner, deterministic prompt-suite output, and an exact artifact hash.

This gate is shared across hardware lanes so each backend does not rediscover the same model-artifact failure and misattribute it to its own kernels.

Required States

Machine-readable manifests must use one of these states:

| State | Meaning |
| --- | --- |
| unknown | Artifact is known by name only. It is not valid evidence for answer claims. |
| structurally_valid | Artifact loads or passes structural checks, but answer quality is not proven. |
| rejected_missing_tokenizer_authority | Tokenizer or pre-tokenizer metadata is missing or ambiguous for answer claims. |
| rejected_reference_runner_failed | A reference runner could not produce coherent output or could not run the artifact. |
| rejected_prompt_suite_failed | The deterministic prompt suite failed under the reference runner. |
| answer_ready | The artifact passes this gate and can unblock backend answer-readiness work. |
| blocked | Search or regeneration is blocked; do not claim coherent local answers. |

Project-specific manifests may retain more detailed legacy statuses, but they must map to one of these shared states before they are used as a cross-lane precondition.
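
As a rough illustration of the mapping rule above, the shared states can be modelled as an enumeration, with legacy statuses translated before any cross-lane check. This is a minimal sketch: the state strings come from the table, but the legacy status names (`loads_ok`, `garbled_output`) and the helper names are hypothetical and do not come from this repository.

```python
from enum import Enum


class ArtifactState(str, Enum):
    """Shared states, copied from the Required States table."""
    UNKNOWN = "unknown"
    STRUCTURALLY_VALID = "structurally_valid"
    REJECTED_MISSING_TOKENIZER_AUTHORITY = "rejected_missing_tokenizer_authority"
    REJECTED_REFERENCE_RUNNER_FAILED = "rejected_reference_runner_failed"
    REJECTED_PROMPT_SUITE_FAILED = "rejected_prompt_suite_failed"
    ANSWER_READY = "answer_ready"
    BLOCKED = "blocked"


# Hypothetical legacy statuses, for illustration only.
LEGACY_TO_SHARED = {
    "loads_ok": ArtifactState.STRUCTURALLY_VALID,
    "garbled_output": ArtifactState.REJECTED_PROMPT_SUITE_FAILED,
}


def to_shared_state(status):
    """Return the shared state for a shared or legacy status string."""
    try:
        return ArtifactState(status)
    except ValueError:
        pass
    if status in LEGACY_TO_SHARED:
        return LEGACY_TO_SHARED[status]
    raise ValueError(f"status {status!r} has no shared-state mapping")


def unblocks_answer_work(status):
    """Only answer_ready may serve as a cross-lane precondition."""
    return to_shared_state(status) is ArtifactState.ANSWER_READY
```

Raising on an unmapped status, rather than defaulting to `unknown`, is a design choice in this sketch: it forces a project to add an explicit mapping before its status is used across lanes.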

Authority Dimensions

Rejected and candidate rows may carry additional authority dimensions when an artifact has mixed evidence:

| Field | Values |
| --- | --- |
| answer_readiness_scope | official_target, alternate_quant_control, diagnostic_only |
| target_alignment | official_i2s_cuda_target, official_derived_alt_quant, unrelated |
| runner_authority | stock_llama_cpp, ik_llama_cpp, microsoft_bitnet, unknown |
| tokenizer_authority | present, missing, defaulted, externally_supplied |
| pretokenizer_authority | present, missing, defaulted, externally_supplied |
| prompt_suite_result | passed, failed, blocked, not_run |

Manifests should also record whether an artifact can unblock the official I2_S CUDA answer lane or only a separate alternate-quant control lane. Passing output from an alternate quantization is useful control evidence, but it is not proof that the official I2_S CUDA target is answer-ready.
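
The sketch below shows one way a manifest row could be checked against the allowed values and against the alternate-quant rule. The field names and values are taken from the table; the row-as-dict representation and both function names are assumptions made for illustration, and the second check is a necessary condition only, not the full gate.

```python
# Allowed values per authority dimension, copied from the table above.
ALLOWED_VALUES = {
    "answer_readiness_scope": {"official_target", "alternate_quant_control", "diagnostic_only"},
    "target_alignment": {"official_i2s_cuda_target", "official_derived_alt_quant", "unrelated"},
    "runner_authority": {"stock_llama_cpp", "ik_llama_cpp", "microsoft_bitnet", "unknown"},
    "tokenizer_authority": {"present", "missing", "defaulted", "externally_supplied"},
    "pretokenizer_authority": {"present", "missing", "defaulted", "externally_supplied"},
    "prompt_suite_result": {"passed", "failed", "blocked", "not_run"},
}


def validate_dimensions(row):
    """Return a list of problems for any dimension with an unexpected value."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in row and row[field] not in allowed:
            errors.append(f"{field}: unexpected value {row[field]!r}")
    return errors


def may_count_toward_official_target(row):
    """Necessary (not sufficient) condition for official I2_S CUDA evidence.

    A passing alternate-quant row is control evidence only, so it is rejected here.
    """
    return (
        row.get("answer_readiness_scope") == "official_target"
        and row.get("target_alignment") == "official_i2s_cuda_target"
        and row.get("prompt_suite_result") == "passed"
    )
```

For example, a row with `answer_readiness_scope = alternate_quant_control` and `prompt_suite_result = passed` validates cleanly but does not count toward the official target.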

Model/Kernel Compatibility

Model/kernel support is a separate precondition from answer quality. A model can be a valid diagnostic target and still be invalid as an answer, reference, parity, or benchmark authority for a specific CPU architecture and quantization kernel.

The compatibility ledger is:

  • ci/model-artifacts/model-kernel-compatibility.toml

The official Microsoft BitNet-b1.58-2B-4T I2_S path remains the x86 reference authority for BitNet-rs CPU and CUDA answer lanes. By contrast, 1bitLLM/bitnet_b1_58-3B on x86 with I2_S is marked unsupported_upstream, matching the upstream bitnet.cpp support table. That combination may be used for diagnostic runs, artifact inspection, or unsupported-path receipts, but it must not be used for:

  • answer_ready
  • reference_authority
  • backend_parity
  • speedup

Upstream lists x86 TL2 and ARM TL1 support for 1bitLLM/bitnet_b1_58-3B, but those paths still require runner-path verification before they can become proof authority in this repo.
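
A compatibility lookup along these lines could enforce the restriction described above. This is a sketch under stated assumptions: the in-memory ledger shape, the `(model, arch, kernel)` key, and the function name are hypothetical, since the actual schema of model-kernel-compatibility.toml is not shown here. Only the `unsupported_upstream` status and the four forbidden uses come from this document.

```python
# Uses that an unsupported_upstream combination must never back.
FORBIDDEN_WHEN_UNSUPPORTED = frozenset(
    {"answer_ready", "reference_authority", "backend_parity", "speedup"}
)

# Hypothetical in-memory view of the ledger; the real TOML layout may differ.
LEDGER = {
    ("1bitLLM/bitnet_b1_58-3B", "x86", "I2_S"): "unsupported_upstream",
}


def use_allowed(model, arch, kernel, use):
    """Decide whether a model/arch/kernel combination may back a given use."""
    status = LEDGER.get((model, arch, kernel))
    if status is None:
        # Assumption: an unlisted combination proves nothing.
        return False
    if status == "unsupported_upstream":
        # Diagnostics, inspection, and unsupported-path receipts remain allowed.
        return use not in FORBIDDEN_WHEN_UNSUPPORTED
    # Other statuses are outside this sketch and remain subject to the rest of the gate.
    return True
```

Under these assumptions, `use_allowed("1bitLLM/bitnet_b1_58-3B", "x86", "I2_S", "speedup")` is rejected, while a hypothetical `"diagnostic_run"` use on the same combination is accepted.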

Answer-Ready Requirements

An answer_ready artifact must record:

  • Repository or source path.
  • Exact file name.
  • Exact SHA256.
  • Byte size.
  • Format and architecture.
  • Quantization family.
  • Tokenizer source.
  • Tokenizer model family.
  • Pre-tokenizer authority or an explicit compatibility decision.
  • Prompt-template family and stop-token policy.
  • Reference runner name and version or commit.
  • Reference runner command shape.
  • Deterministic prompt suite path.
  • Per-prompt pass/fail output summary.
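
The identity part of this record (exact SHA256 and byte size) can be verified mechanically. The following is a minimal sketch that assumes the record is available as a dict; every key name in `REQUIRED_FIELDS` is a hypothetical spelling of the items listed above, not the manifest's actual schema.

```python
import hashlib
import os

# Hypothetical key names for the required items listed above.
REQUIRED_FIELDS = [
    "source", "file_name", "sha256", "byte_size", "format", "architecture",
    "quantization_family", "tokenizer_source", "tokenizer_model_family",
    "pretokenizer_authority", "prompt_template_family", "stop_token_policy",
    "reference_runner", "reference_runner_version", "reference_runner_command",
    "prompt_suite_path", "prompt_results",
]


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large GGUF files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_record(record, path):
    """Return a list of problems; an empty list means the record is consistent."""
    problems = [
        f"missing field: {key}"
        for key in REQUIRED_FIELDS
        if key not in record or record[key] in (None, "")
    ]
    if os.path.getsize(path) != record.get("byte_size"):
        problems.append("byte size mismatch")
    if sha256_of(path) != str(record.get("sha256", "")).lower():
        problems.append("sha256 mismatch")
    return problems
```

Checking the byte size alongside the hash is cheap and gives a clearer failure message for truncated downloads.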

The deterministic prompt suite must include constrained prompts such as math_2_plus_2 and capital_france. Constrained prompts must pass their explicit gates. Open prompts need readable, non-empty, printable UTF-8 output without raw special-token garbage or repetition failures.
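
The output checks described above might look roughly like the following. This is an illustrative sketch, not the committed suite: the special-token pattern, the repetition threshold, and the expected substrings for `math_2_plus_2` and `capital_france` are all assumptions, and the real gates may be stricter.

```python
import re

# Assumed pattern for raw special tokens leaking into output.
SPECIAL_TOKEN_RE = re.compile(r"<\|[^|>]*\|>|</?s>|\[/?INST\]")

# Assumed expected substrings; the actual explicit gates are defined by the suite.
CONSTRAINED_GATES = {"math_2_plus_2": "4", "capital_france": "Paris"}


def open_prompt_ok(text, max_repeat=4):
    """Readable, non-empty, printable output with no token garbage or word loops."""
    if not text.strip():
        return False
    if "\ufffd" in text:  # replacement character implies a lossy UTF-8 decode
        return False
    if any(not (ch.isprintable() or ch in "\n\t") for ch in text):
        return False
    if SPECIAL_TOKEN_RE.search(text):
        return False
    words = text.split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeat:  # same word repeated too many times in a row
            return False
    return True


def constrained_prompt_ok(name, text):
    """Constrained prompts must contain their expected answer."""
    return CONSTRAINED_GATES[name] in text
```

The repetition check here only catches consecutive identical words; detecting repeated phrases would need an n-gram comparison.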

Claim Boundaries

Allowed before this gate passes:

  • Structural GGUF validity.
  • Loader and tokenizer diagnostics.
  • Backend execution proof.
  • CPU/CUDA/Metal/NPU diagnostic receipts marked diagnostic_only.
  • Failure classification such as model_artifact_blocked.

Not allowed before this gate passes:

  • Coherent local-answer claims.
  • CPU answer-readiness claims.
  • CUDA answer-readiness claims.
  • Apple M4 local-answer success claims.
  • NPU, SLM, server, or other hardware answer claims.
  • Speedup claims based on generated text quality.
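
These two lists amount to a simple rule that could be encoded as below. The claim identifiers are hypothetical labels for the bullet items above, and passing this check is necessary rather than sufficient, because lanes keep their own proof requirements.

```python
# Hypothetical identifiers for claims allowed before the gate passes.
ALWAYS_ALLOWED = {
    "structural_gguf_validity",
    "loader_tokenizer_diagnostics",
    "backend_execution_proof",
    "diagnostic_receipt",
    "failure_classification",
}

# Hypothetical identifiers for claims that require an answer_ready artifact.
GATED = {
    "coherent_local_answer",
    "cpu_answer_ready",
    "cuda_answer_ready",
    "apple_m4_local_answer",
    "other_hardware_answer",
    "text_quality_speedup",
}


def claim_permitted(claim, artifact_state):
    """Apply the claim boundary for a given shared artifact state."""
    if claim in ALWAYS_ALLOWED:
        return True
    if claim in GATED:
        return artifact_state == "answer_ready"
    raise ValueError(f"unclassified claim: {claim!r}")
```

Unclassified claims raise instead of defaulting to allowed, so a new claim type has to be placed on one side of the boundary explicitly.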

Current Shared Artifact

The official Microsoft BitNet I2_S GGUF is the shared answer_ready artifact for backend answer-readiness gates when paired with the documented external Microsoft tokenizer authority and the tokenizer.ggml.pre=llama-bpe compatibility decision.

The GGUF remains structurally unchanged and still lacks an embedded tokenizer.ggml.pre key. The source Microsoft model repository publishes an external tokenizer.json with explicit pre-tokenizer behavior. MODEL-ARTIFACT-007 records that Microsoft BitNet.cpp passes the committed deterministic answer corpus for the official I2_S artifact when that external authority is supplied to the runner with:

--override-kv tokenizer.ggml.pre=str:llama-bpe

The tdh111 IQ2_BN_R4 artifact is recorded separately as alternate-quant control evidence: it passes the tiny prompt suite under its intended ik_llama.cpp runner, but it is still missing pre-tokenizer authority and does not unblock the official Microsoft I2_S CUDA target.

Shared manifests:

  • ci/model-artifacts/artifact-manifest.toml
  • ci/model-artifacts/candidate-artifacts.toml
  • ci/model-artifacts/rejected-artifacts.toml
  • ci/model-artifacts/tokenizer-authority.toml
  • ci/model-artifacts/model-kernel-compatibility.toml
  • ci/model-artifacts/model-coverage-matrix.toml

The shared search and promotion reports are:

  • docs/reports/MODEL_ARTIFACT_002_REFERENCE_GOOD_SEARCH.md
  • docs/reports/MODEL_ARTIFACT_007_MICROSOFT_BITNETCPP_EXTERNAL_PRETOKENIZER.md

Apple-local evidence remains at:

  • ci/quality/apple-m4-local-answer-model-artifacts.toml

Hardware-Lane Use

CPU and accelerator lanes can still run diagnostics against rejected artifacts, but receipts must keep claims narrow:

claim = diagnostic_only
answer_readiness_claim = false
speedup_claim = false
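
A receipt check for these three fields could be sketched as follows. The parser handles only the simple `key = value` shape shown above and is not a TOML parser; both function names are hypothetical.

```python
def parse_receipt_fragment(text):
    """Parse simple `key = value` lines; true/false become booleans."""
    receipt = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value in ("true", "false"):
            receipt[key.strip()] = value == "true"
        else:
            receipt[key.strip()] = value
    return receipt


def receipt_is_narrow(receipt):
    """True when a diagnostic receipt makes no answer or speedup claim."""
    return (
        receipt.get("claim") == "diagnostic_only"
        and receipt.get("answer_readiness_claim") is False
        and receipt.get("speedup_claim") is False
    )
```

A missing field fails the check in this sketch, so a receipt has to state each narrow claim explicitly.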

Once an artifact becomes answer_ready, hardware lanes may use it for their own strict answer-readiness gates while preserving lane-specific backend, runtime, fallback, kernel, and timing proof requirements.

The cross-family coverage matrix in ci/model-artifacts/model-coverage-matrix.toml records the current proof tier for BitNet, dense SLM, selected small-LLM, unsupported, and docs-only lanes. Use it as the first stop for claim boundaries before treating a verified model as CPU answer-ready, CUDA answer-ready, benchmark-qualified, CLI-ready, or server-ready.