What Nullsec S1 Does Not Claim

Nullsec S1 aims to be taken seriously because it is rigorous, not because it exaggerates. This document states plainly what the project does not claim today. Most of these constraints are enforced in code by scripts/validate_claims.py, which scans public docs and fails CI on any unsubstantiated assertion.


Not trained from scratch

Nullsec S1 fine-tunes an open code model (Qwen/Qwen2.5-Coder-7B-Instruct, Apache 2.0) with QLoRA. It does not pretrain a foundation model from scratch, and it does not claim to.

Release assets, not source commits

The trained adapter and benchmark reports for RC2/v1.1 are published as GitHub Release assets (v1.0.0-rc25). They are deliberately not committed to this source repo, which keeps the repo lightweight. The in-repo claim validator gates on what is present on disk: with an unpacked release bundle it permits release-backed claims locally; on a fresh source checkout without the adapter and report it stays conservative. This is intentional.

Benchmark numbers are release-artifact claims

README benchmark numbers are tied to the v1.0.0-rc25 release artifacts, not to hand-entered source files. Benchmark numbers come only from real runs (--mode model, or --mode replay over captured real outputs); a case with no output is scored as a real miss, never a synthetic pass. The source repo does not commit large result bundles or trained weights.
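The "no output is a real miss" rule can be illustrated with a minimal sketch. The function names and the `vulnerable` field are hypothetical and chosen only for illustration; the actual benchmark harness and its output schema are not shown in this document.

```python
from typing import List, Optional, Tuple


def score_case(expected_vulnerable: bool, output: Optional[dict]) -> str:
    """Classify one benchmark case as "pass" or "miss"."""
    if output is None:
        # No captured output: counted as a real miss, never a synthetic pass.
        return "miss"
    # "vulnerable" is an assumed field name, not the project's real schema.
    predicted = bool(output.get("vulnerable"))
    return "pass" if predicted == expected_vulnerable else "miss"


def summarize(cases: List[Tuple[bool, Optional[dict]]]) -> dict:
    """Aggregate per-case outcomes into simple counts."""
    outcomes = [score_case(expected, output) for expected, output in cases]
    return {
        "total": len(outcomes),
        "passed": outcomes.count("pass"),
        "missed": outcomes.count("miss"),
    }
```

The point of the design is that a missing output lowers the score instead of being dropped from the denominator.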

Not a replacement for human security review

Nullsec S1 is an additional, security-native layer. It does not replace human security engineers, manual penetration testing, threat modeling, or established SAST/DAST tooling. Use it alongside them, not instead of them.

Not guaranteed to catch every vulnerability

A clean verdict reduces risk; it does not prove the absence of vulnerabilities. False negatives are possible. The deterministic Safety Layer guarantees a consistent, non-bypassable decision rule over a verdict. It does not guarantee that the model found every issue worth finding.

No "first / only / best" claims

Statements about being the "first", "only", or "best" LLM/system of its kind cannot be validated from repository artifacts, because no local file can substantiate a claim about the rest of the world. The claim validator never auto-permits these. They are not made here and would need independent support if ever stated.
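A check of this kind might look roughly like the sketch below. The pattern and the function name are assumptions made for illustration; the real rules in scripts/validate_claims.py are not reproduced in this document and may differ.

```python
import re

# Hypothetical pattern: "the first/only/best", optionally followed by up to
# three qualifier words, ending in a noun such as "LLM" or "system".
SUPERLATIVE_CLAIM = re.compile(
    r"\bthe\s+(?:first|only|best)\b(?:\s+[\w-]+){0,3}?\s+(?:llm|model|system|tool)\b",
    re.IGNORECASE,
)


def find_superlative_claims(text: str) -> list:
    """Return (line_number, matched_phrase) pairs for superlative claims."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in SUPERLATIVE_CLAIM.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits
```

Anchoring the pattern on "the ... LLM/model/system/tool" avoids flagging ordinary uses of "only" or "first", such as "numbers come only from real runs".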

No hidden reasoning-trace interface

S1 stands for Security-1. Nullsec S1 is documented as a model that emits a final structured JSON security audit. It does not claim a hidden chain-of-thought API, a <thought> token format, or a custom reasoning-trace parser.

Integration claims are scoped

Transformers + PEFT inference is supported via inference.py. vLLM, Ollama, LM Studio, and GGUF packaging are roadmap items unless and until a future release adds tested support. The hosted web scanner and API backend are also roadmap items, not current hosted services.

"Production-ready model" requires real evidence

The strongest claim, that the model is suitable for production use, is gated on the highest bar: a trained adapter, a real-model benchmark, a zero false-safe rate, adequate detection quality, and independent review. RC2/v1.1 satisfies this bar on the included release benchmark suite. That does not make the model a guarantee for arbitrary real-world code. (Note: the verdict field production_ready is a separate, well-defined per-analysis decision computed by the Safety Layer; it is not an absolute security guarantee.)
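The five requirements above can be read as a conjunctive gate, sketched below. The class, field names, and the `min_detection_rate` parameter are hypothetical; this document does not state a numeric threshold for "adequate detection quality", so the caller supplies one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseEvidence:
    """Illustrative evidence record; not the project's real data model."""
    has_trained_adapter: bool
    has_real_model_benchmark: bool
    false_safe_count: int
    detection_rate: float
    independently_reviewed: bool


def unmet_requirements(ev: ReleaseEvidence, min_detection_rate: float) -> list:
    """List every requirement the evidence fails; empty means the gate passes."""
    unmet = []
    if not ev.has_trained_adapter:
        unmet.append("trained adapter")
    if not ev.has_real_model_benchmark:
        unmet.append("real-model benchmark")
    if ev.false_safe_count != 0:
        unmet.append("zero false-safe rate")
    if ev.detection_rate < min_detection_rate:
        unmet.append("adequate detection quality")
    if not ev.independently_reviewed:
        unmet.append("independent review")
    return unmet
```

Returning the list of unmet requirements, not just a boolean, makes it clear which piece of evidence is still missing.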

Production-ready scope

When this repo says RC2/v1.1 is production-ready, it means:

RC2/v1.1 passed the Nullsec internal release gate on the included 111-case benchmark suite.

It does not mean every production system is guaranteed secure, every vulnerability will be found, or independent security review is unnecessary.


How this is enforced

python scripts/validate_claims.py          # status table of permitted/forbidden claims
python scripts/validate_claims.py --check   # fails if README/RELEASE_SUMMARY overclaim

The set of permitted claims is derived purely from artifacts on disk (scripts/_artifacts.py): a trained adapter, a real-model benchmark report with a non-empty result set and run_mode: "model", passing safety probes, and a release bundle. As those artifacts come into existence (see ROADMAP.md), the corresponding claims unlock, and not before.
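As a rough illustration of the benchmark-report condition, the sketch below accepts a report only if it parses, declares run_mode: "model", and carries a non-empty result list. The `results` key and both function names are assumptions; scripts/_artifacts.py is not shown here and may be structured differently.

```python
import json
from pathlib import Path
from typing import Optional


def load_report(path: Path) -> Optional[dict]:
    """Read a JSON report from disk; return None if missing or unreadable."""
    if not path.is_file():
        return None
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def report_is_real_model_run(report: Optional[dict]) -> bool:
    """True only for a report with run_mode "model" and a non-empty result set."""
    if report is None:
        return False
    results = report.get("results")  # assumed key name for the result set
    return (
        report.get("run_mode") == "model"
        and isinstance(results, list)
        and len(results) > 0
    )
```

A missing or malformed file resolves to "claim not permitted", which matches the conservative default described above for a fresh source checkout.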