Last updated: 2026-05-25
| Metric | Value |
|---|---|
| Frequency range | 280 - 1100 MHz |
| Countries | FR, US, UK, CN, DE, RU, ES, IT, CH (9) |
| Total entries | 500 |
| Unique services/applications | 178 |
| Q&A training pairs | 1500 |
| Fact-check corrections applied | 16 |
| Sources consulted for verification | 11 regulatory bodies |
- Baseline frequency tables (280-1100 MHz, 9 countries)
- Fine-grained segment split (5 sub-bands: 280-400, 400-470, 470-700, 700-870, 870-1100)
- Per-country allocations with real device examples
- Modulation types, channel spacing, power limits
- Regulatory references (ITU, FCC, ANFR, Ofcom, BNetzA, MIIT, CNAF, Roskomnadzor, BAKOM)
- Merged dataset: `merged_dataset/enriched_280_1100_mhz_ALL.json` (500 entries)
- CSV export: `merged_dataset/enriched_280_1100_mhz_ALL.csv`
- 5 independent fact-check reports, one per sub-band
- Consolidated report: `factcheck_reports/factcheck_CONSOLIDATED_280_1100_mhz.json`
- 16 critical corrections applied (see below)
- Cross-referenced against official regulatory databases
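
The five sub-bands listed above can be turned into a small lookup, which is handy for routing an entry to the right sub-band report. This is a minimal sketch, not the project's code: the function name `sub_band` is hypothetical, and it assumes each segment is half-open `[low, high)` with only the last segment including its upper edge, since the list does not say which segment owns a shared edge such as 400 MHz.

```python
from bisect import bisect_right

# Sub-band edges in MHz, taken from the segment split listed above.
EDGES_MHZ = [280, 400, 470, 700, 870, 1100]
LABELS = ["280-400", "400-470", "470-700", "700-870", "870-1100"]


def sub_band(freq_mhz: float) -> str:
    """Return the sub-band label for a frequency given in MHz.

    Assumption: segments are half-open [low, high); the last segment
    also includes its upper edge (1100 MHz).
    """
    if not EDGES_MHZ[0] <= freq_mhz <= EDGES_MHZ[-1]:
        raise ValueError(f"{freq_mhz} MHz is outside 280-1100 MHz")
    # bisect_right returns how many edges are <= freq_mhz.
    index = bisect_right(EDGES_MHZ, freq_mhz) - 1
    # Clamp so that exactly 1100 MHz falls in the last segment.
    return LABELS[min(index, len(LABELS) - 1)]
```

For example, `sub_band(433.92)` falls in `400-470` and `sub_band(868.0)` in `700-870`.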
| # | Issue | Fix |
|---|---|---|
| 1 | 315 MHz TPMS listed for EU | EU uses 433.92 MHz, 315 MHz is US/JP only |
| 2 | TETRA listed for US/CN/RU | US=P25 (NTIA), CN=PDT 350-370, RU=limited |
| 3 | WMTS 608-614 listed for EU | US-only (FCC Part 95H) |
| 4 | LoRaWAN 868 listed for US/CN | US=902-928 MHz, CN=470-510 MHz |
| 5 | UK 700 MHz Vodafone | Corrected to O2/VMO2 (Ofcom 2021 auction) |
| 6 | FR 700 MHz Free/Bouygues swap | ARCEP Dec 2015: Free=2x10, Bouygues=2x5 |
| 7 | ECC Decision (14)02 cited | Corrected to (15)01 for 700 MHz duplex gap |
| 8 | Gazpar gas meters at 868 MHz | Corrected to 169 MHz VHF Wize (GRDF) |
| 9 | AEHF listed as UHF MILSATCOM | Corrected to MUOS (AEHF is EHF 44 GHz) |
| 10 | COSPAS-SARSAT at 399.9 MHz | Corrected to 406.0-406.1 MHz |
| 11 | BeiDou/GLONASS at 399.9 MHz | Both are L-band only, removed |
| 12 | DME Y-mode 30 µs | Corrected to 36 µs (ICAO Annex 10) |
| 13 | ITU 5.328A for ADS-B | Corrected to 5.328AA (satellite reception of ADS-B, 1087.7-1092.3 MHz) |
| 14 | 863-870 labeled ISM | Corrected to SRD (ETSI EN 300 220) |
| 15 | CH listed as TETRA | CH uses TETRAPOL (Polycom network) |
| 16 | CN listed TETRA 380 MHz | CN uses PDT 350-370 MHz |
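
Several of these corrections are region/band mismatches, which lend themselves to a rule-based check. The sketch below encodes only rows 1 and 4 of the table. The dictionary layout, the function name `check_entry` and the region codes are illustrative assumptions, not the pipeline's actual rule format.

```python
# Expected band per (service, region) in MHz, transcribed from rows 1 and 4
# of the corrections table. The structure itself is an assumption.
EXPECTED_BANDS_MHZ = {
    ("LoRaWAN", "US"): (902.0, 928.0),
    ("LoRaWAN", "CN"): (470.0, 510.0),
    ("TPMS", "EU"): (433.92, 433.92),
    ("TPMS", "US"): (315.0, 315.0),
}


def check_entry(service, region, freq_mhz):
    """Return None if the entry is consistent or no rule applies,
    otherwise a short message describing the mismatch."""
    band = EXPECTED_BANDS_MHZ.get((service, region))
    if band is None:
        return None  # no rule for this (service, region) pair
    low, high = band
    if low <= freq_mhz <= high:
        return None
    return (
        f"{service} in {region}: {freq_mhz} MHz is outside "
        f"the expected {low}-{high} MHz"
    )
```

A LoRaWAN entry at 868 MHz tagged `US` would be flagged, while the same service at 915 MHz would pass.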
- 1500 Q&A pairs generated
- 3 categories: `fonction` (function), `reglementation` (regulation), `appareils_utilisateurs` (user devices)
- Source citations in every answer
- Format: JSONL (HuggingFace-ready) + JSON
- Output: `qa_dataset/spectrum_qa_dataset.jsonl`
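
A quick sanity check on the JSONL output could confirm that every record uses one of the three categories and carries a citation. This is a sketch under assumptions: the field names `category` and `sources` are hypothetical, since the record schema is not documented here, and citations may in practice be embedded in the answer text instead.

```python
import json

ALLOWED_CATEGORIES = {"fonction", "reglementation", "appareils_utilisateurs"}


def validate_qa_lines(lines):
    """Yield (line_number, problem) for every JSONL record failing a check.

    Assumed (hypothetical) schema: each record is a JSON object with a
    "category" string and a non-empty "sources" list.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            yield number, f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(record, dict):
            yield number, "record is not a JSON object"
            continue
        if record.get("category") not in ALLOWED_CATEGORIES:
            yield number, "unknown category"
        if not record.get("sources"):
            yield number, "missing source citation"
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, for instance `list(validate_qa_lines(open(path, encoding="utf-8")))`.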
8-step automated pipeline in Data_Process/:
| Step | Script | Backend | Status |
|---|---|---|---|
| 01 | Multi-source ingestion | git + datasets | Code ready |
| 02 | Dedup (hash + semantic) | sentence-transformers cuda:0 | Code ready |
| 03 | RF validation + multi-source cross-check + confidence % | rules + HF + Wikipedia + Web/forums | Code ready |
| 04 | Protocol DB matching | rapidfuzz CPU | Code ready |
| 05 | LLM hallucination check | transformers + device_map="auto" (Qwen 32B sharded across 6 GPUs) | Code ready |
| 06 | NLI fact verification | DeBERTa-v3 cuda:1 | Code ready |
| 07 | Scoring (0-13 scale) | rules | Code ready |
| 08 | Export (JSONL buckets + confidence_pct) | rules | Code ready |
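
As an illustration of the hash half of step 02, exact duplicates can be removed by fingerprinting a few normalized fields. This is a stdlib-only sketch and not the contents of `02_dedup.py`: the key names (`country`, `freq_low_mhz`, `freq_high_mhz`, `service`) are hypothetical, and the semantic pass with sentence-transformers is not shown.

```python
import hashlib
import json

# Hypothetical field names; the real entry schema may differ.
DEFAULT_KEYS = ("country", "freq_low_mhz", "freq_high_mhz", "service")


def entry_fingerprint(entry, keys=DEFAULT_KEYS):
    """SHA-256 over selected fields after trimming and lowercasing."""
    normalized = [str(entry.get(key, "")).strip().lower() for key in keys]
    payload = json.dumps(normalized, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedup_exact(entries, keys=DEFAULT_KEYS):
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        fingerprint = entry_fingerprint(entry, keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(entry)
    return unique
```

Note that this normalization is purely textual, so `868` and `868.0` would produce different fingerprints; near-duplicates of that kind are what a semantic or numeric pass would need to catch.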
Dual scoring system:
- Step 07 score (0-13) — internal quality checks (freq/protocol/timings/LLM/NLI/dedup)
- Step 03 confidence (0-100%) — external triangulation across HF + Wikipedia + WebSearch + forums, with bonuses for triangulation and penalties for contradictions / missing sources
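
To make the shape of the step 03 confidence score concrete, here is a minimal sketch of a triangulation score over the four external sources. Every weight and threshold in it is a placeholder chosen for illustration; the actual values used by `03_rf_validation.py` are not stated in this file, and the verdict labels are likewise assumed.

```python
SOURCES = ("hf", "wikipedia", "websearch", "forums")

# Placeholder weights, NOT the pipeline's real values.
POINTS_PER_AGREEMENT = 20.0
TRIANGULATION_BONUS = 20.0       # applied when at least 3 sources agree
CONTRADICTION_PENALTY = 25.0
MISSING_SOURCE_PENALTY = 5.0


def confidence_pct(verdicts):
    """Map per-source verdicts to a 0-100 confidence value.

    `verdicts` maps a source name to "agree" or "contradict"; any other
    value, or an absent key, counts as a missing source.
    """
    agree = sum(1 for s in SOURCES if verdicts.get(s) == "agree")
    contradict = sum(1 for s in SOURCES if verdicts.get(s) == "contradict")
    missing = len(SOURCES) - agree - contradict

    score = POINTS_PER_AGREEMENT * agree
    if agree >= 3:
        score += TRIANGULATION_BONUS
    score -= CONTRADICTION_PENALTY * contradict
    score -= MISSING_SOURCE_PENALTY * missing
    # Clamp to the documented 0-100 % range.
    return max(0.0, min(100.0, score))
```

With these placeholder weights, four agreeing sources reach 100 %, while two agreeing sources and two missing ones give 30 %.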
Target rig: 6x RTX 3070 (6 x 8 GB = 48 GB VRAM total), Ubuntu 24.04
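
A back-of-the-envelope check of whether a 32B-parameter model fits on this rig may be useful before running step 05. The sketch counts weights only and uses a nominal 32 billion parameters (the exact count is an assumption); KV cache, activations and per-GPU framework overhead are ignored, so real headroom is smaller than shown.

```python
NUM_GPUS = 6
VRAM_PER_GPU_GB = 8                 # RTX 3070
TOTAL_VRAM_GB = NUM_GPUS * VRAM_PER_GPU_GB   # 48 GB, as stated above
NOMINAL_PARAMS = 32_000_000_000     # "32B", rounded; an assumption


def weights_gb(bits_per_param):
    """Size of the model weights alone, in decimal GB."""
    return NOMINAL_PARAMS * bits_per_param / 8 / 1e9


def fits(bits_per_param):
    """True if the weights alone fit in the combined VRAM."""
    return weights_gb(bits_per_param) <= TOTAL_VRAM_GB


if __name__ == "__main__":
    for bits in (16, 8, 4):
        verdict = "fits" if fits(bits) else "does not fit"
        print(f"{bits}-bit weights: {weights_gb(bits):.0f} GB, {verdict} in {TOTAL_VRAM_GB} GB")
```

Under these assumptions, 16-bit weights (about 64 GB) exceed the combined 48 GB, so sharding alone would not be enough; an 8-bit or 4-bit quantized load, or CPU offload, would likely be needed.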
Pending:
- Transfer scripts to rig
- Install vLLM (`pip install vllm`) and download `Qwen/Qwen2.5-32B-Instruct` HF weights
- Run full pipeline end-to-end (with `CROSS_CHECK=1`)
- Human audit of 'partial' bucket via `audit_dashboard.py`
- Merge audit decisions back into final dataset
- Upload verified.jsonl as HF dataset
- Dataset card with methodology
- License selection
Flipper_Zero_RF_DataSet/
  baseline/                      Original CSV frequency tables
  enriched_data/                 5 enriched JSON files per sub-band
  merged_dataset/                Merged 500-entry JSON + CSV
  qa_dataset/                    1500 Q&A pairs (JSONL + JSON)
  factcheck_reports/             5 sub-band reports + consolidated
  Data_Process/
    README.md                    Pipeline architecture docs
    scripts/
      00_config.py               Global config & backend dispatch
      01_ingest_sources.py       Multi-source ingestion
      02_dedup.py                Hash + semantic deduplication
      03_rf_validation.py        RF rules validation
      04_protocols_db.py         Protocol DB matching (rapidfuzz)
      05_llm_hallucination.py    LLM hallucination detection (Ollama)
      06_fact_verification.py    NLI fact-check (DeBERTa)
      07_scoring.py              Score aggregation & bucketing
      08_export.py               Final JSONL export + manifest
      audit_dashboard.py         Human audit CLI
      run_pipeline.sh            Pipeline orchestrator
  STATUS.md                      This file
  STANDARDS.md                   RF standards & regulatory references
  README.md                      Project overview