You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
All pages use async health polling via `GovernanceClient.get_json()` and follow the monolith SettingsWidget pattern.
472
+
473
+
## 21. HuggingFace Open LLM Leaderboard Integration
474
+
475
+
Source: `src/specsmith/agent/hf_leaderboard.py`
476
+
477
+
Syncs model benchmark data from the HuggingFace Datasets Server (`datasets-server.huggingface.co/rows?dataset=open-llm-leaderboard/contents`). Supports paginated fetch, exponential-backoff 429 handling with `RateLimit: t=` header parsing, optional HF API token (doubles rate limit to 1000 req/5min), and a static fallback of 50+ known models for offline operation.
478
+
479
+
Background task runs 15 s after startup then every 24 h. Scores are persisted to `~/.specsmith/model_scores.json` under a `bucket_scores` key alongside existing role scores.
480
+
481
+
Benchmarks mapped: IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-PRO (HF field names → internal keys).
482
+
483
+
REST endpoints exposed by governance server:
484
+
-`GET /api/model-intel/scores` — all cached scores
Ranked recommendation returns the top-10 models for a requested bucket. The engine merges HF-synced data with the existing `BASELINE_SCORES` so both cloud and local Ollama models appear in rankings.
503
+
504
+
Base+org-prefix deduplication: `Qwen/Qwen3-14B` is stored under both its full name and `Qwen3-14B` so vLLM-style repo-ID model names match correctly.
505
+
506
+
## 23. Model Capability Profiles
507
+
508
+
Source: `src/specsmith/agent/model_profiles.py`
509
+
510
+
Per-model capability descriptors resolved by prefix matching (longest key wins):
511
+
512
+
| Field | Type | Meaning |
513
+
|---|---|---|
514
+
|`max_tokens`| int | Max completion tokens to request |
515
+
|`temperature`| float | Sampling temperature |
516
+
|`ctx_budget`| int | Approx. chars of conversation history to keep |
Covers 40+ models across Ollama (Mistral, Qwen, Llama, Gemma, Phi, DeepSeek), cloud (OpenAI o-series, Claude, Mistral API), and a `_DEFAULT` fallback.
521
+
522
+
Context history trimmer (`trim_history`) summarises dropped turns into a compact `[Earlier conversation summary — N turns condensed]` assistant message to preserve research continuity.
Enhancements over the existing rolling-window scheduler:
529
+
530
+
-**EMA utilisation tracking** — exponentially-weighted moving average of RPM/TPM utilisation (`alpha=0.25`) surfaced in `snapshot()`
531
+
-**Adaptive concurrency** — `dynamic_concurrency` decreases on `on_rate_limit()`, restores after 120 s (incrementally, 60 s between steps)
532
+
-**Retry-After parsing** — `parse_retry_after_seconds()` extracts `"try again in Xs"` from provider error strings; used when exponential backoff alone is insufficient
533
+
-**Image token estimation** — `estimate_request_tokens()` accepts `image_count` and multiplies by a per-model `image_token_estimate` (default 4096)
All operations are guarded by a single `threading.Condition` lock so the pacer is safe for concurrent agent sessions.
537
+
538
+
## 25. Multi-Provider LLM Client with Fallback
539
+
540
+
Source: `src/specsmith/agent/llm_client.py`
541
+
542
+
Provider-agnostic chat client that tries a configurable ordered list of providers, falling back on 401/403/429/5xx. No optional packages required — uses `urllib` only.
**O-series translation**: OpenAI o1/o3/o4 models receive `max_completion_tokens` instead of `max_tokens` and their `system` messages are renamed to `developer`.
549
+
550
+
**vLLM guided-JSON**: endpoints of type `byoe` or `huggingface` receive `guided_json` + `chat_template_kwargs: {enable_thinking: false}` when a JSON schema is provided.
551
+
552
+
**Gemini parts extraction**: handles models that return answer text in `parts` rather than `content`.
553
+
554
+
**JSON extraction helper** (`_extract_json`): tries direct parse → `\`\`\`json` fence → first balanced `{}` block before raising.
Generates a list of ready-to-add `ProviderEntry` suggestions by inspecting:
586
+
587
+
1. Cloud API keys present in environment variables
588
+
2. Ollama models currently installed (`/api/tags`)
589
+
3. Custom BYOE endpoints in `providers.json`
590
+
591
+
For each backend, role-tuned parameter sets (temperature, max_tokens) are proposed following the AEE bucket taxonomy: `reasoning`, `conversational`, `longform`.
592
+
593
+
Suggestions are inert previews — the user calls `specsmith agent providers add` to persist.
Copy file name to clipboardExpand all lines: docs/REQUIREMENTS.md
+133Lines changed: 133 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1803,3 +1803,136 @@ ame, command, and rgs fields derived from the description. The stub MUST be val
1803
1803
-**Description:** The Kairos Agents > MCP servers list page MUST include a collapsible AI Builder card that accepts a natural-language server description, calls specsmith mcp generate <description> --json, displays the generated JSON stub, and offers an 'Add to ~/.specsmith/mcp.json' button that appends the stub to the user's MCP config file.
-**Description:** specsmith MUST implement `src/specsmith/agent/hf_leaderboard.py` that fetches model benchmark data from the HuggingFace Datasets Server (`datasets-server.huggingface.co/rows?dataset=open-llm-leaderboard/contents`). The sync MUST be paginated (100 rows/page) and persist results to `~/.specsmith/model_scores.json` under a `bucket_scores` key.
1811
+
-**Source:** ARCHITECTURE.md §21 [HF-001]
1812
+
-**Status:** defined
1813
+
1814
+
## 264. HF Leaderboard Rate-Limit Handling
1815
+
-**ID:** REQ-264
1816
+
-**Title:** HF Leaderboard Rate-Limit Handling
1817
+
-**Description:** The HF leaderboard sync MUST handle HTTP 429 with exponential-backoff retry (up to 4 attempts). It MUST parse the `RateLimit: "api";r=X;t=Y` header to extract the exact reset window and wait accordingly. A +1 s safety margin MUST be added to the `t=` value.
1818
+
-**Source:** ARCHITECTURE.md §21 [HF-002]
1819
+
-**Status:** defined
1820
+
1821
+
## 265. HF API Token Support
1822
+
-**ID:** REQ-265
1823
+
-**Title:** HF API Token Support
1824
+
-**Description:** When `SPECSMITH_HF_TOKEN` or `hf_api_token` is configured, the HF sync MUST include an `Authorization: Bearer <token>` header. The CLI `specsmith model-intel test-hf` MUST validate the token via `huggingface.co/api/whoami-v2` and report whether the Datasets Server is reachable.
1825
+
-**Source:** ARCHITECTURE.md §21 [HF-003]
1826
+
-**Status:** defined
1827
+
1828
+
## 266. HF Leaderboard Static Fallback
1829
+
-**ID:** REQ-266
1830
+
-**Title:** HF Leaderboard Static Fallback
1831
+
-**Description:** When HF is unreachable (network error, 5xx, or zero parseable rows), specsmith MUST load built-in static benchmark scores covering at least 40 models (OpenAI GPT-4o/mini, Claude 3.5 sonnet/haiku, Gemini 2.x, Mistral, Qwen, Llama, DeepSeek, Phi). The fallback MUST be transparent to callers.
1832
+
-**Source:** ARCHITECTURE.md §21 [HF-004]
1833
+
-**Status:** defined
1834
+
1835
+
## 267. Bucket Scoring Engine
1836
+
-**ID:** REQ-267
1837
+
-**Title:** Bucket Scoring Engine
1838
+
-**Description:** specsmith MUST compute three task-bucket scores from raw benchmark values (0–100 scale): Reasoning = 0.35×MATH + 0.30×GPQA + 0.25×BBH + 0.10×IFEval; Conversational = 0.40×IFEval + 0.35×MMLU-PRO + 0.25×BBH; Longform = 0.35×MUSR + 0.35×IFEval + 0.30×MMLU-PRO. Scores MUST be rounded to 2 decimal places.
1839
+
-**Source:** ARCHITECTURE.md §22 [BKT-001]
1840
+
-**Status:** defined
1841
+
1842
+
## 268. Model Intelligence Recommendations
1843
+
-**ID:** REQ-268
1844
+
-**Title:** Model Intelligence Recommendations
1845
+
-**Description:**`specsmith model-intel recommendations [--bucket reasoning|conversational|longform]` MUST return the top-10 models sorted by the requested bucket score. The governance HTTP server MUST expose `GET /api/model-intel/recommendations?bucket=<name>` returning the same data.
1846
+
-**Source:** ARCHITECTURE.md §22 [BKT-002]
1847
+
-**Status:** defined
1848
+
1849
+
## 269. Model Intelligence CLI Commands
1850
+
-**ID:** REQ-269
1851
+
-**Title:** Model Intelligence CLI Commands
1852
+
-**Description:** specsmith MUST provide a `model-intel` CLI group with subcommands: `sync` (run HF sync), `scores [--model NAME]` (list/get cached scores), `recommendations [--bucket NAME]` (top-10 per bucket), `test-hf` (connectivity probe). All commands MUST support `--json` flag.
1853
+
-**Source:** ARCHITECTURE.md §21 [HF-005]
1854
+
-**Status:** defined
1855
+
1856
+
## 270. Model Capability Profiles
1857
+
-**ID:** REQ-270
1858
+
-**Title:** Model Capability Profiles
1859
+
-**Description:** specsmith MUST implement `src/specsmith/agent/model_profiles.py` with a `ModelProfile` TypedDict containing `max_tokens`, `temperature`, `ctx_budget`, `action_capable`, `prompt_style` fields. A `get_profile(model)` function MUST resolve by prefix matching (longest key first) over ≥40 known models.
1860
+
-**Source:** ARCHITECTURE.md §23 [PRF-001]
1861
+
-**Status:** defined
1862
+
1863
+
## 271. Context History Trimmer
1864
+
-**ID:** REQ-271
1865
+
-**Title:** Context History Trimmer
1866
+
-**Description:**`trim_history(messages, budget_chars)` in `model_profiles.py` MUST trim conversation history to fit within `budget_chars`. Oldest turns MUST be summarised into a compact `[Earlier conversation summary — N turns condensed]` assistant message rather than silently dropped. System messages MUST always be preserved.
1867
+
-**Source:** ARCHITECTURE.md §23 [PRF-002]
1868
+
-**Status:** defined
1869
+
1870
+
## 272. AI Model Pacer EMA Utilisation
1871
+
-**ID:** REQ-272
1872
+
-**Title:** AI Model Pacer EMA Utilisation
1873
+
-**Description:** The `ModelRateLimitScheduler` MUST track RPM and TPM utilisation as exponentially-weighted moving averages (alpha=0.25) and expose them in `snapshot()` as `rpm_ema` and `tpm_ema` fields.
1874
+
-**Source:** ARCHITECTURE.md §24 [PCR-001]
1875
+
-**Status:** defined
1876
+
1877
+
## 273. AI Model Pacer Adaptive Concurrency
1878
+
-**ID:** REQ-273
1879
+
-**Title:** AI Model Pacer Adaptive Concurrency
1880
+
-**Description:**`on_rate_limit(model, error, attempt)` MUST decrease `dynamic_concurrency` by 1 (minimum=1) and set `reduced_until` to now+120 s. Concurrency MUST restore incrementally (1 step per 60 s) once `reduced_until` has passed. The method MUST return a float delay for the caller to sleep.
1881
+
-**Source:** ARCHITECTURE.md §24 [PCR-002]
1882
+
-**Status:** defined
1883
+
1884
+
## 274. AI Model Pacer Image Token Estimation
1885
+
-**ID:** REQ-274
1886
+
-**Title:** AI Model Pacer Image Token Estimation
1887
+
-**Description:**`estimate_request_tokens()` MUST accept an `image_count` parameter and include `image_count × image_token_estimate` tokens in the reservation. The default `image_token_estimate` MUST be 4096.
1888
+
-**Source:** ARCHITECTURE.md §24 [PCR-003]
1889
+
-**Status:** defined
1890
+
1891
+
## 275. Multi-Provider LLM Client with Fallback
1892
+
-**ID:** REQ-275
1893
+
-**Title:** Multi-Provider LLM Client with Fallback
1894
+
-**Description:** specsmith MUST implement `src/specsmith/agent/llm_client.py` with a `LLMProvider` ABC and `LLMClient` that tries providers in order, falling back on HTTP 401/403/429/5xx. Concrete providers MUST cover Mistral, OpenAI, Google Gemini, and Ollama. A `MockProvider` MUST be available for tests.
1895
+
-**Source:** ARCHITECTURE.md §25 [LLM-001]
1896
+
-**Status:** defined
1897
+
1898
+
## 276. LLM Client O-Series Translation
1899
+
-**ID:** REQ-276
1900
+
-**Title:** LLM Client O-Series Translation
1901
+
-**Description:** When the model name starts with `o1`, `o3`, or `o4`, or contains `-o1-`/`-o3-`/`-o4-`, the LLM client MUST use `max_completion_tokens` instead of `max_tokens`, force temperature to 1, and rename `system` role messages to `developer`.
1902
+
-**Source:** ARCHITECTURE.md §25 [LLM-002]
1903
+
-**Status:** defined
1904
+
1905
+
## 277. LLM Client vLLM Guided-JSON Mode
1906
+
-**ID:** REQ-277
1907
+
-**Title:** LLM Client vLLM Guided-JSON Mode
1908
+
-**Description:** When a JSON schema is provided and the provider type is `byoe` or `huggingface`, the request MUST include `guided_json` and `chat_template_kwargs: {"enable_thinking": false}` to suppress chain-of-thought tokens and enforce structured output.
1909
+
-**Source:** ARCHITECTURE.md §25 [LLM-003]
1910
+
-**Status:** defined
1911
+
1912
+
## 278. Endpoint Preset Registry
1913
+
-**ID:** REQ-278
1914
+
-**Title:** Endpoint Preset Registry
1915
+
-**Description:**`src/specsmith/agent/provider_registry.py` MUST export `ENDPOINT_PRESETS` — a list of built-in connection presets for at least: vLLM (localhost:8000), LM Studio (localhost:1234), llama.cpp (localhost:8080), OpenRouter, Together AI, Groq, Fireworks, DeepInfra, Perplexity, and Azure OpenAI. Each preset MUST include `id`, `label`, `base_url`, `endpoint_kind`, and `needs_key`.
1916
+
-**Source:** ARCHITECTURE.md §26 [PRE-001]
1917
+
-**Status:** defined
1918
+
1919
+
## 279. Endpoint Probe Enriched Metadata
1920
+
-**ID:** REQ-279
1921
+
-**Title:** Endpoint Probe Enriched Metadata
1922
+
-**Description:**`probe_openai_compatible()` MUST return a `models_detail` list where each entry includes `id`, `owner`, `context_length` (from `max_model_len` on vLLM, `context_length` or `context_window` otherwise), and `description`. The cap MUST be 200 models.
1923
+
-**Source:** ARCHITECTURE.md §26 [PRE-002]
1924
+
-**Status:** defined
1925
+
1926
+
## 280. Suggested Profile Generation
1927
+
-**ID:** REQ-280
1928
+
-**Title:** Suggested Profile Generation
1929
+
-**Description:**`specsmith agent suggest-profiles` MUST inspect available backends (cloud env vars, installed Ollama models, saved BYOE endpoints) and propose ready-to-add `ProviderEntry` suggestions with role-tuned temperature and max_tokens for the reasoning/conversational/longform AEE buckets. Suggestions MUST be inert (not auto-saved).
1930
+
-**Source:** ARCHITECTURE.md §27 [SGP-001]
1931
+
-**Status:** defined
1932
+
1933
+
## 281. Kairos AI Settings Bucket Score Display
1934
+
-**ID:** REQ-281
1935
+
-**Title:** Kairos AI Settings Bucket Score Display
1936
+
-**Description:** The Kairos Agents > Providers settings page MUST display bucket scores (reasoning, conversational, longform) retrieved from `GET /api/model-intel/scores/{model}` for each configured provider. Scores MUST be shown as compact numeric badges. A Sync button MUST call `POST /api/model-intel/sync`.
0 commit comments