87 changes: 87 additions & 0 deletions backend/glossa_lab/EXPERIMENT_LEDGER.md
@@ -0,0 +1,87 @@
# Experiment Ledger

Comprehensive catalog of all experiment graph phase files in the Glossa Lab research platform.

## Summary

| Phase File | Category | Status | Purpose | Recommendation |
|---|---|---|---|---|
| experiment_graph.py | structural_analysis | active | Core engine with ~50 built-in atomic nodes and graph execution runtime | keep |
| experiment_graph_ctt.py | ctt | active | CTT nodes: sign role classifier, admissibility filter, holdout recall, compound constraint, anchored SA | keep |
| experiment_graph_indus_evidence.py | misc | active | Evidence graph nodes for literature/claims management | keep |
| experiment_graph_phase14.py | structural_analysis | active | Block entropy, conditional entropy, Zipf-Mandelbrot, MI decay, epistatic detection, language verdict | keep |
| experiment_graph_phase15.py | structural_analysis | active | Long-tail validity, cipher self-test, multi-hypothesis ranker, deciphered mapping exporter | keep |
| experiment_graph_phase_legacy.py | misc | legacy | Migration shims for Phase 16-19 standalone scripts | keep |
| experiment_graph_phase20.py | structural_analysis | active | Length-stratified spectral, cluster archaeology, Ferrara OCR, Fuls classifier | keep |
| experiment_graph_phase21.py | structural_analysis | active | Repetition collapser, site stratifier, numerical-weight regression | keep |
| experiment_graph_phase22.py | contact_zone | active | CDLI Meluhha-mention corpus, Meluhhan persons, Indus seals at Mesopotamia | keep |
| experiment_graph_phase23.py | contact_zone | active | Refined seal audit, strict PN extraction, bilingual readout test | keep |
| experiment_graph_phase24.py | contact_zone | active | Laursen Table 1, seal sign upgrade, persons-v2, bipartite readout v2 | keep |
| experiment_graph_phase25.py | sa_variant | active | Janabiyah readout, held-out test, period-stratified replication, Tamil-Brahmi | keep |
| experiment_graph_phase26.py | sa_variant | active | Provenience-stratified SA, Bayesian decoder, expanded readout | keep |
| experiment_graph_phase27.py | sa_variant | active | Reverse Janabiyah, Bayesian v2, iconographic anchors | keep |
| experiment_graph_phase28.py | archaeological | active | CISI Vol 3 OCR, Mahadevan crosswalk, allograph-aware SA | keep |
| experiment_graph_phase29.py | archaeological | active | Corpus 10x expansion: M77, ePSD2, Fuls, ICIT loaders | keep |
| experiment_graph_phase30.py | sa_variant | active | Length-cohort reverse Janabiyah, permutation test, Dravidian syllable LM | keep |
| experiment_graph_phase_misc_gaps.py | misc | legacy | Phases 44-47, 202, 209-215, 254-256 wrapped as subprocess runners | keep |
| experiment_graph_phase48_55.py | sa_variant | active | Full Indus decipherment pipeline (GPU mandatory) | keep |
| experiment_graph_phase56_61.py | sa_variant | active | Expanded Parpola SA, phonotactic falsification (GPU mandatory) | keep |
| experiment_graph_phase62_66.py | sa_variant | active | Ensemble fix, phonotactic filter, morpheme boundary, Sanskrit SA | keep |
| experiment_graph_phase67_73.py | sa_variant | active | Sanskrit norm, formula translation, M267 validation, parser | keep |
| experiment_graph_phase74_80.py | lm_scoring | active | Grammar test, Levit, place formula, SA agreement, semantic cluster, DEDR | keep |
| experiment_graph_phase81_87.py | sa_variant | active | M293 deep-dive, seal translation, formula lexicon, phonology | keep |
| experiment_graph_phase88_90.py | misc | active | Literature mine, DEDR expansion to 120, scholarly translations | keep |
| experiment_graph_phase91_100.py | sa_variant | active | Anchor-120 SA, M293 SA, trigram, grammar, full-corpus runs | keep |
| experiment_graph_phase101_103.py | archaeological | active | M293 iconographic, PDF extraction, personal name lexicon | keep |
| experiment_graph_phase104_109.py | sa_variant | active | OCR, name signs, name SA, Tamil-Brahmi check, phoneme exhaustion | keep |
| experiment_graph_phase110_115.py | sa_variant | active | Targeted SA, allographs, grammar inference, seal translations | keep |
| experiment_graph_phase116_121.py | sa_variant | active | SA recalibration, grammar LOW, site semantics, arXiv | keep |
| experiment_graph_phase122_123.py | cross_language | active | Syllabic SA, Munda/BMAC substrate hypothesis | keep |
| experiment_graph_phase124_125.py | structural_analysis | active | Fish-sign polysemy, Arthasastra mining | keep |
| experiment_graph_phase126.py | archaeological | active | ICIT corpus integration and sign inventory alignment | keep |
| experiment_graph_phase127.py | contact_zone | active | Gulf corpus analysis, Roif mining, fish-sign polysemy | keep |
| experiment_graph_phase128_133.py | sa_variant | active | SA refinement and anchor injection cycles | keep |
| experiment_graph_phase134_141.py | structural_analysis | active | Falsification suite, advancement testing, master scorecard | keep |
| experiment_graph_phase142_145.py | sa_variant | active | SA anchor injection and refinement cycles | keep |
| experiment_graph_phase146_155.py | sa_variant | active | SA parameter exploration and convergence testing | keep |
| experiment_graph_phase156_165.py | sa_variant | active | Advanced SA with refined anchors | keep |
| experiment_graph_phase166_168.py | lm_scoring | active | Sibilant validation, Meluhhan expansion, blocker SA | keep |
| experiment_graph_phase169_170.py | structural_analysis | active | Master synthesis, grammar variance — computational frontier | keep |
| experiment_graph_phase171_178.py | structural_analysis | active | Network centrality, betweenness stratification, network deep-dive | keep |
| experiment_graph_phase179_180.py | misc | active | Literature mine, Mesopotamian contact mine | keep |
| experiment_graph_phase181.py | archaeological | active | aDNA archaeogenetics mine | keep |
| experiment_graph_phase182.py | misc | active | Deep evidence mine | keep |
| experiment_graph_phase183.py | misc | active | Bulk mine 5000 (superseded by phase184) | keep |
| experiment_graph_phase184.py | misc | active | Bulk mine 5000 second run, fresh clusters | keep |
| experiment_graph_phase185_189.py | cross_language | active | Fish-sign battery, Elamo-Dravidian gap, commodity semantic, north Dravidian LM | keep |
| experiment_graph_phase190_192.py | sa_variant | active | Elamite anchor injection, grammar validation | keep |
| experiment_graph_phase193_195.py | sa_variant | active | SA rerun, SSRN fetch, grammar revalidation | keep |
| experiment_graph_phase196_201.py | sa_variant | active | Mine3, top-8, DEDR lookup, triple-LM, inscription reading | keep |
| experiment_graph_phase203_205.py | cross_language | active | E28 falsification, McAlpin cognates, Bayesian phylogenetics | keep |
| experiment_graph_phase206_208.py | sa_variant | active | Anchor injection M692/M861, SA rerun 404, mine5 | keep |
| experiment_graph_phase216_220.py | sa_variant | active | SA recalibration, site semantic, arXiv, Parpola/CISI | keep |
| experiment_graph_phase221_225.py | sa_variant | active | P324/P122 investigation, CISI injection, slot mismatch | keep |
| experiment_graph_phase226_228.py | sa_variant | active | P122 phonetic, P324 formula, CISI tripartite | keep |
| experiment_graph_phase229.py | sa_variant | active | CISI anchor SA test, M122 upgrade | keep |
| experiment_graph_phase230_234.py | contact_zone | active | Cross-ref matrix, indirect bilingual, cultural/demographic | keep |
| experiment_graph_phase235_236.py | cross_language | active | Elamite–PDr bridge, Sanskrit loanword mapping | keep |
| experiment_graph_phase237_246.py | sa_variant | active | Blocker mine, batch upgrades, synthesis, SA crossing | keep |
| experiment_graph_phase248_253.py | sa_variant | active | Ceiling-breaker mine, allograph, semantic constraint | keep |
| experiment_graph_phase257_294.py | sa_variant | active | SA reruns, Yajnadevam, DEDR resolution, 605/605 decipherment | keep |
| experiment_graph_phase295_297.py | misc | active | Bulk mine May 2026, cross-reference, gap analysis | keep |
| experiment_graph_phase298_308.py | cross_language | active | Munda mine/SA, substrate, archaeology, DEDR, Elamite baseline | keep |
| experiment_graph_phase322_362.py | sa_variant | active | May 2026 decipherment advancement session | keep |
| experiment_graph_contact_zone.py | contact_zone | active | KL divergence, synthesis, A/B comparison for contact signals | keep |
| experiment_graph_ab_language.py | cross_language | active | A/B language SA: Dravidian vs Sanskrit/Munda/Hebrew, LM consistency | keep |
| experiment_graph_cross_culture.py | cross_language | active | Cultural contact matrix, script family classifier | keep |

## Category Legend

- **structural_analysis** — Entropy, positional, spectral, network analysis of sign/symbol systems
- **lm_scoring** — Language model scoring, grammar tests, validation
- **sa_variant** — Simulated annealing decipherment runs and variants
- **ctt** — Constraint Topology Theory framework
- **contact_zone** — Mesopotamian contact corpus, bilingual analysis, seals
- **cross_language** — Cross-language comparison, substrate hypothesis, phylogenetics
- **archaeological** — OCR, corpus expansion, archaeogenetics, iconography
- **misc** — Literature mining, evidence management, infrastructure, legacy shims
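
A ledger like this is easy to let drift from its own legend. The following is a minimal sketch, not part of the PR, of how the Summary table could be parsed and its Category column checked against the Category Legend. The function names `parse_ledger_rows` and `unknown_categories` and the `SAMPLE` text are hypothetical; only the column headers and category names come from the ledger above. It assumes no cell contains a literal `|`.

```python
from __future__ import annotations

# Category names copied from the Category Legend in EXPERIMENT_LEDGER.md.
CATEGORIES = {
    "structural_analysis", "lm_scoring", "sa_variant", "ctt",
    "contact_zone", "cross_language", "archaeological", "misc",
}
# Dict keys for the five table columns (key names are an assumption).
COLUMNS = ("file", "category", "status", "purpose", "recommendation")


def parse_ledger_rows(markdown: str) -> list[dict[str, str]]:
    """Parse the pipe-delimited Summary table into one dict per data row."""
    rows: list[dict[str, str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and the |---| separator row.
        if cells[0] == "Phase File" or set(cells[0]) <= {"-", ":"}:
            continue
        if len(cells) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, cells)))
    return rows


def unknown_categories(rows: list[dict[str, str]]) -> list[str]:
    """Return categories used in the table but missing from the legend."""
    return sorted({row["category"] for row in rows} - CATEGORIES)


# Hypothetical two-row excerpt in the same layout as the Summary table.
SAMPLE = """
| Phase File | Category | Status | Purpose | Recommendation |
|---|---|---|---|---|
| experiment_graph_ctt.py | ctt | active | CTT nodes | keep |
| experiment_graph_phase_legacy.py | misc | legacy | Migration shims | keep |
"""

rows = parse_ledger_rows(SAMPLE)
print(unknown_categories(rows))
```

A check of this kind could run in CI so that a new category in the table cannot land without a matching legend entry.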
11 changes: 11 additions & 0 deletions backend/glossa_lab/api/experiments.py
@@ -38,6 +38,17 @@
router = APIRouter()


@router.get("/experiments/metadata")
async def get_experiments_metadata() -> list[dict[str, Any]]:
    """Return experiment ledger metadata for all registered nodes.

    Merges the static experiment_ledger.json with live ATOMIC_NODES
    registration data. Used by the frontend ExperimentRegistry component.
    """
    from glossa_lab.experiment_graph import get_experiment_metadata  # noqa: PLC0415
    return get_experiment_metadata()
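
For readers wondering how a consumer might use this endpoint's payload, here is a minimal sketch, assuming the response is a list of dicts with the `id`, `category`, and `status` keys built by `get_experiment_metadata()`. The helper `group_by_category` and the sample entries are hypothetical and not part of the PR; the real ExperimentRegistry component may organize the data differently.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def group_by_category(
    entries: list[dict[str, Any]],
    status: str | None = None,
) -> dict[str, list[str]]:
    """Group entry ids by category, optionally keeping only one status."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        if status is not None and entry.get("status") != status:
            continue
        grouped[entry.get("category", "misc")].append(entry["id"])
    # Sort categories and ids so the output is stable across calls.
    return {category: sorted(ids) for category, ids in sorted(grouped.items())}


# Hypothetical payload shaped like the /experiments/metadata response.
sample_response = [
    {"id": "node_b", "category": "contact_zone", "status": "active"},
    {"id": "node_a", "category": "contact_zone", "status": "active"},
    {"id": "experiment_graph_phase_legacy", "category": "misc", "status": "legacy"},
]

print(group_by_category(sample_response))
print(group_by_category(sample_response, status="active"))
```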


@router.get("/experiments")
async def list_experiments() -> list[dict[str, Any]]:
    """Return graph experiments only (H16 compliance).
88 changes: 88 additions & 0 deletions backend/glossa_lab/experiment_graph.py
@@ -2834,6 +2834,39 @@ def _experiment_wrapper(inputs: dict, params: dict) -> dict:
except Exception as _p322390_exc:  # noqa: BLE001
    logger.warning("Phase-322-390 nodes not registered: %s", _p322390_exc)

# ── Contact-zone analysis templates (KL divergence, synthesis, A/B) ─────────
try:
    from glossa_lab.experiment_graph_contact_zone import (
        _contact_zone_node_defs as _cz_defs,
    )
    for _d in _cz_defs():
        ATOMIC_NODES[_d.id] = _d
    logger.info("Registered %d contact-zone template nodes", len(_cz_defs()))
except Exception as _cz_exc:  # noqa: BLE001
    logger.warning("Contact-zone template nodes not registered: %s", _cz_exc)

# ── A/B language comparison templates (anchored SA, consistency matrix) ──────
try:
    from glossa_lab.experiment_graph_ab_language import (
        _ab_language_node_defs as _ab_defs,
    )
    for _d in _ab_defs():
        ATOMIC_NODES[_d.id] = _d
    logger.info("Registered %d A/B language template nodes", len(_ab_defs()))
except Exception as _ab_exc:  # noqa: BLE001
    logger.warning("A/B language template nodes not registered: %s", _ab_exc)

# ── Cross-culture contact matrix + script family classifier ─────────────────
try:
    from glossa_lab.experiment_graph_cross_culture import (
        _cross_culture_node_defs as _cc_defs,
    )
    for _d in _cc_defs():
        ATOMIC_NODES[_d.id] = _d
    logger.info("Registered %d cross-culture template nodes", len(_cc_defs()))
except Exception as _cc_exc:  # noqa: BLE001
    logger.warning("Cross-culture template nodes not registered: %s", _cc_exc)
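
The three registration blocks above share one shape: load definitions, store each under its `id`, log a count, and downgrade any failure to a warning. A minimal sketch of that pattern as a single helper follows. It is an illustration only, not code from this PR: `NodeDef`, `register_node_defs`, and the sample loaders are hypothetical stand-ins, and the real node definition class may carry more fields. One small difference from the diff is that the loader is called once and its result reused for the count.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class NodeDef:
    """Hypothetical stand-in for the project's node definition type."""
    id: str
    name: str


# Stand-in for the module-level registry used in experiment_graph.py.
ATOMIC_NODES: dict[str, NodeDef] = {}


def register_node_defs(label: str, load_defs: Callable[[], Iterable[NodeDef]]) -> int:
    """Register every definition from load_defs; return how many were added.

    load_defs may perform its own lazy import, so an ImportError is caught
    and logged the same way as any other failure.
    """
    try:
        defs = list(load_defs())  # evaluate once, reuse for the count
    except Exception as exc:  # noqa: BLE001
        logger.warning("%s nodes not registered: %s", label, exc)
        return 0
    for node_def in defs:
        ATOMIC_NODES[node_def.id] = node_def
    logger.info("Registered %d %s nodes", len(defs), label)
    return len(defs)


def _sample_defs() -> list[NodeDef]:
    # Hypothetical node ids, for illustration only.
    return [NodeDef("SampleNodeA", "Sample A"), NodeDef("SampleNodeB", "Sample B")]


def _broken_defs() -> list[NodeDef]:
    raise ImportError("optional module missing")


registered = register_node_defs("sample template", _sample_defs)
skipped = register_node_defs("broken template", _broken_defs)
```

Whether a helper like this is worth introducing depends on how many such blocks the module accumulates; the inline form keeps each optional import visible at the call site.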

# ── Research Loop Runner (meta-node for Experiment Builder)
try:
    from glossa_lab.pipelines.research_loop import ResearchLoop as _RL
@@ -2869,6 +2902,61 @@ def _research_loop_runner(inputs: dict, params: dict) -> dict:
    logger.warning("ResearchLoopRunner not registered: %s", _rl_exc)


# ── Experiment metadata (ledger-backed) ──────────────────────────────────────

def get_experiment_metadata() -> list[dict[str, Any]]:
    """Return ledger metadata for all registered experiment nodes.

    Merges the static experiment_ledger.json with live registration data
    from ATOMIC_NODES to provide a unified metadata view.
    """
    ledger_path = Path(__file__).parent / "experiment_ledger.json"
    ledger: list[dict] = []
    if ledger_path.exists():
        try:
            ledger = json.loads(ledger_path.read_text("utf-8"))
        except Exception:  # noqa: BLE001
            pass

    # Build a lookup from node ID to ledger entry
    node_to_ledger: dict[str, dict] = {}
    for entry in ledger:
        for node_name in entry.get("key_nodes", []):
            node_to_ledger[node_name] = entry

    # Merge live registration with ledger
    result: list[dict[str, Any]] = []
    for node_id, node_def in ATOMIC_NODES.items():
        ledger_entry = node_to_ledger.get(node_id, {})
        result.append({
            "id": node_id,
            "display_name": node_def.name,
            "category": ledger_entry.get("category", node_def.category),
            "phase": ledger_entry.get("phases", ""),
            "description": node_def.description,
            "status": ledger_entry.get("status", "active"),
            "superseded_by": ledger_entry.get("superseded_by"),
            "source_file": ledger_entry.get("file", ""),
        })

    # Also include ledger-only entries (files without individually named nodes)
    seen_files = {e.get("source_file") for e in result if e.get("source_file")}
    for entry in ledger:
        if entry.get("file") not in seen_files:
            result.append({
                "id": entry.get("file", "").replace(".py", ""),
                "display_name": entry.get("file", "").replace("experiment_graph_", "").replace(".py", ""),
                "category": entry.get("category", "misc"),
                "phase": entry.get("phases", ""),
                "description": entry.get("purpose", ""),
                "status": entry.get("status", "active"),
                "superseded_by": entry.get("superseded_by"),
                "source_file": entry.get("file", ""),
            })

    return result
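
To make the merge behaviour concrete, here is a reduced sketch of the same two-pass logic run on in-memory data. It is an illustration under stated assumptions: `merge_metadata` is a hypothetical name, registered nodes are modelled as plain dicts rather than node definition objects, only four of the eight output keys are kept, and the node ids and ledger entries are invented for the example.

```python
from __future__ import annotations

from typing import Any


def merge_metadata(
    ledger: list[dict[str, Any]],
    nodes: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Merge live nodes with ledger entries, then append ledger-only files."""
    # Pass 0: map each node id named in a ledger entry back to that entry.
    node_to_ledger: dict[str, dict[str, Any]] = {}
    for entry in ledger:
        for node_name in entry.get("key_nodes", []):
            node_to_ledger[node_name] = entry

    # Pass 1: one row per registered node; ledger values win where present.
    result: list[dict[str, Any]] = []
    for node_id, node in nodes.items():
        entry = node_to_ledger.get(node_id, {})
        result.append({
            "id": node_id,
            "category": entry.get("category", node["category"]),
            "status": entry.get("status", "active"),
            "source_file": entry.get("file", ""),
        })

    # Pass 2: ledger files that no registered node pointed at.
    seen_files = {row["source_file"] for row in result if row["source_file"]}
    for entry in ledger:
        if entry.get("file") not in seen_files:
            result.append({
                "id": entry.get("file", "").replace(".py", ""),
                "category": entry.get("category", "misc"),
                "status": entry.get("status", "active"),
                "source_file": entry.get("file", ""),
            })
    return result


# Hypothetical ledger and registry contents.
ledger = [
    {"file": "experiment_graph_contact_zone.py", "category": "contact_zone",
     "status": "active", "key_nodes": ["SampleKLNode"]},
    {"file": "experiment_graph_phase_legacy.py", "category": "misc",
     "status": "legacy"},
]
nodes = {
    "SampleKLNode": {"category": "analysis"},
    "UnlistedNode": {"category": "structural_analysis"},
}

merged = merge_metadata(ledger, nodes)
for row in merged:
    print(row)
```

Three cases show up in the output: a node matched to a ledger entry takes the ledger's category, a node absent from the ledger falls back to its own category with an empty `source_file`, and a ledger file with no matched node is appended with an id derived from its filename.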


# ── Graph execution

def _topo_sort(nodes: list[dict], edges: list[dict]) -> list[dict]: