|
| 1 | +# Fungrim Corpus for the Compute Engine |
| 2 | + |
| 3 | +A machine-translated snapshot of the [Fungrim](https://fungrim.org) (Mathematical |
| 4 | +Functions Grimoire) formula collection, expressed as MathJSON, annotated for |
| 5 | +Compute Engine consumption. This is the Phase-0 deliverable of the Fungrim |
| 6 | +integration plan (`FUNGRIM-PLAN-1-TRANSLATOR.md` at the repository root). |
| 7 | + |
| 8 | +## Provenance |
| 9 | + |
| 10 | +- **Upstream:** Fungrim by Fredrik Johansson, MIT licensed (see `LICENSE` — |
| 11 | + the upstream notice is reproduced verbatim; this data set is a *translated |
| 12 | + derivative*, not a copy, of the upstream Python sources). |
| 13 | +- **Snapshot pin:** the upstream is an unversioned local snapshot at |
| 14 | + `fungrim-master/`. `MANIFEST.json` records a content hash as the pin: |
| 15 | + SHA-256 over the concatenation of all `pygrim/**/*.py` files sorted by path |
| 16 | + (`find pygrim -name '*.py' -type f | LC_ALL=C sort | xargs cat | shasum -a 256`). |
| 17 | + If the fork is ever pushed to a git host, replace the hash with the commit id. |
| 18 | +- **Translator:** `grim2mathjson` (lives in the fungrim fork as a sibling |
| 19 | + package to `pygrim`). Every emitted file embeds a |
| 20 | + `"generator": "grim2mathjson <version>"` field. |
| 21 | +- **Determinism:** the translator emits stable key order, entries sorted by id |
| 22 | + within topic, and `\n` line endings — regeneration produces byte-identical |
| 23 | + files when nothing changed, so the diff is the review artifact. |
| 24 | + |
| 25 | +## Files |
| 26 | + |
| 27 | +| File | Contents | |
| 28 | +|---|---| |
| 29 | +| `MANIFEST.json` | schema version, generation date, upstream pin, counts | |
| 30 | +| `corpus/<topic>.json` | 57 per-topic files, 2,551 annotated formula entries | |
| 31 | +| `declarations.json` | 152 symbol-shell declarations for heads CE does not define | |
| 32 | +| `properties.json` | 131 analytic-property records (Poles, Zeros, BranchCuts, …) | |
| 33 | +| `skipped.json` | full skip ledger: 448 records with machine-readable reason codes | |
| 34 | +| `LICENSE` | upstream MIT license | |
| 35 | + |
| 36 | +## Entry schema (`corpus/<topic>.json`) |
| 37 | + |
| 38 | +Each topic file is `{ "topic", "title", "source", "generator", "entries": [...] }`. |
| 39 | +Each entry: |
| 40 | + |
| 41 | +```json |
| 42 | +{ |
| 43 | + "id": "d4b0b6", // Fungrim entry id (stable, 6 hex chars) |
| 44 | + "formula": ["Equal", ...], // non-canonical MathJSON (source form) |
| 45 | + "variables": ["z"], // free variables, from Expr.free_variables() |
| 46 | + "assumptions": ["Element", ...], // MathJSON; null when the entry has none |
| 47 | + "assumptionAlternatives": [...], // OPTIONAL (27 entries): alternative assumption sets, see below |
| 48 | + "class": "identity", // taxonomy, see below |
| 49 | + "subclass": null, // for class=representation: series|integral|product|limit |
| 50 | + "heads": ["Sin", "Arctan"], // named function heads (post-mapping), the rule-dispatch index |
| 51 | + "guardLevel": "complex-domain", // assumption-discharge difficulty, see below |
| 52 | + "flavor": null, // typed limit/derivative variant: real|complex|left|right|sequence|meromorphic |
| 53 | + "references": null, // upstream literature references, when present |
| 54 | + "topics": ["atan"], // all topics referencing the entry (first = defining module) |
| 55 | + "directedInfinity": true, // OPTIONAL (26 entries): formula contains c·∞; CE evaluates |
| 56 | + // Multiply(c, PositiveInfinity) to NaN — exclude from numeric checks |
| 57 | + "indexedFamilies": ["a_"] // OPTIONAL (5 entries): trailing-underscore family heads used |
| 58 | + // in call form ["a_", k] to preserve binder scoping |
| 59 | +} |
| 60 | +``` |
| 61 | + |
| 62 | +### `assumptionAlternatives` |
| 63 | + |
| 64 | +Upstream `Assumptions(expr, alt_expr, ...)` with multiple args states |
| 65 | +**alternative** assumption sets — the formula holds under each set |
| 66 | +independently (pygrim renders args past the first as "Alternative |
| 67 | +assumptions"). They are NOT conjoined: 16 of the 27 affected entries would |
| 68 | +otherwise produce genuinely contradictory conjunctions (e.g. `sqrt/0d8e03`: |
| 69 | +`b ∈ (0,∞)` AND `b ∈ ℂ∖(−∞,0]`). The first set is the entry's primary |
| 70 | +`assumptions`; the remaining sets are emitted under the optional |
| 71 | +`assumptionAlternatives` array (same translation pipeline per alternative). |
| 72 | +`guardLevel` is computed from the primary set only. |
| 73 | + |
| 74 | +### `class` taxonomy |
| 75 | + |
| 76 | +- `specific-value` — `Equal` with zero free variables (e.g. ζ(2) = π²/6) |
| 77 | +- `identity` — `Equal` between closed forms with variables |
| 78 | +- `representation` — `Equal` whose side is a `Sum`/`Product`/`Integrate`/`Limit` |
| 79 | + (sub-tagged via `subclass`) |
| 80 | +- `inequality` — top head `Less`/`LessEqual`/`Greater`/`GreaterEqual`/`AsymptoticTo`-like |
| 81 | +- `logical` — top head `Implies`/`Equivalent`/`Element`/quantifier shapes |
| 82 | +- `other` — residual (2 entries) |
| 83 | + |
| 84 | +### `guardLevel` semantics |
| 85 | + |
| 86 | +Difficulty of discharging the entry's assumptions, the max over the flattened |
| 87 | +`And` conjuncts (`none` < `real-simple` < `complex-domain` < `undischargeable`): |
| 88 | + |
| 89 | +- `none` — no assumptions |
| 90 | +- `real-simple` — memberships in integer/rational/real sets or intervals over |
| 91 | + bare variables, simple real inequalities, parity, divisibility |
| 92 | +- `complex-domain` — membership in ℂ/ℍ, predicates over `Re/Im/Abs/Arg`, |
| 93 | + complex set algebra (`SetMinus`, `Union`, …), and plain-symbol membership |
| 94 | + in inert shell sets (`DirichletGroup(q)`, `SL2Z`, lattices, symbolic sets, |
| 95 | + …) — dischargeable via CE stored-membership facts (Track 3; verified by |
| 96 | + the `scripts/fungrim/guard-census.ts` measurement) |
| 97 | +- `undischargeable` — quantifiers, `Where` in assumptions, holomorphy |
| 98 | + predicates, Riemann-hypothesis atoms, indexed-family memberships, and |
| 99 | + structural objects in *term* position (`Matrix(...) ∈ SL2Z`, lattice |
| 100 | + bounds inside `Infimum`, `DirichletCharacter` comparisons, …) |
| 101 | + |
| 102 | +## `declarations.json` |
| 103 | + |
| 104 | +`{ "generator", "declarations": { <name>: {...} }, "existing": { <name>: {...} } }`. |
| 105 | +Each declaration record carries `fungrimId`, `description`, `arity` |
| 106 | +(number or `[min, max]`), `signature` (CE type syntax, feeds `ce.declare()` |
| 107 | +directly), `signatureSource`/`signatureInferred`, `domainTable`, `wikidata`. |
| 108 | +The `existing` section audits heads CE already defines (not declared by the |
| 109 | +harness), including the **LambertW note**: CE's `LambertW` is 1-arg (principal |
| 110 | +branch) while the corpus emits 2-arg `["LambertW", z, k]` for non-principal |
| 111 | +branches — consumers must re-declare `LambertW: (complex, integer?) -> complex` |
| 112 | +in a child scope before boxing (the validation harness does this). |
| 113 | + |
| 114 | +## `skipped.json` reason codes |
| 115 | + |
| 116 | +Translation is total: every upstream entry lands in the corpus, in |
| 117 | +`properties.json`, or here. Codes: |
| 118 | + |
| 119 | +| code | meaning | |
| 120 | +|---|---| |
| 121 | +| `symbol-definition` | entry defines a symbol (no `Formula`) — feeds `declarations.json` | |
| 122 | +| `tuple-indexing-set` | `ForElement(Tuple(...), S)` index — no CE encoding | |
| 123 | +| `where-def-tuple` | `Where` with `Def(Tuple(...), <non-literal>)` destructuring | |
| 124 | +| `where-function-def` | `Where` local function def that could not be beta-reduced | |
| 125 | +| `where-recursive-def` | `Where` with a self-referential function def | |
| 126 | +| `formal-indeterminate` | formal power-series indeterminates (`XX`, `SerX`, …) | |
| 127 | +| `generator-list` | list/tuple-generator syntax with no MathJSON encoding | |
| 128 | +| `matrix-generator` | matrix built from a generator expression | |
| 129 | +| `repeat-splice` / `step-splice` | `Repeat`/`Step` argument splices (Carlson topics) | |
| 130 | +| `ellipsis` | literal `Ellipsis`/`EqualQSeriesEllipsis` | |
| 131 | +| `path-integral` | integral over a path object | |
| 132 | + |
| 133 | +## Validation |
| 134 | + |
| 135 | +The CE-side harness lives at `scripts/fungrim/` (repository root): |
| 136 | + |
| 137 | +```sh |
| 138 | +# Stage 1 (always): shell-declare, type-declare variables, box every entry |
| 139 | +npx tsx scripts/fungrim/validate.ts --corpus data/fungrim |
| 140 | + |
| 141 | +# Stage 2: seeded numeric spot checks on the none/real-simple guard slice |
| 142 | +npx tsx scripts/fungrim/validate.ts --corpus data/fungrim --numeric --seed 42 |
| 143 | +``` |
| 144 | + |
| 145 | +Reports are written to `scripts/fungrim/validation-report.json` (and |
| 146 | +`numeric-failures.json` for Stage-2 `False` instances, which are triage input, |
| 147 | +not build failures). |
| 148 | + |
| 149 | +## Regeneration workflow |
| 150 | + |
| 151 | +From a clean state: |
| 152 | + |
| 153 | +```sh |
| 154 | +# 1. Re-run the translator over the pinned fungrim snapshot |
| 155 | +cd /Users/arno/dev/fungrim-master |
| 156 | +python3 -m grim2mathjson --strict --out grim2mathjson/out |
| 157 | + |
| 158 | +# 2. Copy the artifacts into this directory |
| 159 | +cd /Users/arno/dev/compute-engine |
| 160 | +cp -R /Users/arno/dev/fungrim-master/grim2mathjson/out/corpus data/fungrim/corpus |
| 161 | +cp /Users/arno/dev/fungrim-master/grim2mathjson/out/{declarations,properties,skipped}.json data/fungrim/ |
| 162 | + |
| 163 | +# 3. Recompute the upstream pin and update MANIFEST.json if pygrim changed |
| 164 | +(cd /Users/arno/dev/fungrim-master && find pygrim -name '*.py' -type f | LC_ALL=C sort | xargs cat | shasum -a 256) |
| 165 | + |
| 166 | +# 4. Validate |
| 167 | +npx tsx scripts/fungrim/validate.ts --corpus data/fungrim |
| 168 | +``` |
| 169 | + |
| 170 | +The output is deterministic; review the git diff of `data/fungrim/` as the |
| 171 | +change artifact. This directory is intentionally **not** part of the npm |
| 172 | +package (`package.json` `files` is `["/dist"]`); revisit as |
| 173 | +`@cortex-js/fungrim-data` when Phase 1 ships runtime loading. |
0 commit comments