Commit d2eca2f

Refactor LaTeX syntax handling and improve parsing/serialization
- Updated `definitions-arithmetic.ts` to clarify handling of range and step-range notations.
- Removed outdated TODO comments in `definitions-core.ts` regarding Leibniz and Euler notations.
- Enhanced `definitions-other.ts` to support horizontal spacing commands (`\hspace`, `\hskip`, `\kern`) with appropriate parsing and serialization.
- Improved `definitions-sets.ts` to implement serialization for the complement operator and added support for set-builder notation.
- Introduced `serializeTabularBody` function in `definitions.ts` for better handling of matrix-like structures in LaTeX environments.
- Updated `_Parser` class in `parse.ts` to handle horizontal spacing commands and symbol-to-Unicode mapping.
- Modified `Serializer` class to prevent redundant wrapping of matchfix operators.
- Added comprehensive tests for new features and edge cases in `matchfix.test.ts`, `sets.test.ts`, `style.test.ts`, and `symbols.test.ts`.
1 parent 06e8f8b commit d2eca2f

14 files changed

+1317
-40
lines changed

CHANGELOG.md

Lines changed: 42 additions & 7 deletions
@@ -2,13 +2,48 @@
2 2

3 3
#### Fixed
4 4

5-
- **LaTeX parsing: style, size, and color switch commands**
6-
`\displaystyle`, `\textstyle`, `\scriptstyle`, `\scriptscriptstyle`,
7-
`\tiny`..`\Huge` (10 size commands), and `\color{...}` were silently
8-
discarded during parsing. They now produce `Annotated` expressions that
9-
preserve the styling information and round-trip correctly through
10-
serialization. Added `\scriptstyle` / `\scriptscriptstyle` serialization
11-
support (previously only `\displaystyle` and `\textstyle` were handled).
5+
- **LaTeX parsing: style, size, and color switch commands** `\displaystyle`,
6+
`\textstyle`, `\scriptstyle`, `\scriptscriptstyle`, `\tiny`..`\Huge` (10 size
7+
commands), and `\color{...}` were silently discarded during parsing. They now
8+
produce `Annotated` expressions that preserve the styling information and
9+
round-trip correctly through serialization. Added `\scriptstyle` /
10+
`\scriptscriptstyle` serialization support (previously only `\displaystyle`
11+
and `\textstyle` were handled).
12+
13+
- **LaTeX parsing: set-builder notation** `\{x \in \R \mid x > 0\}` now parses
14+
to `["Set", expr, ["Condition", cond]]`. Registered `\mid` as an infix
15+
operator (`Divides`, precedence 160). The serializer round-trips set-builder
16+
notation correctly.
17+
18+
- **LaTeX serialization: `Complement`** `["Complement", "A"]` now serializes
19+
to `A^\complement` instead of falling back to the generic function form.
20+
Removed stale `@todo` comments about a non-existent multi-argument case.
21+
22+
- **LaTeX parsing: spacing commands** `\hspace{dim}`, `\hspace*{dim}`,
23+
`\hskip`, and `\kern` are now consumed during parsing (previously caused
24+
"unexpected token" errors). These are treated as visual spacing and skipped.
25+
26+
- **LaTeX serialization: `HorizontalSpacing` math classes** — the 2-argument
27+
form `["HorizontalSpacing", expr, "'bin'"]` now serializes to `\mathbin{expr}`
28+
(and similarly for `rel`, `op`, `ord`, `open`, `close`, `punct`, `inner`).
29+
Previously the second argument was silently dropped.
30+
31+
- **LaTeX serialization: redundant parens on matchfix operators** `wrap()` no
32+
longer adds parentheses around `Abs`, `Floor`, `Ceil`, `Norm`, and other
33+
matchfix expressions that already have visible delimiters.
34+
35+
- **LaTeX serialization: tabular environments** — default environment serializer
36+
now renders matrix bodies (List of Lists) with `&` column separators and `\\`
37+
row separators instead of nested function calls.
38+
39+
- **LaTeX serialization: matchfix delimiter scaling** — default matchfix
40+
serializer now respects `groupStyle` to choose between bare delimiters,
41+
`\left..\right`, or `\bigl..\bigr` scaling.
42+
43+
- **LaTeX parsing: Greek symbols in string groups** `\alpha`, `\beta`, etc. in
44+
`parseStringGroupContent()` (used by `\begin`/`\end`, color arguments) are now
45+
interpreted as their Unicode equivalents instead of passing through as raw
46+
LaTeX commands.
12 47

13 48
### 0.55.5 _2026-03-06_
14 49

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
# LaTeX Parser @todo Cleanup — Design
2+
3+
**Date**: 2026-03-07
4+
5+
## Goal
6+
7+
Address ~30 `@todo` comments across `src/compute-engine/latex-syntax/`, covering
8+
missing parse/serialize implementations (Category B) and presentation quality
9+
fixes (Category D).
10+
11+
## Work Units
12+
13+
### 1. Stale Derivative Comments (trivial)
14+
15+
The `@todo` items at `definitions-core.ts:1365-1366` reference missing Leibniz
16+
and Euler derivative parsing. Investigation shows these are **already
17+
implemented**:
18+
19+
- Leibniz ordinary: `definitions-arithmetic.ts:454-500` (outputs `D`)
20+
- Leibniz partial: `definitions-arithmetic.ts:420-451` (outputs
21+
`PartialDerivative`)
22+
- Euler `D_x f`: `definitions-core.ts:1478-1518`
23+
- Euler partial `\partial_x f`: `definitions-other.ts:127-157`
24+
- Newton `\dot{x}`: `definitions-core.ts:1428-1476`
25+
- `D` serializer: `definitions-core.ts:1381-1425`
26+
27+
**Action**: Remove the stale `@todo` comments.
28+
29+
### 2. Set Operations (medium)
30+
31+
#### 2a. Set Builder Parsing
32+
33+
**File**: `definitions-sets.ts:364`
34+
35+
The `{...}` matchfix parser only handles enumerated sets (`\{1,2,3\}`). Add
36+
detection of `\mid`, `|`, or `\colon` as a separator to produce set-builder
37+
notation.
38+
39+
**Parsing**: `\{x \in \R \mid x > 0\}`
40+
41+
```json
42+
["Set", ["Element", "x", "RealNumbers"], ["Condition", ["Greater", "x", 0]]]
43+
```
44+
45+
The serializer for this shape already exists at `definitions-sets.ts:571-581`.
46+
47+
**Implementation**: Inside the matchfix parse handler, after parsing the body,
48+
check if the body contains a `\mid`/`|`/`\colon` separator. If so, split into
49+
expression + condition and wrap in `["Set", expr, ["Condition", cond]]`.
50+
51+
The challenge: `|` is ambiguous (could be absolute value). Use `\mid` and
52+
`\colon` as unambiguous triggers. For bare `|`, only treat as separator when
53+
inside `\{...\}` matchfix context (which is already the case here).
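
The split step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the body is modelled as a flat array of token strings, and `splitSetBuilder` is a hypothetical helper, not part of the actual `_Parser` API. It does not try to recognize paired `|...|` absolute-value bars inside the body.

```typescript
// Hypothetical sketch: tokens are plain strings, not the engine's token type.
type Token = string;

const SEPARATORS = new Set(["\\mid", "|", "\\colon"]);
const OPEN = new Set(["(", "[", "{", "\\{"]);
const CLOSE = new Set([")", "]", "}", "\\}"]);

/**
 * Split the body of a `\{...\}` group at the first separator that is not
 * nested inside another group. Returns null when there is no separator,
 * i.e. the body is an enumerated set such as `1, 2, 3`.
 */
function splitSetBuilder(
  body: Token[]
): { expr: Token[]; cond: Token[] } | null {
  let depth = 0;
  for (let i = 0; i < body.length; i++) {
    const tok = body[i];
    if (OPEN.has(tok)) depth++;
    else if (CLOSE.has(tok)) depth--;
    else if (depth === 0 && SEPARATORS.has(tok)) {
      return { expr: body.slice(0, i), cond: body.slice(i + 1) };
    }
  }
  return null;
}
```

With the tokens of `x \in \R \mid x > 0`, the helper returns `x \in \R` as the expression part and `x > 0` as the condition part, which would then be wrapped as `["Set", expr, ["Condition", cond]]`.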
54+
55+
#### 2b. `Multiple` — Defer
56+
57+
`Multiple` has no library definition (no entry in `sets.ts`). The latex-syntax
58+
entry has a name and an empty serialize stub. Since it's not a real operator in
59+
the engine, **defer** this until `Multiple` is defined in the library. Remove the
60+
empty serialize stub to avoid confusion.
61+
62+
#### 2c. Multi-arg `CartesianProduct` / `Complement` Serialization
63+
64+
**File**: `definitions-sets.ts:221,228`
65+
66+
Currently these only handle the 2-arg infix case. Extend:
67+
68+
- `CartesianProduct(A, B, C)` → `A \times B \times C`
69+
- `Complement(A, B)` — already works as postfix `A^\complement`; the multi-arg
70+
comment may be stale. Verify and update/remove.
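
A minimal sketch of the n-ary infix case, assuming the operands have already been serialized to LaTeX strings. `serializeNaryInfix` is a hypothetical name used for illustration only and is not the signature used in `definitions-sets.ts`.

```typescript
// Hypothetical sketch: operands are pre-serialized LaTeX strings.
function serializeNaryInfix(operator: string, operands: string[]): string {
  // Joining works for any arity, so the 2-arg case needs no special path.
  return operands.join(` ${operator} `);
}

const product = serializeNaryInfix("\\times", ["A", "B", "C"]);
console.log(product); // A \times B \times C
```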
71+
72+
### 3. BigOp Step Ranges — Update Comment (trivial)
73+
74+
**File**: `definitions-arithmetic.ts:1712`
75+
76+
The `Element` form (`i \in S`) is already handled at lines 1720-1725. The
77+
step-range gap (`i=1..3..10`) is intentionally deferred — uncommon LaTeX
78+
notation. **Action**: Update the comment to reflect current state.
79+
80+
### 4. Spacing Commands (small)
81+
82+
#### 4a. Parse `\hspace`, `\hskip`, `\kern`
83+
84+
**File**: `parse.ts:689`
85+
86+
These take dimension arguments. Parse into
87+
`["HorizontalSpacing", "'<dimension>'"]` with the dimension as a string
88+
preserving unit.
89+
90+
- `\hspace{1em}`, `\hspace*{1em}` — group argument
91+
- `\hskip 5pt`, `\kern-3mu` — inline glue (parse number + unit, ignore
92+
plus/minus stretch)
93+
94+
Register as expression triggers in `definitions-other.ts`. The parse handler
95+
reads the dimension and returns `["HorizontalSpacing", "'<value><unit>'"]`.
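
Reading the dimension could look roughly like the sketch below. It assumes the handler receives the text that follows the command (for example `-3mu` after `\kern`) as a plain string. `parseDimension` and the unit list are illustrative assumptions, not the actual handler in `definitions-other.ts`.

```typescript
// Hypothetical sketch: a signed number followed by one of a few TeX units.
// Anything after the unit (such as `plus 1pt minus 1pt` stretch) is ignored.
const DIMENSION =
  /^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(em|ex|pt|mu|pc|in|cm|mm|bp|dd|cc|sp)/;

function parseDimension(
  input: string
): ["HorizontalSpacing", string] | null {
  const m = DIMENSION.exec(input);
  if (m === null) return null;
  // The dimension is kept as a quoted string that preserves the unit.
  return ["HorizontalSpacing", `'${m[1]}${m[2]}'`];
}
```

Under these assumptions, `parseDimension("5pt plus 1pt")` yields `["HorizontalSpacing", "'5pt'"]` and `parseDimension("-3mu")` yields `["HorizontalSpacing", "'-3mu'"]`.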
96+
97+
#### 4b. Serialize `HorizontalSpacing` with Math Spacing Classes
98+
99+
**File**: `definitions-other.ts:544`
100+
101+
The 2-arg form `["HorizontalSpacing", expr, "'bin'"]` should serialize as:
102+
103+
- `"bin"` → `\mathbin{expr}`
104+
- `"op"` → `\mathop{expr}`
105+
- `"rel"` → `\mathrel{expr}`
106+
- `"ord"` → `\mathord{expr}`
107+
- `"open"` → `\mathopen{expr}`
108+
- `"close"` → `\mathclose{expr}`
109+
- `"punct"` → `\mathpunct{expr}`
110+
111+
Currently the second argument is silently dropped.
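
The mapping above lends itself to a lookup table. The sketch below assumes the first argument has already been serialized to a LaTeX string and that the class arrives as a quoted string such as `'bin'`. `serializeMathClass` is a hypothetical helper, and falling back to the bare body for an unknown class is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical sketch of the class-to-command lookup.
const MATH_CLASS_COMMANDS = new Map<string, string>([
  ["bin", "\\mathbin"],
  ["op", "\\mathop"],
  ["rel", "\\mathrel"],
  ["ord", "\\mathord"],
  ["open", "\\mathopen"],
  ["close", "\\mathclose"],
  ["punct", "\\mathpunct"],
]);

function serializeMathClass(body: string, mathClass: string): string {
  // Strip the surrounding single quotes of the string literal form.
  const name = mathClass.replace(/^'|'$/g, "");
  const command = MATH_CLASS_COMMANDS.get(name);
  // Assumption: an unrecognized class leaves the body unwrapped.
  return command === undefined ? body : `${command}{${body}}`;
}
```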
112+
113+
### 5. Serializer Quality (medium)
114+
115+
#### 5a. Skip Redundant Parens on Matchfix Operators
116+
117+
**File**: `serializer.ts:90`
118+
119+
`wrap()` adds parentheses around low-precedence expressions. But matchfix
120+
operators (`Abs`, `Floor`, `Ceil`, `Delimiter`) already have visible delimiters.
121+
Adding parens produces `\left(|x|\right)`.
122+
123+
**Fix**: In `wrap()`, check if the expression is a matchfix operator with visible
124+
delimiters. If so, skip the wrapping. Identify matchfix by operator name:
125+
`Abs`, `Floor`, `Ceil`, `Norm`, and any `Delimiter` expression.
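
A rough sketch of the check, assuming expressions are modelled as nested arrays with the operator name in first position. The `wrap` signature shown here (an expression, its serialized LaTeX, and a flag saying whether precedence calls for parentheses) is a simplification for illustration and is not the real `Serializer.wrap()` signature.

```typescript
// Hypothetical sketch: a simplified stand-in for the serializer's wrap().
type Expr = string | number | Expr[];

const MATCHFIX_OPERATORS = new Set([
  "Abs",
  "Floor",
  "Ceil",
  "Norm",
  "Delimiter",
]);

function hasVisibleDelimiters(expr: Expr): boolean {
  if (!Array.isArray(expr)) return false;
  const head = expr[0];
  return typeof head === "string" && MATCHFIX_OPERATORS.has(head);
}

function wrap(expr: Expr, latex: string, needsParens: boolean): string {
  // Matchfix output such as |x| is already delimited, so leave it alone.
  if (!needsParens || hasVisibleDelimiters(expr)) return latex;
  return `\\left(${latex}\\right)`;
}
```

With this check, `wrap(["Abs", "x"], "|x|", true)` returns `|x|` rather than `\left(|x|\right)`, while a non-matchfix expression such as `["Add", "x", 1]` is still parenthesized.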
126+
127+
#### 5b. `serializeTabular()` for Environments
128+
129+
**File**: `definitions.ts:519`
130+
131+
Environment entries use a generic serializer. When the body is a Matrix (List of
132+
Lists), serialize as tabular: `&` between columns, `\\` between rows.
133+
134+
**Implementation**: Add a `serializeTabular()` helper that takes a matrix
135+
expression and produces `row1col1 & row1col2 \\ row2col1 & row2col2`. Wire it
136+
into the environment default serializer when the body matches a matrix shape.
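
The helper could be as small as the sketch below, which assumes every cell has already been serialized to a LaTeX string. `serializeTabularSketch` is a hypothetical name chosen to avoid implying it matches the function added to `definitions.ts`.

```typescript
// Hypothetical sketch: rows of pre-serialized cells in, tabular body out.
function serializeTabularSketch(rows: string[][]): string {
  return rows
    .map((row) => row.join(" & ")) // `&` between columns
    .join(" \\\\ "); // `\\` between rows (escaped in the string literal)
}
```

For the input `[["a", "b"], ["c", "d"]]` this produces the LaTeX `a & b \\ c & d`, which can be placed between `\begin{...}` and `\end{...}`.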
137+
138+
#### 5c. `groupStyle` for `\left..\right` in Matchfix
139+
140+
**File**: `definitions.ts:531`
141+
142+
Matchfix serialization currently emits raw delimiter strings. It should call
143+
`serializer.groupStyle(expr)` to choose between:
144+
145+
- `"none"` → bare delimiters `(`, `)`
146+
- `"auto"` → `\left(`, `\right)`
147+
- `"big"` → `\bigl(`, `\bigr)`
148+
- etc.
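
The selection could be sketched as a switch over the style. Only the three styles listed above are modelled here. The `GroupStyle` type and the `delimit` helper are assumptions for illustration, and the real `groupStyle` return values may include more cases.

```typescript
// Hypothetical sketch covering only the three styles listed above.
type GroupStyle = "none" | "auto" | "big";

function delimit(
  open: string,
  close: string,
  body: string,
  style: GroupStyle
): string {
  switch (style) {
    case "none":
      return `${open}${body}${close}`;
    case "auto":
      return `\\left${open}${body}\\right${close}`;
    case "big":
      return `\\bigl${open}${body}\\bigr${close}`;
  }
}
```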
149+
150+
### 6. String Group Symbols (small)
151+
152+
**File**: `parse.ts:1143`
153+
154+
In `parseStringGroup()`, when encountering a `\`-prefixed token, check if it
155+
maps to a known Unicode symbol (Greek letters, common math symbols). Substitute
156+
the Unicode character instead of passing through the raw LaTeX command.
157+
158+
Example: `\operatorname{\alpha-test}` → the string `"α-test"` instead of
159+
`"\\alpha-test"`.
160+
161+
**Implementation**: Use the existing symbol dictionary to look up the mapping.
162+
Only substitute for symbols that have a single Unicode character representation
163+
(Greek letters, `\infty`, etc.). Leave unknown commands as-is.
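
A minimal sketch of the substitution, assuming the group content is available as a raw string. The small `SYMBOL_TO_UNICODE` table stands in for the existing symbol dictionary and `substituteSymbols` is a hypothetical helper. Neither is the engine's actual API.

```typescript
// Hypothetical sketch: a tiny table standing in for the symbol dictionary.
const SYMBOL_TO_UNICODE = new Map<string, string>([
  ["\\alpha", "α"],
  ["\\beta", "β"],
  ["\\gamma", "γ"],
  ["\\infty", "∞"],
]);

function substituteSymbols(content: string): string {
  // Match a backslash followed by letters; unknown commands are left as-is.
  return content.replace(
    /\\[a-zA-Z]+/g,
    (command) => SYMBOL_TO_UNICODE.get(command) ?? command
  );
}
```

Because the pattern consumes the whole run of letters, a command such as `\alphabet` is looked up as a unit and left unchanged rather than being partially replaced.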
164+
165+
## Out of Scope
166+
167+
- `Multiple` operator (no library definition)
168+
- Step ranges in BigOp indexing (uncommon notation)
169+
- Percent notation (`types.ts:618,693`)
170+
- Domain checks for `Abs`/`Norm` (`definitions-arithmetic.ts:877,1470,1479`)
171+
- Precedence corrections vs MathML (`definitions-other.ts:54,60,110`)
172+
173+
## Testing Strategy
174+
175+
Each work unit gets its own test block in the appropriate test file:
176+
177+
- Set builder: `test/compute-engine/latex-syntax/sets.test.ts`
178+
- Spacing: `test/compute-engine/latex-syntax/style.test.ts`
179+
- Serializer quality: new tests alongside existing serialize tests
180+
- String groups: `test/compute-engine/latex-syntax/stefnotch.test.ts` or a new
181+
`string-groups.test.ts`
182+
183+
Round-trip tests (parse → serialize → parse) for all new parse/serialize pairs.
