Commit cf16b10

feat: support unicode superscript and subscript in latex parsing (strict and non-strict)
1 parent de875a3 commit cf16b10

2 files changed

Lines changed: 57 additions & 14 deletions

CHANGELOG.md

Lines changed: 11 additions & 14 deletions
@@ -1,20 +1,17 @@
 ### [Unreleased]
 
-- **Stochastic equality check for expressions with unknowns**: `expr.isEqual()`
-  now uses a stochastic fallback when symbolic methods (expand + simplify) can't
-  prove equality. Both expressions are evaluated at 50 sample points (9
-  well-known values + 41 random) and compared with relative+absolute tolerance,
-  including both real and imaginary parts. This detects equivalences like
-  `sin²(x) + cos²(x) = 1`, `(x+y)² = x²+2xy+y²`, and `sin(2x) = 2sin(x)cos(x)`
-  that were previously returned as `undefined`. Singularities (NaN at a sample
-  point) are skipped rather than treated as disagreements. The check also works
-  when the two expressions have different unknowns (e.g. `x - x + y` vs `y`).
-
 - **Non-strict parser supports exponents on bare functions**: In non-strict mode
-  (`strict: false`), bare function names like `sin`, `cos`, `tan` can now include
-  an exponent before the argument list. For example, `sin^2(x)` and `cos^{10}(x)`
-  are now correctly parsed as `["Power", ["Sin", "x"], 2]`, matching the behavior
-  of their LaTeX counterparts `\sin^2(x)` and `\cos^{10}(x)`.
+  (`strict: false`), bare function names like `sin`, `cos`, `tan` can now
+  include an exponent before the argument list. For example, `sin^2(x)` and
+  `cos^{10}(x)` are now correctly parsed as `["Power", ["Sin", "x"], 2]`,
+  matching the behavior of their LaTeX counterparts `\sin^2(x)` and
+  `\cos^{10}(x)`.
+
+- **Unicode superscript and subscript digit support**: The LaTeX parser now
+  recognizes Unicode superscript digits (`⁰¹²³⁴⁵⁶⁷⁸⁹⁻`) and subscript digits
+  (`₀₁₂₃₄₅₆₇₈₉₋`), converting them to `^{...}` and `_{...}` respectively.
+  This works in all parsing modes. For example, `x²` parses as `x^{2}`,
+  `sin²(x)` as `\sin^{2}(x)`, `x⁻²` as `x^{-2}`, and `x₁₂` as `x_{12}`.
 
 - **`.is()` now works with assigned variables**: Previously, `.is()` only
   evaluated expressions made entirely of declared constants (like `Pi`). Now it

src/compute-engine/latex-syntax/tokenizer.ts

Lines changed: 46 additions & 0 deletions
@@ -7,6 +7,34 @@
 
 import { splitGraphemes } from '../../common/grapheme-splitter';
 
+const UNICODE_SUPERSCRIPT_MAP: Record<string, string> = {
+  '\u2070': '0', // ⁰
+  '\u00B9': '1', // ¹
+  '\u00B2': '2', // ²
+  '\u00B3': '3', // ³
+  '\u2074': '4', // ⁴
+  '\u2075': '5', // ⁵
+  '\u2076': '6', // ⁶
+  '\u2077': '7', // ⁷
+  '\u2078': '8', // ⁸
+  '\u2079': '9', // ⁹
+  '\u207B': '-', // ⁻
+};
+
+const UNICODE_SUBSCRIPT_MAP: Record<string, string> = {
+  '\u2080': '0', // ₀
+  '\u2081': '1', // ₁
+  '\u2082': '2', // ₂
+  '\u2083': '3', // ₃
+  '\u2084': '4', // ₄
+  '\u2085': '5', // ₅
+  '\u2086': '6', // ₆
+  '\u2087': '7', // ₇
+  '\u2088': '8', // ₈
+  '\u2089': '9', // ₉
+  '\u208B': '-', // ₋
+};
+
 // The 'special' tokens must be of length > 1 to distinguish
 // them from literals.
 // '<space>': whitespace
@@ -43,6 +71,24 @@ class Tokenizer {
     // Replace the Unicode minus sign (U+2212: MINUS SIGN) with a hyphen
     s = s.replace(/\u2212/g, '-');
 
+    // Replace Unicode superscript sequences with ^{...}
+    // Handles: ⁰¹²³⁴⁵⁶⁷⁸⁹ and ⁻ (superscript minus)
+    s = s.replace(/[⁰¹²³⁴⁵⁶⁷⁸⁹⁻]+/g, (m) => {
+      const digits = Array.from(m)
+        .map((c) => UNICODE_SUPERSCRIPT_MAP[c])
+        .join('');
+      return `^{${digits}}`;
+    });
+
+    // Replace Unicode subscript sequences with _{...}
+    // Handles: ₀₁₂₃₄₅₆₇₈₉ and ₋ (subscript minus)
+    s = s.replace(/[₀₁₂₃₄₅₆₇₈₉₋]+/g, (m) => {
+      const digits = Array.from(m)
+        .map((c) => UNICODE_SUBSCRIPT_MAP[c])
+        .join('');
+      return `_{${digits}}`;
+    });
+
     this.s = splitGraphemes(s);
     this.pos = 0;
   }
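
To try the normalization step outside the tokenizer, the following is a minimal standalone sketch of the same idea. The names `normalizeScripts`, `replaceRuns`, and `buildMap` are hypothetical and not part of the commit, which inlines this logic in the `Tokenizer` constructor. The sketch only rewrites the string; turning `sin` into `\sin` is left to the parser.

```typescript
// Hypothetical standalone helper, not the library's API: it mirrors the
// pre-tokenization rewrite of Unicode superscript/subscript runs.
const SUPERSCRIPTS = '⁰¹²³⁴⁵⁶⁷⁸⁹⁻';
const SUBSCRIPTS = '₀₁₂₃₄₅₆₇₈₉₋';
const ASCII = '0123456789-';

// Pair each Unicode script character with its ASCII counterpart by position.
function buildMap(from: string): Map<string, string> {
  const map = new Map<string, string>();
  Array.from(from).forEach((ch, i) => map.set(ch, ASCII.charAt(i)));
  return map;
}

// Replace every maximal run of `chars` with `marker{...}` in ASCII.
function replaceRuns(s: string, chars: string, marker: '^' | '_'): string {
  const map = buildMap(chars);
  const runs = new RegExp(`[${chars}]+`, 'g');
  return s.replace(runs, (run) => {
    const ascii = Array.from(run)
      .map((c) => map.get(c) ?? c)
      .join('');
    return `${marker}{${ascii}}`;
  });
}

// Superscripts first, then subscripts, in the same order as the commit.
function normalizeScripts(s: string): string {
  return replaceRuns(replaceRuns(s, SUPERSCRIPTS, '^'), SUBSCRIPTS, '_');
}

console.log(normalizeScripts('x⁻²')); // x^{-2}
console.log(normalizeScripts('x₁₂')); // x_{12}
```

Because a whole run is grouped into one braced argument, `x₁₂` becomes a single subscript `12` rather than two nested subscripts, which matches the CHANGELOG examples.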
