Commit c07a38e

arch: support non-strict LaTeX parsing (i.e. ASCIIMath/Typst-like)
1 parent 5662a87 commit c07a38e

7 files changed

Lines changed: 395 additions & 6 deletions

requirements/OEIS.md

Lines changed: 87 additions & 2 deletions
@@ -3,12 +3,18 @@
 [!] Need to think more about MathJSON API. What does "ClosedForm" return? Is it
 a function? Or is there such a thing as a "Series" type?

+**✅ UPDATE (2026)**: The LaTeX parser now has a `strict: false` option that accepts Math-ASCII/Typst-like syntax (e.g., `sin(x)`, `x^(n+1)`, `a_(k+m)`). This **may eliminate the need for a separate ASCII Math parser** for OEIS formulas. See [LaTeX Parser Non-Strict Mode](#latex-parser-non-strict-mode) section for details.
+
 ## Summary

 Add the ability to parse OEIS formula notation and reconstruct usable sequence
-definitions. This is achieved by creating an **ASCII Math parser** (inverse of
+definitions. This can be achieved through two approaches:
+
+**Option A (Original Plan)**: Create an **ASCII Math parser** (inverse of
 the existing `toAsciiMath()` serializer), with OEIS support as a thin wrapper.

+**Option B (Simplified)**: Leverage the LaTeX parser's `strict: false` mode which already handles most ASCII Math/OEIS notation, reducing implementation complexity.
+
 ## Approach

 **Strategy**: Build an ASCII Math parser, use it for OEIS formulas
@@ -44,10 +50,49 @@ ASCII Math string. We create the inverse: ASCII Math string → BoxedExpression.

 The main transformation needed for OEIS: `a(n-1)` → `a_(n-1)`

+## LaTeX Parser Non-Strict Mode
+
+**UPDATE**: The LaTeX parser now supports a `strict: false` option that accepts
+Math-ASCII/Typst-like syntax, which overlaps significantly with OEIS notation:
+
+```typescript
+ce.parse('sin(x)^(n+1)', { strict: false })
+// Accepts:
+// - Parentheses for superscripts/subscripts: x^(n+1), a_(k+m)
+// - Bare function names: sin(x), cos(x), log(x), sqrt(x), etc.
+// - Division with slash: (n+1)/b
+
+// Supported bare functions:
+// Trig: sin, cos, tan, cot, sec, csc
+// Hyperbolic: sinh, cosh, tanh, coth, sech, csch
+// Inverse: arcsin, arccos, arctan, asin, acos, atan
+// Logarithmic: log, ln, exp, lg, lb
+// Other: sqrt, abs, floor, ceil, round, max, min, gcd, lcm
+```
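The bare-function recognition described above can be illustrated with a small standalone sketch. It is not the engine's parser: `matchBareFunction`, `BareMatch`, and the abbreviated `BARE_FUNCTIONS` table are hypothetical names that work on a plain string rather than on LaTeX tokens. The sketch only shows the lookup rule (letters, optional spaces, then `(`, then a known name).

```typescript
// Hypothetical, abbreviated table: lowercase name -> MathJSON head.
const BARE_FUNCTIONS: Record<string, string> = {
  sin: 'Sin', cos: 'Cos', tan: 'Tan', log: 'Log', ln: 'Ln',
  exp: 'Exp', sqrt: 'Sqrt', abs: 'Abs', asin: 'Arcsin',
};

type BareMatch = { head: string; argsStart: number };

// Returns the MathJSON head and the index of the opening parenthesis,
// or null when the text at `start` is not a recognized bare function call.
function matchBareFunction(input: string, start = 0): BareMatch | null {
  let i = start;
  let name = '';
  // Collect consecutive ASCII letters as the candidate name
  while (i < input.length && /^[a-zA-Z]$/.test(input[i])) {
    name += input[i];
    i++;
  }
  if (name === '') return null;
  // Allow spaces between the name and the parenthesis
  while (i < input.length && input[i] === ' ') i++;
  if (input[i] !== '(') return null;
  // Own-property check so names such as `constructor` are not matched
  if (!Object.prototype.hasOwnProperty.call(BARE_FUNCTIONS, name)) return null;
  return { head: BARE_FUNCTIONS[name], argsStart: i };
}
```

Because the function never mutates shared state, returning `null` plays the role of backtracking: the caller simply continues from `start`.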
+
+This means we can potentially **leverage the LaTeX parser in non-strict mode**
+for OEIS formula parsing instead of building a separate ASCII Math parser,
+reducing implementation complexity.
+
+**Benefits:**
+
+- Reuses existing, well-tested parser infrastructure
+- Handles operator precedence, function calls, parentheses automatically
+- Already integrated with MathJSON conversion
+- Reduces maintenance burden (one parser instead of two)
+
+**Considerations:**
+
+- Still need OEIS-specific pre-processing for `a(n)` → `a_n` notation
+- May need minor extensions for OEIS-specific patterns (e.g., `Sum_{k=0..n}`)
+- Non-strict mode is permissive but doesn't validate against ASCII Math spec
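The OEIS pre-processing step mentioned in the considerations could look roughly like the sketch below. `preprocessOEIS` is a hypothetical helper, not part of the commit. It assumes the sequence is named `a` by default and that the index contains no nested parentheses; nested calls such as `a(floor(n/2))` are left untouched and would need a real tokenizer.

```typescript
// Hypothetical helper: rewrite OEIS-style indexing a(n), a(n-1)
// into subscript notation a_n, a_(n-1) before handing the string
// to a parser that understands `a_(...)`.
function preprocessOEIS(formula: string, seqName = 'a'): string {
  // Escape regex metacharacters in the sequence name
  const escaped = seqName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // The name must not be preceded by a letter (so `tan(n)` is not touched),
  // and the index must not contain nested parentheses.
  const pattern = new RegExp(`(?<![A-Za-z])${escaped}\\(([^()]*)\\)`, 'g');
  return formula.replace(pattern, (_match: string, arg: string) => {
    const index = arg.trim();
    // A single character can be a bare subscript; anything longer keeps
    // its parentheses so the whole index stays grouped.
    return /^[A-Za-z0-9]$/.test(index)
      ? `${seqName}_${index}`
      : `${seqName}_(${index})`;
  });
}
```

For example, `preprocessOEIS('a(n) = a(n-1) + a(n-2)')` yields `a_n = a_(n-1) + a_(n-2)`, which matches the transformation named earlier in this document.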
+
 ## New API

 ### ASCII Math Parser (general use)

+**Option A: Dedicated ASCII Math parser**
+
 ```typescript
 // New method on ComputeEngine
 ce.parseAsciiMath('sqrt(x^2 + 1)') // → BoxedExpression
@@ -56,6 +101,19 @@ ce.parseAsciiMath('sqrt(x^2 + 1)') // → BoxedExpression
 expr.toAsciiMath() → string → ce.parseAsciiMath() → same expr
 ```

+**Option B: Use LaTeX parser in non-strict mode** _(recommended for simplicity)_
+
+```typescript
+// Use existing LaTeX parser with strict: false
+ce.parse('sqrt(x^2 + 1)', { strict: false }) // → BoxedExpression
+
+// Advantages:
+// - Reuses existing parser infrastructure
+// - No need for new parser implementation
+// - Handles most OEIS notation already
+// - Maintains single source of truth for parsing logic
+```
+
 ### MathJSON Functions for Sequence Analysis

 These functions take sequence terms and evaluate to formulas by looking up OEIS:
@@ -112,6 +170,8 @@ interface ParsedOEISFormula {

 ## Files to Modify

+**If using dedicated ASCII Math parser (Option A):**
+
 | File | Change |
 | ---------------------------------------------------------- | ------------------------------------------------------------------------------------ |
 | `src/compute-engine/boxed-expression/ascii-math-parser.ts` | **NEW** - ASCII Math parser |
@@ -123,11 +183,23 @@ interface ParsedOEISFormula {
 | `test/compute-engine/ascii-math-parser.test.ts` | **NEW** - Parser tests + round-trip |
 | `test/compute-engine/oeis.test.ts` | Add formula parsing tests |

+**If using LaTeX parser with strict: false (Option B - recommended):**
+
+| File | Change |
+| ----------------------------------------------- | ------------------------------------------------------------------------------------ |
+| ~~`ascii-math-parser.ts`~~ | **NOT NEEDED** - Use LaTeX parser with `strict: false` |
+| `src/compute-engine/latex-syntax/types.ts` | **DONE** - Added `strict?: boolean` to `ParseLatexOptions` |
+| `src/compute-engine/oeis.ts` | Add OEIS pre-processing (`a(n)` → `a_n`), `declareOEISSequence()` |
+| `src/compute-engine/library/sequences.ts` | **NEW** or extend - Add ClosedForm, Recurrence, GeneratingFunction, OEISId functions |
+| `src/compute-engine/index.ts` | Expose `declareOEISSequence()` |
+| `src/compute-engine/global-types.ts` | Add new type definitions |
+| `test/compute-engine/oeis.test.ts` | Add formula parsing tests using `parse(formula, {strict: false})` |
+
 ## Implementation Phases

 ### Phase 1: ASCII Math Parser Core

-Create `ascii-math-parser.ts`:
+**Option A: Build dedicated ASCII Math parser** Create `ascii-math-parser.ts`:

 - Tokenizer for ASCII Math notation
 - Precedence-climbing parser (similar to LaTeX parser but simpler)
@@ -136,12 +208,25 @@ Create `ascii-math-parser.ts`:
 - Handle symbols: `pi`, `phi`, Greek letters, single-letter variables
 - Handle subscripts: `a_n`, `a_(n-1)`

+**Option B: Use LaTeX parser with `strict: false`** _(✅ IMPLEMENTED)_
+
+- **Already handles**: `sqrt(x)`, `sin(x)`, `x^(n+1)`, `a_(k+m)`, `(n+1)/b`
+- **Still needed**: OEIS-specific pre-processing for `a(n)` → `a_n` notation
+- **Benefit**: No new parser needed, leverages existing tested infrastructure
+
 ### Phase 2: Round-trip Testing

+**If using dedicated parser:**
+
 - Test: `toAsciiMath(parseAsciiMath(str)) ≈ str`
 - Test: `parseAsciiMath(toAsciiMath(expr)).isSame(expr)`
 - Use existing serializer output as test cases

+**If using LaTeX parser with strict: false:**
+
+- Test: `parse(str, {strict: false}).toAsciiMath() ≈ str`
+- Verify OEIS notation compatibility with non-strict mode
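One way to make the `≈` in these round-trip tests concrete is to compare strings after removing whitespace. The sketch below is an assumption about what "approximately equal" could mean here, not the project's test harness; `normalizeSpacing` and `roundTripFailures` are hypothetical names, and the parser and serializer are passed in as plain functions so the helper stays independent of the engine.

```typescript
// Hypothetical helper: treat two strings as equivalent if they differ
// only in whitespace.
function normalizeSpacing(s: string): string {
  return s.replace(/\s+/g, '');
}

// Returns the samples whose serialized form does not round-trip
// (modulo whitespace). An empty result means every sample passed.
function roundTripFailures<T>(
  samples: readonly string[],
  parse: (s: string) => T,
  serialize: (expr: T) => string
): string[] {
  return samples.filter(
    (s) => normalizeSpacing(serialize(parse(s))) !== normalizeSpacing(s)
  );
}
```

With the engine, the two callbacks would presumably be `(s) => ce.parse(s, { strict: false })` and `(e) => e.toAsciiMath()`. If the serializer reorders or canonicalizes terms, whitespace normalization alone will report false failures and a structural comparison would be needed instead.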
+
 ### Phase 3: OEIS Pre-processing

 Add to `oeis.ts`:

src/api.md

Lines changed: 15 additions & 0 deletions
@@ -5368,6 +5368,7 @@ dictionary entries to define custom LaTeX parsing and serialization.

 ```ts
 type ParseLatexOptions = NumberFormat & {
+  strict: boolean;
   skipSpace: boolean;
   parseNumbers: "auto" | "rational" | "decimal" | "never";
   getSymbolType: (symbol) => BoxedType;
@@ -5381,6 +5382,20 @@ type ParseLatexOptions = NumberFormat & {

 The LaTeX parsing options can be used with the `ce.parse()` method.

+#### ParseLatexOptions.strict
+
+```ts
+strict: boolean;
+```
+
+Controls the strictness of LaTeX parsing:
+
+- `true`: Strict LaTeX syntax required (e.g., `\sin{x}`, `x^{n+1}`)
+- `false`: Accept relaxed Math-ASCII/Typst-like syntax in addition to
+  LaTeX (e.g., `sin(x)`, `x^(n+1)`)
+
+**Default**: `true`
+
 #### ParseLatexOptions.skipSpace

 ```ts

src/compute-engine/index.ts

Lines changed: 1 addition & 0 deletions
@@ -2178,6 +2178,7 @@ export class ComputeEngine implements IComputeEngine {

       repeatingDecimal: 'auto', // auto will accept any notation

+      strict: true,
       skipSpace: true,
       parseNumbers: 'auto',
       getSymbolType: (id) => {

src/compute-engine/latex-syntax/dictionary/definitions-core.ts

Lines changed: 4 additions & 1 deletion
@@ -623,7 +623,10 @@ export const DEFINITIONS_CORE: LatexDictionary = [
       // return null (or interpret as a symbol).

       // Parse either a group or a single symbol
-      const rhs = parser.parseGroup() ?? parser.parseToken();
+      let rhs = parser.parseGroup() ?? parser.parseToken();
+      // In non-strict mode, also accept parenthesized expressions
+      if (rhs === null && parser.options.strict === false && parser.peek === '(')
+        rhs = parser.parseEnclosure();
       return ['Subscript', lhs, rhs];
     },
   } as PostfixEntry,

src/compute-engine/latex-syntax/parse.ts

Lines changed: 128 additions & 3 deletions
@@ -1757,6 +1757,112 @@ export class _Parser implements Parser {
     return null;
   }

+  /**
+   * In non-strict mode, try to parse a bare function name followed by parentheses.
+   * This allows syntax like `sin(x)` instead of requiring `\sin(x)`.
+   *
+   * Returns the parsed function call or null if not a bare function.
+   */
+  private tryParseBareFunction(until?: Readonly<Terminator>): Expression | null {
+    if (this.options.strict !== false) return null;
+
+    const start = this.index;
+
+    // Collect consecutive letter tokens to form a potential function name
+    let name = '';
+    while (!this.atEnd && /^[a-zA-Z]$/.test(this.peek)) {
+      name += this.peek;
+      this.index++;
+    }
+
+    if (!name) {
+      this.index = start;
+      return null;
+    }
+
+    this.skipSpace();
+
+    // Check if followed by opening parenthesis
+    if (this.peek !== '(') {
+      this.index = start;
+      return null;
+    }
+
+    // Map of common function names to their LaTeX equivalents
+    const BARE_FUNCTION_MAP: Record<string, string> = {
+      // Trigonometric
+      sin: 'Sin',
+      cos: 'Cos',
+      tan: 'Tan',
+      cot: 'Cot',
+      sec: 'Sec',
+      csc: 'Csc',
+      // Hyperbolic
+      sinh: 'Sinh',
+      cosh: 'Cosh',
+      tanh: 'Tanh',
+      coth: 'Coth',
+      sech: 'Sech',
+      csch: 'Csch',
+      // Inverse trigonometric
+      arcsin: 'Arcsin',
+      arccos: 'Arccos',
+      arctan: 'Arctan',
+      arccot: 'Arccot',
+      arcsec: 'Arcsec',
+      arccsc: 'Arccsc',
+      asin: 'Arcsin',
+      acos: 'Arccos',
+      atan: 'Arctan',
+      // Inverse hyperbolic
+      arcsinh: 'Arsinh',
+      arccosh: 'Arcosh',
+      arctanh: 'Artanh',
+      arccoth: 'Arcoth',
+      arcsech: 'Arsech',
+      arccsch: 'Arcsch',
+      asinh: 'Arsinh',
+      acosh: 'Arcosh',
+      atanh: 'Artanh',
+      // Logarithms and exponentials
+      log: 'Log',
+      ln: 'Ln',
+      exp: 'Exp',
+      lg: 'Lg',
+      lb: 'Lb',
+      // Other common functions
+      sqrt: 'Sqrt',
+      abs: 'Abs',
+      sgn: 'Sgn',
+      sign: 'Sgn',
+      floor: 'Floor',
+      ceil: 'Ceil',
+      round: 'Round',
+      max: 'Max',
+      min: 'Min',
+      gcd: 'Gcd',
+      lcm: 'Lcm',
+    };
+
+    const fnName = BARE_FUNCTION_MAP[name];
+    if (!fnName) {
+      // Not a recognized function name, backtrack
+      this.index = start;
+      return null;
+    }
+
+    // Parse the arguments in the enclosure (parentheses)
+    const args = this.parseArguments('enclosure', until);
+
+    if (args === null) {
+      // No valid arguments found, backtrack
+      this.index = start;
+      return null;
+    }
+
+    return [fnName, ...args];
+  }
+
   /**
    * Parse a sequence superfix/subfix operator, e.g. `^{*}`
    *
@@ -1787,8 +1893,16 @@ export class _Parser implements Parser {
         if (this.match('_') || this.match('^'))
           subscripts.push(this.error('syntax-error', subIndex));
         else {
-          const sub =
-            this.parseGroup() ?? this.parseToken() ?? this.parseStringGroup();
+          let sub = this.parseGroup() ?? this.parseToken();
+          // In non-strict mode, also accept parenthesized expressions
+          // Note: After match('_'), peek has changed but TypeScript doesn't know
+          if (
+            sub === null &&
+            this.options.strict === false &&
+            (this.peek as string) === '('
+          )
+            sub = this.parseEnclosure();
+          sub ??= this.parseStringGroup();
           if (sub === null) return this.error('missing', index);

           subscripts.push(sub);
@@ -1798,7 +1912,15 @@ export class _Parser implements Parser {
         if (this.match('_') || this.match('^'))
           superscripts.push(this.error('syntax-error', subIndex));
         else {
-          const sup = this.parseGroup() ?? this.parseToken();
+          let sup = this.parseGroup() ?? this.parseToken();
+          // In non-strict mode, also accept parenthesized expressions
+          // Note: After match('^'), peek has changed but TypeScript doesn't know
+          if (
+            sup === null &&
+            this.options.strict === false &&
+            (this.peek as string) === '('
+          )
+            sup = this.parseEnclosure();
           if (sup === null) return this.error('missing', index);
           superscripts.push(sup);
         }
@@ -2104,6 +2226,9 @@ export class _Parser implements Parser {
     if (result === null && this.matchAll(this._imaginaryUnitTokens))
       result = 'ImaginaryUnit';

+    // In non-strict mode, try to parse bare function names like sin(x)
+    result ??= this.tryParseBareFunction(until);
+
     // ParseGenericExpression() has priority. Some generic expressions
     // may include symbols which have not been explicitly defined
     // with a 'symbol' kind

src/compute-engine/latex-syntax/types.ts

Lines changed: 11 additions & 0 deletions
@@ -705,6 +705,17 @@ export type NumberSerializationFormat = NumberFormat & {
  */

 export type ParseLatexOptions = NumberFormat & {
+  /**
+   * Controls the strictness of LaTeX parsing:
+   *
+   * - `true`: Strict LaTeX syntax required (e.g., `\sin{x}`, `x^{n+1}`)
+   * - `false`: Accept relaxed Math-ASCII/Typst-like syntax in addition to
+   *   LaTeX (e.g., `sin(x)`, `x^(n+1)`)
+   *
+   * **Default**: `true`
+   */
+  strict: boolean;
+
   /**
    * If true, ignore space characters in math mode.
    *
