Commit f5482be

Fix lone marker line wrongly starting a block in significantNewlines mode (#180)
In significantNewlines mode, a hard-wrapped prose line beginning with -, *, +, > or | was unconditionally treated as a list, blockquote or table, turning arithmetic and comparison expressions into spurious blocks. For example, the line "Die Frage ist, wann ist x = 5" followed on the next line by "* 3 + 17 wahr." became a paragraph followed by a bullet list.

The bullet, blockquote and table branches of startsNewBlockSignificant() had no "is this a real block?" guard, unlike the ordered-list branch, which already restricts interruption to "1." (so 5. / 1985. stay prose).

Add a bounded one-line lookahead, threaded only into the paragraph collector (the other startsNewBlock() call sites keep their previous behavior via the null default). A lone bullet/blockquote/table marker now interrupts a paragraph only when it forms a real block: two or more consecutive markers, an indented continuation, or a preceding blank line. Headings and code/comment/div fences are unambiguous and still interrupt on a single line.

This is a deliberate behavior change in significantNewlines mode: single-item lists and single-line blockquotes no longer interrupt a paragraph without a blank line. Documented in parser-options.md; affected tests updated and regression tests added.
1 parent d9684af commit f5482be

5 files changed

Lines changed: 204 additions & 15 deletions

docs/guide/parser-options.md

Lines changed: 30 additions & 6 deletions
@@ -170,22 +170,46 @@ Output:
 
 Note: Soft break rendering is controlled separately via `SoftBreakMode` - see the [Soft Break Modes](#soft-break-modes) section above.
 
+#### Lone marker lines are not blocks
+
+Hard-wrapped prose frequently starts a line with `-`, `*`, `+`, `>` or `|` as an
+arithmetic/comparison operator or pipe rather than a list/quote/table marker:
+
+```text
+Die Frage ist, wann ist x = 5
+* 3 + 17 wahr.
+```
+
+To avoid turning these into spurious lists, a **single** marker line followed by
+ordinary prose does *not* interrupt a paragraph. A bullet/blockquote/table
+marker only interrupts when it forms a *real* block:
+
+- two or more consecutive marker lines (`- a` / `- b`, or `> a` / `> b`), **or**
+- a marker line with an indented continuation (`- item` / ` more`), **or**
+- it is preceded by a blank line (then any single marker starts a block).
+
+This mirrors the existing rule that only `1.` (not `5.` or `1985.`) interrupts
+a paragraph as an ordered list. Headings (`#`) and code/comment/div fences are
+unambiguous and still interrupt on a single line.
+
 ### Preventing Block Interruption with Escaping
 
 In significant newlines mode, if you want to include literal block markers without triggering block parsing, escape the first character with a backslash:
 
 ```php
 $converter = DjotConverter::withSignificantNewlines();
 
-// Without escaping - creates a list
+// Without escaping - two markers form a list
 $result = $converter->convert("Price:
-- 10 dollars");
-// Output: <p>Price:</p><ul><li>10 dollars</li></ul>
+- 10 dollars
+- 5 cents");
+// Output: <p>Price:</p><ul><li>10 dollars</li><li>5 cents</li></ul>
 
-// With escaping - literal text
+// With escaping - literal text (first marker neutralized)
 $result = $converter->convert("Price:
-\\- 10 dollars");
-// Output: <p>Price:<br>- 10 dollars</p>
+\\- 10 dollars
+- 5 cents");
+// Output: <p>Price:<br>- 10 dollars<br>- 5 cents</p>
 ```
 
 Common escapes:

src/Parser/BlockParser.php

Lines changed: 77 additions & 6 deletions
@@ -2724,7 +2724,7 @@ protected function tryParseParagraph(Node $parent, array $lines, int $start): in
                 break;
             }
 
-            if (!$hasUnclosedBrace && $this->startsNewBlock($nextLine)) {
+            if (!$hasUnclosedBrace && $this->startsNewBlock($nextLine, $lines, $i)) {
                 break;
             }
 
@@ -2876,7 +2876,15 @@ protected function appendToLastParagraph(Node $parent, string $content, int $lin
         }
     }
 
-    protected function startsNewBlock(string $line): bool
+    /**
+     * Determine whether a continuation line should interrupt the current block (paragraph etc.).
+     *
+     * @param string $line The continuation line being inspected
+     * @param array<string>|null $lines All source lines, when lookahead is available (prose
+     *   interruption only). When null, a lone marker keeps the legacy "interrupts" behavior.
+     * @param int $index Index of $line within $lines (so the next line is $lines[$index + 1])
+     */
+    protected function startsNewBlock(string $line, ?array $lines = null, int $index = -1): bool
     {
         // Quick check: empty lines don't start blocks
         if ($line === '' || !isset($line[0])) {
@@ -2897,7 +2905,7 @@ protected function startsNewBlock(string $line): bool
 
         // In significantNewlines mode, block elements can interrupt paragraphs
         if ($this->significantNewlines) {
-            return $this->startsNewBlockSignificant($line);
+            return $this->startsNewBlockSignificant($line, $lines, $index);
         }
 
         // Standard djot behavior:
@@ -2914,8 +2922,13 @@ protected function startsNewBlock(string $line): bool
      * - Ordered lists (1. 2. etc)
      * - Code fences (```)
      * - Fenced divs (:::)
+     *
+     * @param string $line The continuation line being inspected
+     * @param array<string>|null $lines All source lines, when prose lookahead is
+     *   available; null keeps the legacy "lone marker interrupts" behavior.
+     * @param int $index Index of $line within $lines
      */
-    protected function startsNewBlockSignificant(string $line): bool
+    protected function startsNewBlockSignificant(string $line, ?array $lines = null, int $index = -1): bool
     {
         // Use first-char switch to avoid unnecessary regex checks
         $first = $line[0];
@@ -2929,16 +2942,34 @@ protected function startsNewBlockSignificant(string $line): bool
             case '+':
                 // Unordered lists or thematic breaks
                 if (isset($line[1]) && $line[1] === ' ') {
+                    // A lone marker line in flowing prose is almost always an
+                    // operator ("x = 5\n* 3 + 17", "10\n- 3 ist 7"), not a list.
+                    // When prose lookahead is available, only interrupt for a
+                    // real block: 2+ markers or an indented continuation.
+                    if ($lines !== null) {
+                        return $this->significantMarkerHasBlockContinuation('bullet', $lines, $index);
+                    }
+
                     return true; // Unordered list
                 }
 
                 // Thematic breaks: *\s*\*\s*\* or -\s*-\s*-
                 return preg_match('/^(\*\s*\*\s*\*|-\s*-\s*-)/', $line) === 1;
             case '|':
-                // Tables
+                // Tables — a single "| ..." line is not a valid table; in prose
+                // require a real table (next line also a row, e.g. delimiter).
+                if ($lines !== null) {
+                    return $this->significantMarkerHasBlockContinuation('table', $lines, $index);
+                }
+
                 return true;
             case '>':
-                // Block quotes
+                // Block quotes — a lone ">" in prose ("if x\n> 5 then") is a
+                // comparison, not a quote; require a real (multi-line) quote.
+                if ($lines !== null) {
+                    return $this->significantMarkerHasBlockContinuation('quote', $lines, $index);
+                }
+
                 return true;
             case '`':
                 // Code fences: `{3,}
@@ -2960,6 +2991,46 @@ protected function startsNewBlockSignificant(string $line): bool
         }
     }
 
+    /**
+     * In significantNewlines mode, decide whether a lone marker line in flowing
+     * prose really begins a block, or is just an operator/punctuation that
+     * happens to start a hard-wrapped line ("x = 5\n* 3 + 17", "if x\n> 5").
+     *
+     * A real block requires the immediately following line to either continue
+     * the block (another same-kind marker, i.e. 2+ markers) or be an indented
+     * continuation. A single isolated marker line stays paragraph text.
+     *
+     * @param string $kind One of 'bullet', 'quote', 'table'
+     * @param array<string> $lines All source lines
+     * @param int $index Index of the marker line within $lines
+     */
+    private function significantMarkerHasBlockContinuation(string $kind, array $lines, int $index): bool
+    {
+        $next = $lines[$index + 1] ?? null;
+
+        // Lone marker as the last line (e.g. "Total: 5\n- 3") is not a block.
+        if ($next === null || IndentationHelper::isBlankLine($next)) {
+            return false;
+        }
+
+        // Indented continuation of a multi-line first item ("- item\n more").
+        // Only meaningful for lists; quotes/tables need their own marker.
+        if ($kind === 'bullet' && preg_match('/^\s/', $next) === 1) {
+            return true;
+        }
+
+        $t = ltrim($next);
+
+        return match ($kind) {
+            'bullet' => isset($t[0], $t[1])
+                && ($t[0] === '-' || $t[0] === '*' || $t[0] === '+')
+                && $t[1] === ' ',
+            'quote' => isset($t[0]) && $t[0] === '>',
+            'table' => isset($t[0]) && $t[0] === '|',
+            default => true,
+        };
+    }
+
     /**
      * Check if line starts a block element that should terminate list content collection.
      *
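
The one-line lookahead added in this file can be tried outside the parser. The following is a minimal sketch, not the library's code: the free function `loneMarkerFormsBlock()` is a hypothetical name, and `trim($next) === ''` is an assumed stand-in for `IndentationHelper::isBlankLine()`, whose implementation is not part of this diff. The "preceded by a blank line" case is not modelled here, since in that situation the paragraph collector is not running at all.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone mirror of significantMarkerHasBlockContinuation().
 *
 * @param string $kind One of 'bullet', 'quote', 'table'
 * @param array<string> $lines All source lines
 * @param int $index Index of the marker line within $lines
 */
function loneMarkerFormsBlock(string $kind, array $lines, int $index): bool
{
    $next = $lines[$index + 1] ?? null;

    // A marker on the last line, or followed by a blank line, stays prose.
    // Assumption: trim() approximates IndentationHelper::isBlankLine().
    if ($next === null || trim($next) === '') {
        return false;
    }

    // An indented next line continues a multi-line list item.
    if ($kind === 'bullet' && preg_match('/^\s/', $next) === 1) {
        return true;
    }

    $t = ltrim($next);

    // Otherwise the next line must carry a marker of the same kind.
    return match ($kind) {
        'bullet' => isset($t[0], $t[1])
            && in_array($t[0], ['-', '*', '+'], true)
            && $t[1] === ' ',
        'quote' => isset($t[0]) && $t[0] === '>',
        'table' => isset($t[0]) && $t[0] === '|',
        default => true,
    };
}

// Operator in hard-wrapped prose: the next line is plain text.
var_dump(loneMarkerFormsBlock('bullet', ['x = 5', '* 3 + 17 wahr.', 'im Text.'], 1)); // bool(false)

// Two consecutive markers form a real list.
var_dump(loneMarkerFormsBlock('bullet', ['Liste:', '- erstes', '- zweites'], 1)); // bool(true)

// A single item with an indented continuation also counts.
var_dump(loneMarkerFormsBlock('bullet', ['Shopping:', '- milk and', ' some bread'], 1)); // bool(true)

// A lone ">" on the last line is a comparison, not a quote.
var_dump(loneMarkerFormsBlock('quote', ['Wenn x', '> 5 dann wahr.'], 1)); // bool(false)
```

The inputs are taken from the regression tests in this commit, so the results line up with the paragraph and list counts asserted there.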

tests/TestCase/Parser/BlockParserTest.php

Lines changed: 3 additions & 1 deletion
@@ -544,8 +544,10 @@ public function testSignificantNewlinesListInterruptsParagraph(): void
 
     public function testSignificantNewlinesBlockquoteInterruptsParagraph(): void
     {
+        // A lone ">" line in prose is a comparison operator; interrupting a
+        // paragraph requires a real (2+ line) quote.
         $parser = new BlockParser(significantNewlines: true);
-        $doc = $parser->parse("They said:\n> This is important");
+        $doc = $parser->parse("They said:\n> This is important\n> Pay attention");
 
         $children = $doc->getChildren();
         $this->assertCount(2, $children);

tests/TestCase/SignificantNewlinesTest.php

Lines changed: 91 additions & 2 deletions
@@ -65,15 +65,28 @@ public function testListInterruptsParagraph(): void
 
     public function testBlockquoteInterruptsParagraph(): void
     {
+        // A lone ">" line in flowing prose is a comparison operator, not a
+        // quote ("if x\n> 5"). Interrupting requires a *real* (2+ line) quote.
         $parser = new BlockParser(significantNewlines: true);
-        $doc = $parser->parse("They said:\n> This is important");
+        $doc = $parser->parse("They said:\n> This is important\n> Pay attention");
 
         $children = $doc->getChildren();
         $this->assertCount(2, $children);
         $this->assertInstanceOf(Paragraph::class, $children[0]);
         $this->assertInstanceOf(BlockQuote::class, $children[1]);
     }
 
+    public function testLoneBlockquoteMarkerDoesNotInterruptParagraph(): void
+    {
+        // "if x\n> 5 then ..." — the ">" is greater-than, not a blockquote.
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Wenn x\n> 5 dann ist die Bedingung wahr.");
+
+        $children = $doc->getChildren();
+        $this->assertCount(1, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+    }
+
     public function testOrderedListInterruptsParagraph(): void
     {
         $parser = new BlockParser(significantNewlines: true);
@@ -238,7 +251,8 @@ public function testConverterConstructorParameter(): void
     {
         $converter = new DjotConverter(significantNewlines: true);
 
-        $djot = "They said:\n> Important";
+        // 2+ line quote required (a single ">" line is treated as prose).
+        $djot = "They said:\n> Important\n> Really";
         $result = $converter->convert($djot);
 
         $this->assertStringContainsString('<blockquote>', $result);
@@ -356,4 +370,79 @@ public function testHighNumberedListAfterBlankLine(): void
         $this->assertInstanceOf(Paragraph::class, $children[0]);
         $this->assertInstanceOf(ListBlock::class, $children[1]);
     }
+
+    // ==================== Operator vs. List Marker (bullet/table) ====================
+    //
+    // In hard-wrapped prose a line can begin with -, *, +, > or | as an
+    // arithmetic/comparison operator or pipe. A *lone* marker line followed by
+    // ordinary prose must NOT become a list/quote/table; a real block requires
+    // 2+ marker lines or an indented continuation. (Mirrors the existing
+    // "only 1. interrupts" rule for ordered lists.)
+
+    public function testMultiplicationStarDoesNotBecomeList(): void
+    {
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Die Frage ist, wann ist x = 5\n* 3 + 17 wahr. Leider eine Liste\nim Text.");
+
+        $children = $doc->getChildren();
+        $this->assertCount(1, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+    }
+
+    public function testMinusOperatorDoesNotBecomeList(): void
+    {
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Das Ergebnis von 10\n- 3 ist 7. Kein Listenpunkt.");
+
+        $children = $doc->getChildren();
+        $this->assertCount(1, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+    }
+
+    public function testPlusOperatorDoesNotBecomeList(): void
+    {
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Die Summe ist 5\n+ 3 ergibt 8. Keine Liste.");
+
+        $children = $doc->getChildren();
+        $this->assertCount(1, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+    }
+
+    public function testLonePipeLineDoesNotSplitParagraph(): void
+    {
+        // Regression: a single "| ..." line is not a valid table, yet it used
+        // to sever the paragraph into two stray <p> blocks.
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Das berechnet a\n| b als bitweises Oder.");
+
+        $children = $doc->getChildren();
+        $this->assertCount(1, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+    }
+
+    public function testTwoMarkersStillFormAList(): void
+    {
+        // The guard must not regress real lists: 2+ markers => list.
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Hier eine Liste:\n- erstes Element\n- zweites Element");
+
+        $children = $doc->getChildren();
+        $this->assertCount(2, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+        $this->assertInstanceOf(ListBlock::class, $children[1]);
+    }
+
+    public function testSingleBulletWithIndentedContinuationIsList(): void
+    {
+        // One item but with an indented continuation line still reads as a
+        // real (multi-line) list, so it interrupts.
+        $parser = new BlockParser(significantNewlines: true);
+        $doc = $parser->parse("Shopping:\n- milk and\n some bread");
+
+        $children = $doc->getChildren();
+        $this->assertCount(2, $children);
+        $this->assertInstanceOf(Paragraph::class, $children[0]);
+        $this->assertInstanceOf(ListBlock::class, $children[1]);
+    }
 }

tests/TestCase/SoftBreakModeTest.php

Lines changed: 3 additions & 0 deletions
@@ -131,9 +131,12 @@ public function testBlockquotesWithoutBlankLines(): void
             softBreakMode: SoftBreakMode::Space,
         );
 
+        // A real (2+ line) quote interrupts without a blank line; a single ">"
+        // line in prose is treated as a comparison operator, not a quote.
         $djot = <<<'DJOT'
         Some text
         > A quote
+        > spanning two lines
         More text
         DJOT;
 