Three inline scanners are registered:

- **gfm_autolink_www** (char ``w``): bare ``www.`` URLs.
  Uses ``add_terminator_char("w")`` so the text rule stops at every ``w``.
- **gfm_autolink_protocol** (char ``:``): ``http://``, ``https://``,
  ``mailto:`` and ``xmpp:`` URLs, found by back-scanning ``state.pending``.
- **gfm_autolink_email** (char ``@``): bare email addresses, found by
  back-scanning ``state.pending``.
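
Unlike the other two, the ``www.`` rule fires on the first character of the
link and matches forwards. The following is a minimal sketch of such a forward
match. It works on plain strings rather than markdown-it-py state objects, and
``match_www`` and its regular expression are hypothetical, not the plugin's
code. It applies two GFM rules: the link may only start at the beginning of a
line, after whitespace, or after ``*``, ``_``, ``~`` or ``(``, and trailing
``?!.,:*_~`` is not part of the link. It omits the spec's underscore rule for
the last two domain segments, parenthesis balancing and entity trimming.

.. code-block:: python

    import re
    from typing import Optional

    # Hypothetical helper: simplified forward matcher for GFM "www." autolinks.
    # Domain: dot-separated segments of letters, digits, "_" and "-";
    # path: any run of characters other than whitespace and "<".
    _WWW_RE = re.compile(r"www\.[A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*[^\s<]*")
    _TRAILING_PUNCT = "?!.,:*_~"  # dropped when it ends the match


    def match_www(src: str, pos: int) -> Optional[str]:
        """Return the www autolink text starting at ``src[pos]``, or None."""
        prev = src[pos - 1] if pos > 0 else " "
        if not (prev.isspace() or prev in "*_~("):
            return None  # e.g. the "www" inside "awww.example.com"
        m = _WWW_RE.match(src, pos)
        if m is None:
            return None
        url = m.group(0).rstrip(_TRAILING_PUNCT)
        return url if len(url) > len("www.") else None

For example, ``match_www("Visit www.commonmark.org/help.", 6)`` returns
``"www.commonmark.org/help"``, without the final period.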

Because ``:`` and ``@`` are already default terminator characters in
markdown-it-py, the text rule stops at each of them, so the protocol and
email rules are tried at every occurrence. Both rely on *back-scanning*:
they look backwards through ``state.pending`` (the plain text the text rule
has accumulated but not yet flushed into a token) for a protocol prefix or
email local part. Every ``:`` and ``@`` in the document therefore costs a
cheap regex check or character scan of the pending text.
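
A minimal sketch of the two back-scans, assuming ``pending`` is a plain string
holding the contents of ``state.pending`` at the moment a rule fires on ``:``
or ``@``. The helper names are hypothetical; the scheme list mirrors the one
above, and the forward match after the ``:`` or ``@`` is left out.

.. code-block:: python

    import re
    import string
    from typing import Optional

    # Hypothetical helpers, not the plugin's code. `pending` stands in for
    # state.pending: text accumulated by the text rule but not yet emitted.

    # A scheme counts only when it is not glued to a preceding letter or digit.
    _SCHEME_RE = re.compile(r"(?<![A-Za-z0-9])(https?|mailto|xmpp)$")
    # GFM email local part: letters, digits, ".", "-", "_" and "+".
    _LOCAL_CHARS = frozenset(string.ascii_letters + string.digits + ".-_+")


    def backscan_scheme(pending: str) -> Optional[str]:
        """Return the scheme that ``pending`` ends with (checked at a ":")."""
        # Only the tail matters: the longest scheme ("mailto") plus one
        # boundary character, so the check does not grow with ``pending``.
        m = _SCHEME_RE.search(pending[-7:])
        return m.group(1) if m else None


    def backscan_local_part(pending: str) -> str:
        """Return the email local part ``pending`` ends with (at an "@")."""
        i = len(pending)
        while i > 0 and pending[i - 1] in _LOCAL_CHARS:
            i -= 1
        return pending[i:]

Here ``backscan_scheme("see https")`` returns ``"https"`` and
``backscan_local_part("mail me at foo.bar+x")`` returns ``"foo.bar+x"``. After
a match, that prefix has to be removed from the end of ``state.pending`` before
the link tokens are pushed, otherwise it would also be emitted as plain text.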

The trade-off against a **core-rule** (post-processing) approach, which would
walk the final token stream, find autolink patterns in text tokens, and
split them, is:

- **Inline approach** (current): simpler, and integrates naturally with
  ``state.linkLevel`` to suppress matching inside links, but relies on the
  prefix still being present in ``state.pending``. If a prior inline rule
  consumed part of the prefix, matching would fail, though this is unlikely
  in practice.
- **Core-rule approach**: guaranteed to find all autolinks regardless of
  inline rule ordering, but requires token-stream surgery (splitting text
  tokens and inserting link tokens) and cannot easily interact with nesting
  guards such as ``linkLevel``.
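
For comparison, a toy version of the core-rule alternative is sketched below.
Assumptions: tokens are ``(type, content)`` tuples rather than markdown-it-py
``Token`` objects, ``split_text_tokens`` is a hypothetical name, and the
pattern covers only simplified ``www.`` and ``http(s)://`` links. Because a
core rule runs after inline parsing, there is no ``state.linkLevel`` to
consult, so the sketch rebuilds a link-depth counter from ``link_open`` and
``link_close`` tokens itself.

.. code-block:: python

    import re
    from typing import List, Tuple

    # Toy token model (hypothetical): (type, content) pairs standing in for
    # the children of a markdown-it-py "inline" token.
    Tok = Tuple[str, str]

    # Simplified GFM pattern: allowed left boundary, "www." or "http(s)://",
    # then non-space, non-"<" characters not ending in trailing punctuation.
    _AUTOLINK_RE = re.compile(
        r"(?<![^\s*_~(])(?:https?://|www\.)[^\s<]*[^\s<?!.,:*_~]"
    )


    def split_text_tokens(children: List[Tok]) -> List[Tok]:
        """Split text tokens around autolinks, leaving existing links alone."""
        out: List[Tok] = []
        link_depth = 0  # rebuilt by hand: no linkLevel at this stage
        for typ, content in children:
            if typ == "link_open":
                link_depth += 1
            elif typ == "link_close":
                link_depth -= 1
            if typ != "text" or link_depth > 0:
                out.append((typ, content))
                continue
            pos = 0
            for m in _AUTOLINK_RE.finditer(content):
                url = m.group(0)
                # GFM inserts the "http" scheme for www. links.
                href = url if url.startswith("http") else "http://" + url
                if m.start() > pos:
                    out.append(("text", content[pos:m.start()]))
                out += [("link_open", href), ("text", url), ("link_close", "")]
                pos = m.end()
            if pos < len(content):
                out.append(("text", content[pos:]))
        return out

Even in this toy form the surgery is visible: one text token such as
``"see www.a.org. ok"`` becomes five tokens, and nesting has to be tracked
manually instead of through the inline state.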

The ``w`` terminator is the only *new* terminator these rules add. It makes
the text rule stop at every ``w``, a minor performance cost for documents that
use the letter heavily, but a necessary one, since ``www.`` must be matched
from the start of the URL.
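
To illustrate that cost, here is a toy model of how the text rule splits input
at terminator characters. ``text_runs`` is hypothetical, and
``BASE_TERMINATORS`` is only an illustrative subset of the defaults (the real
set lives in markdown-it-py's text rule). Each one-character chunk marks a
position where the other inline rules get tried.

.. code-block:: python

    from typing import FrozenSet, List

    # Illustrative subset of markdown-it-py's default terminator characters
    # (assumption: not the complete set); it includes ":" and "@".
    BASE_TERMINATORS: FrozenSet[str] = frozenset("\n!*:@[]_`\\<")


    def text_runs(src: str, terminators: FrozenSet[str]) -> List[str]:
        """Split ``src`` into plain-text runs and single terminator chars."""
        runs: List[str] = []
        pos = 0
        while pos < len(src):
            end = pos
            while end < len(src) and src[end] not in terminators:
                end += 1
            if end > pos:
                runs.append(src[pos:end])  # consumed by the text rule
                pos = end
            else:
                runs.append(src[pos])  # terminator: other rules are tried here
                pos += 1
        return runs

With the subset above, ``text_runs("we wrote www.example.com", BASE_TERMINATORS)``
yields a single run, while adding ``"w"`` yields
``["w", "e ", "w", "rote ", "w", "w", "w", ".example.com"]``: the rule chain
is attempted at every ``w``, not only at the one that starts ``www.``.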

Specification: https://github.github.com/gfm/#autolinks-extension-

.. versionadded:: 0.5.0