| 1 | +# `_frontmatter.py` coverage |
| 2 | + |
| 3 | +Valid at: a6f4c47 |
| 4 | + |
| 5 | +## Recent changes |
| 6 | + |
| 7 | +- a6f4c47 — dropped the `if text.startswith(_UTF8_BOM):` guard in |
| 8 | + `parse_frontmatter` before the `text = text.removeprefix(_UTF8_BOM)` |
| 9 | + call. `str.removeprefix` (Python 3.9+) returns the string unchanged |
| 10 | + when the prefix is absent, so the guard was redundant rather than |
| 11 | + protective. Behavior preserved: |
| 12 | + - BOM-prefixed input still gets stripped (pinned by |
| 13 | + `test_utf8_bom_does_not_break_frontmatter` in |
| 14 | + `tests/test_frontmatter.py`). |
| 15 | + - Non-BOM input is passed through unchanged (exercised by every |
| 16 | + other parse test in the file). |
| 17 | + - For an exact `str`, CPython returns the original object when the |
| 18 | + prefix does not match, so dropping the guard adds no allocation. |
| 19 | + |
| 20 | +## Shape of the module |
| 21 | + |
| 22 | +- `parse_frontmatter(text)` — public entry point. Strips optional |
| 23 | + UTF-8 BOM, splits on `---` delimiters via `_extract_frontmatter_block`, |
| 24 | + runs `yaml.safe_load`, strips HTML comments from the body with |
| 25 | + `_strip_html_comments`, returns `(dict, body)`. |
| 26 | +- `serialize_frontmatter(frontmatter, body)` — inverse; emits |
| 27 | + `---`-delimited blocks only when the frontmatter is non-empty *or* |
| 28 | + the body would otherwise be mis-parsed as frontmatter. |
| 29 | +- Constants: `RALPH_MARKER`, `FIELD_*`, `CMD_FIELD_*`, `NAME_RE`, |
| 30 | + `VALID_NAME_CHARS_MSG`. All imported by `cli.py` and |
| 31 | + `_resolver.py`; each one is reused across modules so centralisation |
| 32 | + is justified. |
| 33 | + |
| 34 | +## Verified live (grepped, confirmed used) |
| 35 | + |
| 36 | +- `_FRONTMATTER_DELIMITER` — used 4× in `serialize_frontmatter` (plus |
| 37 | + 2× in `_extract_frontmatter_block`). |
| 38 | +- `_FENCE_OR_COMMENT_RE` — used in `_strip_html_comments`. |
| 39 | +- `_UTF8_BOM` — used in `parse_frontmatter` (BOM strip). |
| 40 | + |
| 41 | +## Potential future wins (not yet taken) |
| 42 | + |
| 43 | +- `_extract_frontmatter_block` splits `text` on `"\n"` up front, then |
| 44 | + re-joins slices for the body, so every call builds a full line list |
| 45 | + regardless of input size. Scanning with `str.find` / `str.index` |
| 46 | + would avoid the list allocation, but this function runs once per |
| 47 | + iteration and the input is ~1 KB in practice; not worth the churn. |
| 48 | +- The `serialize_frontmatter` `needs_delimiters` expression uses |
| 49 | + `body.lstrip().startswith(...)`, which can allocate a stripped |
| 50 | + copy just to check a prefix. This could collapse to |
| 51 | + `re.match(r"\s*---", body)`, but the current form reads cleanly; |
| 52 | + revisit only if this becomes hot. |
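
The guard removal described under "Recent changes" can be sanity-checked with a small sketch. It is not the module's code: `UTF8_BOM` is a hypothetical stand-in for `_UTF8_BOM` (assumed here to be U+FEFF), and both helper functions are illustrative names. It compares the guarded and unguarded forms on BOM-prefixed, plain, and empty input.

```python
# Hypothetical stand-in for the module's _UTF8_BOM constant.
# Assumption: the constant is the decoded byte-order mark, U+FEFF.
UTF8_BOM = "\ufeff"


def strip_bom_guarded(text: str) -> str:
    """Old shape: test for the prefix before removing it."""
    if text.startswith(UTF8_BOM):
        text = text.removeprefix(UTF8_BOM)
    return text


def strip_bom(text: str) -> str:
    """New shape: removeprefix alone covers both cases (Python 3.9+)."""
    return text.removeprefix(UTF8_BOM)


samples = [
    "\ufeff---\ntitle: x\n---\nbody",  # BOM-prefixed frontmatter
    "---\ntitle: x\n---\nbody",        # no BOM
    "",                                # empty input
    "\ufeff",                          # BOM only
]

# Pair up the results of both shapes for every sample.
results = [(strip_bom_guarded(s), strip_bom(s)) for s in samples]
```

If the assumption about the constant holds, every pair in `results` should be equal, which is the property the commit relies on.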