fix: Doc Maker content duplication bug and add nested/parenthesized list support #321
QuantumNightmare wants to merge 11 commits into
…xes an issue where content in lists were being duplicated
…irst list don't display
🔍 Integration Validation Results
Commit:
Changed directories:
✅ Structure Check output
❌ Code Check output
✅ Tests Check output
✅ README Check output
✅ Version Check output
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd617f1bf4
    m = _PAREN_ITEM_RE.match(stripped)
    if m:
        ol_type, start_val = _detect_paren_type(m.group(1))
        indent_spaces = len(line) - len(line.lstrip())
        list_items.append((ol_type, start_val, stripped[m.end():], indent_spaces))
Preserve non-list text in mixed paragraphs
When Python-Markdown emits a soft-broken block like `Intro` / `(1) first` / `(2) second` as one `<p>`, this loop only records lines that start with a parenthesized marker. The later `p.decompose()` removes the original paragraph, so the leading `Intro` line (and any continuation lines that do not start with a marker) is silently dropped from the generated document. Mixed explanatory text plus a parenthesized list should either keep the non-matching lines as a paragraph or append continuations to the relevant item before decomposing the original node.
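One way to address this is sketched below, assuming the paragraph text is available as newline-separated lines. `PAREN_ITEM_RE` and `split_mixed_paragraph` are hypothetical stand-ins (the PR's actual `_PAREN_ITEM_RE` pattern is not shown here), so treat this as an illustration of the idea rather than the integration's code.

```python
import re

# Hypothetical stand-in for the PR's _PAREN_ITEM_RE; the real pattern may differ.
PAREN_ITEM_RE = re.compile(r"\(([0-9]+|[A-Za-z]+)\)\s+")


def split_mixed_paragraph(text):
    """Split paragraph text into (leading_lines, items).

    leading_lines: lines before the first marker, to be kept as a paragraph.
    items: (marker, text) tuples; lines without a marker that follow an item
    are appended to that item instead of being dropped.
    """
    leading = []
    items = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        m = PAREN_ITEM_RE.match(stripped)
        if m:
            # New list item: remember the marker and the text after it.
            items.append([m.group(1), stripped[m.end():]])
        elif items:
            # Continuation line: attach it to the most recent item.
            items[-1][1] += " " + stripped
        else:
            # Text before any marker: preserve it as ordinary paragraph text.
            leading.append(stripped)
    return leading, [tuple(item) for item in items]
```

With this split, the original node would only be decomposed after `leading` has been re-emitted as its own paragraph, so no text is lost.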
    low = marker.lower()
    if low in _ROMAN_VALS:
        return "i", _ROMAN_VALS[low]
Keep alphabetic lists from switching at (i)
For a parenthesized lower-letter list that reaches `(i)` (also `(v)` or `(x)`), this check classifies the marker as `lowerRoman` before considering single-letter alpha markers, so an `(a)`…`(i)` list changes format and restarts instead of continuing with the ninth alphabetic item. Because the integration advertises `(a)` list support, the marker type needs to be inferred from the surrounding sequence/run rather than treating every roman-looking single letter as roman in isolation.
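A minimal sketch of sequence-aware detection follows. It assumes each marker can see the `(type, value)` of the preceding sibling item. `detect_type`, `classify_run`, and `ROMAN_VALS` are hypothetical names, not the PR's `_detect_paren_type` or `_ROMAN_VALS`.

```python
# Hypothetical table; the PR's _ROMAN_VALS may cover a different range.
ROMAN_VALS = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5,
              "vi": 6, "vii": 7, "viii": 8, "ix": 9, "x": 10}


def detect_type(marker, prev=None):
    """Classify one marker; prev is the (type, value) of the previous sibling."""
    if marker.isdigit():
        return "1", int(marker)
    low = marker.lower()
    alpha_type, roman_type = ("A", "I") if marker.isupper() else ("a", "i")
    # Alphabetic position, only defined for a single ASCII letter.
    alpha_val = ord(low) - ord("a") + 1 if len(low) == 1 and "a" <= low <= "z" else None
    # If the marker continues an alphabetic run, keep it alphabetic.
    if alpha_val is not None and prev == (alpha_type, alpha_val - 1):
        return alpha_type, alpha_val
    if low in ROMAN_VALS:
        return roman_type, ROMAN_VALS[low]
    if alpha_val is not None:
        return alpha_type, alpha_val
    raise ValueError(f"unrecognized list marker: {marker!r}")


def classify_run(markers):
    """Classify a run of sibling markers, threading the previous result through."""
    results = []
    prev = None
    for marker in markers:
        prev = detect_type(marker, prev)
        results.append(prev)
    return results
```

Here `(h)` followed by `(i)` stays alphabetic, while a run that starts at `(i)` is still read as roman. A list that genuinely starts at the letter i remains ambiguous under this heuristic.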
Description
Purpose: Fix a bug in the Doc Maker integration where content inside list items was duplicated because the recursive element traversal descended into nested elements within a list item. Also adds support for nested lists, parenthesized numbering formats (`(1)`, `(a)`, `(i)`, etc.), and numbered list continuation across document sections.

Approach: Adds `recursive=False` to the top-level `soup.find_all()` call to prevent double-processing of nested elements. Introduces a new `_add_list_items()` function that recursively handles nested `<ul>`/`<ol>` elements with proper OOXML numbering definitions. Adds `_post_process_paren_lists()` to detect and convert `(a)`, `(1)`, `(i)` text patterns into proper nested `<ol>` elements before processing. Implements low-level OOXML numbering helpers (`_get_or_create_abstract_num`, `_create_num`, `_apply_numbering`, `_patch_abstract_num_level`) to produce correct multilevel Word numbering with consistent hanging indents. Updates `_add_formatted_text_to_paragraph()` with a `skip_nested_lists` flag to avoid re-processing nested list content.

Type of change
Test plan
Exercises the new and fixed list-rendering logic in `parse_markdown_to_docx`.

`doc-maker/tests/test_doc_maker_unit.py`:
- `TestParenthesizedListNumbering`: verifies nested `(a)`/`(1)`/`(i)` lists share a single `numId`, correct `ilvl` values, and correct `numFmt` per level.
- `TestMultipleParenListsAfterHeadings`: verifies that independent paren lists separated by headings each get their own `abstractNum` and all items are numbered.
- `TestMixedNumberedListFormats`: verifies all five list styles (`1.`, `(1)`, `(a)`, `(A)`, `(i)`) render with the correct `numFmt`, parenthesized `lvlText`, left-alignment, and consistent hanging indent.
- `TestOrderedListStartOverride`: verifies numbered list continuation across headings respects the start number (`4. five six` following `1. two three`).
- `TestNestedNumberedListIndentation`: verifies deeply nested (5-level) mixed-format lists produce correct `ilvl`, `numFmt`, and `left` indent values (`hanging × (ilvl + 1)`).

Generate a `.docx` from a markdown document containing nested and parenthesized lists and open it in Word/LibreOffice to visually confirm no duplication and correct formatting.

Author(s) to check