fix: Doc Maker content duplication bug and add nested/parenthesized list support#321

Open
QuantumNightmare wants to merge 11 commits into master from jf-fix-doc-maker-duplication-bug

@QuantumNightmare
Contributor

Description

  • Purpose: Fix a bug in the Doc Maker integration where content inside list items was duplicated: the recursive element traversal descended into elements nested within a list item and processed them a second time. Also adds support for nested lists, parenthesized numbering formats such as (1), (a), and (i), and numbered list continuation across document sections.

  • Approach: Adds recursive=False to the top-level soup.find_all() call to prevent double-processing of nested elements. Introduces a new _add_list_items() function that recursively handles nested <ul>/<ol> elements with proper OOXML numbering definitions. Adds _post_process_paren_lists() to detect and convert (a), (1), (i) text patterns into proper nested <ol> elements before processing. Implements low-level OOXML numbering helpers (_get_or_create_abstract_num, _create_num, _apply_numbering, _patch_abstract_num_level) to produce correct multilevel Word numbering with consistent hanging indents. Updates _add_formatted_text_to_paragraph() with a skip_nested_lists flag to avoid re-processing nested list content.
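
For illustration, the duplication described above can be reproduced with a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup so that it runs without dependencies; `render_recursive` and `render_top_level` are hypothetical names, not functions from doc_maker.py. Iterating only the direct children plays the role that `recursive=False` plays in the `soup.find_all()` call.

```python
import xml.etree.ElementTree as ET

HTML = "<body><ul><li>Outer<ul><li>Inner</li></ul></li></ul><p>Tail</p></body>"
BLOCK_TAGS = {"ul", "ol", "p"}


def block_text(elem):
    """Return all text inside an element, whitespace-normalized."""
    return " ".join(" ".join(elem.itertext()).split())


def render_recursive(root):
    # Analogous to a recursive traversal: every matching descendant is
    # visited, so the nested <ul> is rendered inside its parent and then
    # rendered again on its own.
    return [block_text(e) for e in root.iter() if e.tag in BLOCK_TAGS]


def render_top_level(root):
    # Analogous to recursive=False: only direct children are visited and
    # nested lists are left to the handler of their parent.
    return [block_text(e) for e in root if e.tag in BLOCK_TAGS]


root = ET.fromstring(HTML)
print(render_recursive(root))  # "Inner" appears twice
print(render_top_level(root))  # "Inner" appears once
```

With top-level-only iteration, the nested list then has to be handled explicitly by whatever processes its parent, which is what the description says `_add_list_items()` does.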

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Test plan

Exercises the new and fixed list-rendering logic in parse_markdown_to_docx.

  1. Run the new unit tests in doc-maker/tests/test_doc_maker_unit.py:
    • TestParenthesizedListNumbering — verifies nested (a)/(1)/(i) lists share a single numId, correct ilvl values, and correct numFmt per level.
    • TestMultipleParenListsAfterHeadings — verifies that independent paren lists separated by headings each get their own abstractNum and all items are numbered.
    • TestMixedNumberedListFormats — verifies all five list styles (1., (1), (a), (A), (i)) render with the correct numFmt, parenthesized lvlText, left-alignment, and consistent hanging indent.
    • TestOrderedListStartOverride — verifies numbered list continuation across headings respects the start number (4. five six following 1. two three).
    • TestNestedNumberedListIndentation — verifies deeply nested (5-level) mixed-format lists produce correct ilvl, numFmt, and left indent values (hanging × (ilvl + 1)).
  2. Generate a .docx from a markdown document containing nested and parenthesized lists and open it in Word/LibreOffice to visually confirm no duplication and correct formatting.
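
As a rough sketch of what the tests above check per level, the following maps each of the five list styles to an OOXML numFmt and lvlText and applies the hanging × (ilvl + 1) rule for the left indent. `STYLES`, `level_spec`, and the 360-twip hanging value are assumptions for illustration; the PR does not show the actual constants.

```python
HANGING_TWIPS = 360  # assumed value; the PR does not state the real one

# Source list style -> (OOXML numFmt, lvlText template).
STYLES = {
    "1.": ("decimal", "%{n}."),
    "(1)": ("decimal", "(%{n})"),
    "(a)": ("lowerLetter", "(%{n})"),
    "(A)": ("upperLetter", "(%{n})"),
    "(i)": ("lowerRoman", "(%{n})"),
}


def level_spec(style, ilvl, hanging=HANGING_TWIPS):
    """Describe one numbering level for the given style and zero-based ilvl."""
    num_fmt, template = STYLES[style]
    return {
        "ilvl": ilvl,
        "numFmt": num_fmt,
        # lvlText placeholders are one-based: level 0 uses %1, level 1 uses %2.
        "lvlText": template.format(n=ilvl + 1),
        "left": hanging * (ilvl + 1),
        "hanging": hanging,
    }


# A five-level mixed-format list, one style per level.
specs = [level_spec(s, i) for i, s in enumerate(["1.", "(a)", "(i)", "(A)", "(1)"])]
for spec in specs:
    print(spec)
```

Keeping the hanging value constant while the left indent grows by one hanging width per level is what makes the text of every level line up one step to the right of its parent.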

Author(s) to check

  • Project and all contained modules build successfully
  • Self-/dev-tested
  • Unit/UI/Automation/Integration tests provided where applicable
  • Code is written to standards
  • Appropriate documentation written (code comments, internal docs)

@github-actions

github-actions Bot commented May 12, 2026

🔍 Integration Validation Results

Commit: a894f5adf98374b8f2f681eddcabffbb77d38356 · Fix repository validation checks
Updated: 2026-05-12T04:51:34Z

Changed directories: doc-maker

Check       Result
--------------------
Structure   ✅ Passed
Code        ❌ Failed
Tests       ✅ Passed
README      ✅ Passed
Version     ✅ Passed

✅ Structure Check output
Validating 1 integration(s)...

============================================================
Integration: doc-maker
============================================================
✅ All checks passed!

============================================================
SUMMARY
============================================================
Integrations validated: 1
Total errors: 0
Total warnings: 0

✅ All validations passed!
❌ Code Check output

[notice] A new release of pip is available: 26.0.1 -> 26.1.1
[notice] To update, run: pip install --upgrade pip
----------------------------------------
Checking: doc-maker
----------------------------------------

📦 Installing dependencies...

🐍 Checking Python syntax...
   ✅ Syntax OK

📥 Checking imports...
   ✅ Imports OK

📄 Checking JSON files...
   ✅ JSON files OK

🔍 Linting with ruff...
   ✅ Lint OK

🎨 Checking formatting with ruff...
   ❌ Formatting issues found

   Fix: Run 'ruff format' to auto-format

🔒 Scanning for security issues with bandit...
   ✅ Security OK

🛡️ Checking dependencies for vulnerabilities with pip-audit...
   ✅ Dependencies OK

🔗 Checking config-code sync...
   ⚠️  Action 'create_document': parameter 'title' accessed in code but not defined in input_schema
   ⚠️  Action 'add_image': parameter 'files' is required in schema but accessed with inputs.get() (safe for missing)
   ✅ Config-code sync OK

🔄 Checking fetch patterns...
   ✅ Fetch patterns OK

========================================
❌ CODE CHECK FAILED
========================================
✅ Tests Check output

Integration   Tests  Coverage        Status
-------------------------------------------
doc-maker     90/90       65%      ✅ Passed
-------------------------------------------
Total         90/90            ✅ All passed

✅ Tests passed: doc-maker
✅ README Check output
========================================
✅ README CHECK PASSED
========================================
✅ Version Check output
✅ doc-maker: 2.0.0 → 3.0.0 (major bump)

========================================
✅ VERSION CHECK PASSED
========================================

Comment thread doc-maker/tests/test_doc_maker_unit.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd617f1bf4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread doc-maker/doc_maker.py Outdated
Comment on lines +665 to +669
m = _PAREN_ITEM_RE.match(stripped)
if m:
    ol_type, start_val = _detect_paren_type(m.group(1))
    indent_spaces = len(line) - len(line.lstrip())
    list_items.append((ol_type, start_val, stripped[m.end():], indent_spaces))

P2: Preserve non-list text in mixed paragraphs

When Python-Markdown emits a soft-broken block like Intro (1) first (2) second as one <p>, this loop only records lines that start with a parenthesized marker; the later p.decompose() removes the original paragraph, so the leading Intro line (and any continuation lines that do not start with a marker) is silently dropped from the generated document. Mixed explanatory text plus a parenthesized list should either keep the non-matching lines as a paragraph or append continuations to the relevant item before decomposing the original node.
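
One possible shape for that behavior, as a sketch only: lines before the first marker are kept as prose, and unmarked lines after a marker are appended to the preceding item. `PAREN_ITEM_RE` and `split_mixed_paragraph` are hypothetical; the PR's real `_PAREN_ITEM_RE` is not shown in this thread, so the pattern here is an assumption.

```python
import re

# Hypothetical pattern, assumed for illustration only.
PAREN_ITEM_RE = re.compile(r"\((\d+|[A-Za-z]+)\)\s+")


def split_mixed_paragraph(text):
    """Split a soft-broken paragraph into leading prose and list items."""
    prose, items = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        m = PAREN_ITEM_RE.match(stripped)
        if m:
            # A new item: keep the marker and the text after it.
            items.append((m.group(1), stripped[m.end():]))
        elif items:
            # Unmarked line after an item: treat it as a continuation.
            marker, body = items[-1]
            items[-1] = (marker, f"{body} {stripped}")
        else:
            # Unmarked line before any item: keep it as ordinary prose.
            prose.append(stripped)
    return " ".join(prose), items


lead, items = split_mixed_paragraph("Intro\n(1) first\ncontinued\n(2) second")
print(lead)
print(items)
```

The caller would emit `lead` as its own paragraph before the generated list, so that removing the original node no longer discards it.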


Comment thread doc-maker/doc_maker.py
Comment on lines +576 to +578
low = marker.lower()
if low in _ROMAN_VALS:
    return "i", _ROMAN_VALS[low]

P2: Keep alphabetic lists from switching at (i)

For a parenthesized lower-letter list that reaches (i) (also (v) or (x)), this check classifies the marker as lowerRoman before considering single-letter alpha markers, so an (a)(i) list changes format and restarts instead of continuing with the ninth alphabetic item. Because the integration advertises (a) list support, the marker type needs to be inferred from the surrounding sequence/run rather than treating every roman-looking single letter as roman in isolation.
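
A minimal sketch of sequence-based inference, under stated assumptions: a run of markers is treated as roman only when every marker is a roman numeral and the values are consecutive, otherwise as alphabetic. `ROMAN_VALS` and `classify_run` are hypothetical stand-ins; the contents of the PR's `_ROMAN_VALS` and the signature of `_detect_paren_type` are only partly visible here.

```python
# Assumed table; the real _ROMAN_VALS may cover a different range.
ROMAN_VALS = {
    "i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5,
    "vi": 6, "vii": 7, "viii": 8, "ix": 9, "x": 10,
}


def classify_run(markers):
    """Return 'i' for a roman run, 'a' for an alphabetic run.

    The markers are the bare contents of consecutive parenthesized
    markers at one nesting level, for example ["a", "b", "c"].
    """
    low = [m.lower() for m in markers]
    if low and all(m in ROMAN_VALS for m in low):
        vals = [ROMAN_VALS[m] for m in low]
        # Roman only if the values count up by one, as in i, ii, iii.
        if vals == list(range(vals[0], vals[0] + len(vals))):
            return "i"
    return "a"


print(classify_run(list("abcdefghij")))   # a..j stays alphabetic at (i)
print(classify_run(["i", "ii", "iii"]))   # consecutive roman values
print(classify_run(["h", "i"]))           # (i) following (h) is alphabetic
```

A lone `(i)` with no neighbors remains ambiguous; this sketch resolves it as roman, which is a design choice rather than something the review prescribes.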


@QuantumNightmare QuantumNightmare changed the title from "Fix Doc Maker content duplication bug and add nested/parenthesized list support" to "fix: Doc Maker content duplication bug and add nested/parenthesized list support" on May 12, 2026