MsWordDocumentBackend does not correctly extract tables when bullet lists in different cells use the same numId #3289

### Bug

When bullet lists in different cells of a DOCX table share the same `numId`, `MsWordDocumentBackend` merges them into one list instead of keeping one list per cell.

### Steps to reproduce

#### Using python-docx, create a DOCX file with a two-cell table containing bullet lists that share the same `numId`

```python
from docx import Document
from docx.oxml import OxmlElement
from docx.oxml.ns import qn


def bullets(cell, items, numId="1"):
    """Add bullet paragraphs with numPr injected directly into each paragraph."""
    for i, text in enumerate(items):
        p = cell.paragraphs[0] if i == 0 else cell.add_paragraph()
        p.text = text
        pPr = p._p.get_or_add_pPr()
        numPr = OxmlElement("w:numPr")
        for tag, val in [("w:ilvl", "0"), ("w:numId", numId)]:
            el = OxmlElement(tag)
            el.set(qn("w:val"), val)
            numPr.append(el)
        pPr.insert(0, numPr)


doc = Document()
table = doc.add_table(rows=2, cols=1)
table.style = "Table Grid"

bullets(table.cell(0, 0), ["First row first bulletpoint", "First row second bulletpoint"])
bullets(table.cell(1, 0), ["Second row first bulletpoint", "Second row second bulletpoint"])

doc.save("issue-table.docx")
print("Saved issue-table.docx")
```

#### Extract the table

```python
from docling.document_converter import DocumentConverter
import re

docling_doc = DocumentConverter().convert("issue-table.docx").document
print(re.search(r"<table>.*?</table>", docling_doc.export_to_html(), re.DOTALL).group(0))
```

#### Actual result

```html
<table><tbody><tr><th><ul>
<li>First row first bulletpoint</li>
<li>First row second bulletpoint</li>
<li>Second row first bulletpoint</li>
<li>Second row second bulletpoint</li>
</ul></th></tr><tr><td></td></tr></tbody></table>
```

The two bullet lists are merged into a single list in the first table cell, and the second cell is left empty.
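To make the symptom easy to check (for example in a regression test), the exported table HTML can be reduced to the list items found in each cell. This is a minimal stdlib-only sketch: `CellListItems` and `list_items_per_cell` are hypothetical helpers, not part of Docling, and they assume the flat `<th>`/`<td>` → `<ul>` → `<li>` structure shown above (nested lists are not distinguished).

```python
from html.parser import HTMLParser


class CellListItems(HTMLParser):
    """Collect the text of <li> elements, grouped per table cell (<th> or <td>)."""

    def __init__(self):
        super().__init__()
        self.cells: list[list[str]] = []  # one list of item texts per cell
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self.cells.append([])  # every cell gets an entry, even if empty
        elif tag == "li" and self.cells:
            self._in_li = True
            self.cells[-1].append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Only text inside an <li> is kept; whitespace between tags is ignored.
        if self._in_li:
            self.cells[-1][-1] += data


def list_items_per_cell(table_html: str) -> list[list[str]]:
    parser = CellListItems()
    parser.feed(table_html)
    parser.close()
    return parser.cells
```

Applied to the actual result above, this returns one cell holding all four items followed by an empty cell; applied to the expected result, it returns two items per cell.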

#### Docling item extraction

```python
list(docling_doc.iterate_items(with_groups=True))
```

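The extracted items show that all four list items are attached to a single `ListGroup` (`#/groups/0`), and that both `RichTableCell` entries reference the same cell group (`#/groups/1`):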

```
[(GroupItem(self_ref='#/body', parent=None, children=[RefItem(cref='#/tables/0')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='_root_', label=<GroupLabel.UNSPECIFIED: 'unspecified'>),
  0),
 (TableItem(self_ref='#/tables/0', parent=RefItem(cref='#/body'), children=[RefItem(cref='#/groups/1')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.TABLE: 'table'>, prov=[], source=[], comments=[], captions=[], references=[], footnotes=[], image=None, data=TableData(table_cells=[RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=0, end_row_offset_idx=1, start_col_offset_idx=0, end_col_offset_idx=1, text='First row first bulletpoint\nFirst row second bulletpoint', column_header=True, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1')), RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=1, end_row_offset_idx=2, start_col_offset_idx=0, end_col_offset_idx=1, text='Second row first bulletpoint\nSecond row second bulletpoint', column_header=False, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))], num_rows=2, num_cols=1, grid=[[RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=0, end_row_offset_idx=1, start_col_offset_idx=0, end_col_offset_idx=1, text='First row first bulletpoint\nFirst row second bulletpoint', column_header=True, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))], [RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=1, end_row_offset_idx=2, start_col_offset_idx=0, end_col_offset_idx=1, text='Second row first bulletpoint\nSecond row second bulletpoint', column_header=False, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))]]), annotations=[]),
  1),
 (GroupItem(self_ref='#/groups/1', parent=RefItem(cref='#/tables/0'), children=[RefItem(cref='#/groups/0')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='rich_cell_group_1_0_0', label=<GroupLabel.UNSPECIFIED: 'unspecified'>),
  2),
 (ListGroup(self_ref='#/groups/0', parent=RefItem(cref='#/groups/1'), children=[RefItem(cref='#/texts/0'), RefItem(cref='#/texts/1'), RefItem(cref='#/texts/2'), RefItem(cref='#/texts/3')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='list', label=<GroupLabel.LIST: 'list'>),
  3),
 (ListItem(self_ref='#/texts/0', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='First row first bulletpoint', text='First row first bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
  4),
 (ListItem(self_ref='#/texts/1', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='First row second bulletpoint', text='First row second bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
  4),
 (ListItem(self_ref='#/texts/2', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='Second row first bulletpoint', text='Second row first bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
  4),
 (ListItem(self_ref='#/texts/3', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='Second row second bulletpoint', text='Second row second bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
  4)]
```

#### Expected behavior

Each table cell should preserve its own bullet list.

For example:

```html
<table><tbody><tr><th><ul>
<li>First row first bulletpoint</li>
<li>First row second bulletpoint</li>
</ul></th></tr><tr><td><ul>
<li>Second row first bulletpoint</li>
<li>Second row second bulletpoint</li>
</ul></td></tr></tbody></table>
```
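The expected behavior amounts to scoping list grouping to the table cell: a new list should start whenever the containing cell changes, even if the `numId` stays the same. The sketch below models that rule on a flat sequence of paragraphs. It is an illustration under assumptions, not Docling's implementation: `Para` and `group_lists_per_cell` are hypothetical, and treating a non-list paragraph as the end of the current list is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Para:
    cell: tuple[int, int]   # (row, col) of the containing table cell
    num_id: Optional[str]   # w:numId value, or None for a non-list paragraph
    text: str


def group_lists_per_cell(paras: list[Para]) -> dict[tuple[int, int], list[list[str]]]:
    """Group list paragraphs into lists, keyed by cell.

    A new list starts when the (cell, numId) pair changes, so two cells that
    reuse the same numId still get separate lists. A non-list paragraph ends
    the current list (assumption of this sketch).
    """
    groups: dict[tuple[int, int], list[list[str]]] = {}
    prev_key = None
    for p in paras:
        if p.num_id is None:
            prev_key = None
            continue
        key = (p.cell, p.num_id)
        cell_lists = groups.setdefault(p.cell, [])
        if key != prev_key:
            cell_lists.append([])  # open a new list for this cell
        cell_lists[-1].append(p.text)
        prev_key = key
    return groups
```

Keying on the cell as well as the `numId` is the point of the sketch: with the reproduction's four paragraphs, the first two land in one list for cell `(0, 0)` and the last two in a separate list for cell `(1, 0)`.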

#### Workaround

This issue can be avoided by using different `numId` values for the two lists.

Adding an empty paragraph (a carriage return) after the bullet list in the first cell also avoids the issue.
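To find documents likely to hit this bug before converting them, one can scan `word/document.xml` for `numId` values that are used in more than one table cell. A minimal sketch using only `zipfile` and `xml.etree.ElementTree`; `shared_numids_across_cells` is a hypothetical helper, and it only walks `w:tr`/`w:tc` elements that are direct children (cells wrapped in content controls are skipped, and nested tables are not special-cased).

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def shared_numids_across_cells(docx) -> dict[str, list[tuple[int, int, int]]]:
    """Return numIds used by paragraphs in more than one table cell.

    `docx` is a path or a binary file-like object. Cells are identified as
    (table index, row index, cell index within the row). Nested tables are
    not special-cased: their paragraphs count toward the enclosing cell and
    the nested table also gets its own table index.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    usage: dict[str, set] = defaultdict(set)
    for t_idx, tbl in enumerate(root.iter(f"{W}tbl")):
        for r_idx, tr in enumerate(tbl.findall(f"{W}tr")):
            for c_idx, tc in enumerate(tr.findall(f"{W}tc")):
                for num_id in tc.iter(f"{W}numId"):
                    usage[num_id.get(f"{W}val")].add((t_idx, r_idx, c_idx))
    # Keep only numIds that appear in at least two distinct cells.
    return {k: sorted(v) for k, v in usage.items() if len(v) > 1}
```

For the `issue-table.docx` generated above, `shared_numids_across_cells("issue-table.docx")` should report `numId` `"1"` in cells `(0, 0, 0)` and `(0, 1, 0)`; with distinct `numId` values per cell (the first workaround), it should return an empty dict.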



### Docling version

Docling version: 2.87.0
Docling Core version: 2.73.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-313 (3.13.5)
Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39

### Python version

Python 3.13.5

