The two bullet lists are merged into a single list in the first table cell, and the second cell is left empty.
[(GroupItem(self_ref='#/body', parent=None, children=[RefItem(cref='#/tables/0')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='_root_', label=<GroupLabel.UNSPECIFIED: 'unspecified'>),
0),
(TableItem(self_ref='#/tables/0', parent=RefItem(cref='#/body'), children=[RefItem(cref='#/groups/1')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.TABLE: 'table'>, prov=[], source=[], comments=[], captions=[], references=[], footnotes=[], image=None, data=TableData(table_cells=[RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=0, end_row_offset_idx=1, start_col_offset_idx=0, end_col_offset_idx=1, text='First row first bulletpoint\nFirst row second bulletpoint', column_header=True, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1')), RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=1, end_row_offset_idx=2, start_col_offset_idx=0, end_col_offset_idx=1, text='Second row first bulletpoint\nSecond row second bulletpoint', column_header=False, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))], num_rows=2, num_cols=1, grid=[[RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=0, end_row_offset_idx=1, start_col_offset_idx=0, end_col_offset_idx=1, text='First row first bulletpoint\nFirst row second bulletpoint', column_header=True, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))], [RichTableCell(bbox=None, row_span=1, col_span=1, start_row_offset_idx=1, end_row_offset_idx=2, start_col_offset_idx=0, end_col_offset_idx=1, text='Second row first bulletpoint\nSecond row second bulletpoint', column_header=False, row_header=False, row_section=False, fillable=False, ref=RefItem(cref='#/groups/1'))]]), annotations=[]),
1),
(GroupItem(self_ref='#/groups/1', parent=RefItem(cref='#/tables/0'), children=[RefItem(cref='#/groups/0')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='rich_cell_group_1_0_0', label=<GroupLabel.UNSPECIFIED: 'unspecified'>),
2),
(ListGroup(self_ref='#/groups/0', parent=RefItem(cref='#/groups/1'), children=[RefItem(cref='#/texts/0'), RefItem(cref='#/texts/1'), RefItem(cref='#/texts/2'), RefItem(cref='#/texts/3')], content_layer=<ContentLayer.BODY: 'body'>, meta=None, name='list', label=<GroupLabel.LIST: 'list'>),
3),
(ListItem(self_ref='#/texts/0', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='First row first bulletpoint', text='First row first bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
4),
(ListItem(self_ref='#/texts/1', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='First row second bulletpoint', text='First row second bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
4),
(ListItem(self_ref='#/texts/2', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='Second row first bulletpoint', text='Second row first bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
4),
(ListItem(self_ref='#/texts/3', parent=RefItem(cref='#/groups/0'), children=[], content_layer=<ContentLayer.BODY: 'body'>, meta=None, label=<DocItemLabel.LIST_ITEM: 'list_item'>, prov=[], source=[], comments=[], orig='Second row second bulletpoint', text='Second row second bulletpoint', formatting=Formatting(bold=False, italic=False, underline=False, strikethrough=False, script=<Script.BASELINE: 'baseline'>), hyperlink=None, enumerated=False, marker=''),
4)]
Bug
When a DOCX table contains multiple bullet lists with the same numId, MsWordDocumentBackend does not extract the table correctly.
Steps to reproduce
Using python-docx, create a DOCX file with a two-cell table (two rows, one column) containing bullet lists that share the same numId.
Extract the table.
Actual result
The two bullet lists are merged into a single list in the first table cell, and the second cell is left empty.
Docling item extraction
The extracted items, dumped above, show that all four list items were attached to a single ListGroup (#/groups/0).
Expected behavior
Each table cell should preserve its own bullet list.
For example:
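The expected grouping can be sketched in plain Python, independent of Docling's internals. This is an illustration of the rule "a new cell starts a new list", not Docling's implementation; the function name group_list_items and the (cell_index, num_id, text) tuple layout are hypothetical.

```python
from itertools import groupby


def group_list_items(paragraphs):
    """Group list paragraphs into lists, one per (cell, numId) run.

    paragraphs: iterable of (cell_index, num_id, text) in document order.
    A new list starts whenever the cell or the numId changes, so two
    cells that reuse the same numId still get separate lists.
    """
    return [
        {"cell": cell, "num_id": num_id, "items": [text for _, _, text in run]}
        for (cell, num_id), run in groupby(paragraphs, key=lambda p: (p[0], p[1]))
    ]


# The four bullets from this report, all sharing one (assumed) numId.
sample = [
    (0, 1, "First row first bulletpoint"),
    (0, 1, "First row second bulletpoint"),
    (1, 1, "Second row first bulletpoint"),
    (1, 1, "Second row second bulletpoint"),
]

for group in group_list_items(sample):
    print(f"cell {group['cell']}: {group['items']}")
```

Keying on numId alone would collapse the four items into one list, which matches the actual result reported here.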
Workaround
This issue can be avoided by using different numId values for the two lists.
Adding a carriage return after the bullet list in the first row also resolves the issue.
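To check which numId values a file actually uses, before or after applying the workaround, the raw XML can be inspected without Docling. A minimal sketch using only the standard library; the function name num_ids_per_cell is hypothetical, and it only sees numId values set directly on paragraphs, not those inherited from a paragraph style. Cells of nested tables are also listed.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def num_ids_per_cell(docx_file):
    """Return one list of numId strings per table cell, in document order.

    docx_file: a path or a binary file-like object holding a DOCX archive.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for cell in root.iterfind(".//w:tc", NS):
        ids = [
            el.get(f"{{{W}}}val")
            for el in cell.iterfind(".//w:numPr/w:numId", NS)
        ]
        result.append(ids)
    return result
```

If the same value appears in the lists of two different cells, the file matches the condition described in this report.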
Docling version
Docling version: 2.87.0
Docling Core version: 2.73.0
Docling IBM Models version: 3.13.0
Docling Parse version: 5.8.0
Python: cpython-313 (3.13.5)
Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version
Python 3.13.5