
[Bug]: recurring dates get clubbed together in markdown output #1980

@anishd19

Description


crawl4ai version

0.8.5

Expected Behavior

The Markdown output should contain the dates for every event:

elementalLONDON 2026
**Date:** 25–26 Nov 2026
**Date:** 2026-11-25
**Date:** 2026-11-26
London Build Expo
**Date:** 25–26 Nov 2026
**Date:** 2026-11-25
**Date:** 2026-11-26
emex - The Energy Management + Smart Buildings Expo
**Date:** 25–26 Nov 2026
**Date:** 2026-11-25
**Date:** 2026-11-26
100% Optical 2027
**Date:** 27 Feb–01 Mar 2027
**Date:** 2027-02-27
**Date:** 2027-03-01

Current Behavior

elementalLONDON 2026
**Date:** 25–26 Nov 2026
**Date:** 2026-11-25
**Date:** 2026-11-26
London Build Expo
emex - The Energy Management + Smart Buildings Expo
100% Optical 2027
**Date:** 27 Feb–01 Mar 2027
**Date:** 2027-02-27
**Date:** 2027-03-01
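To make the difference between the expected and current output easy to check mechanically, here is a minimal sketch that lists the event names not directly followed by a `**Date:**` line. The function name `events_missing_dates` and the constants are hypothetical, and the sample text is copied from the current output above; none of this is part of crawl4ai.

```python
# Hypothetical helper: find event names that lost their date lines.
EVENT_NAMES = [
    "elementalLONDON 2026",
    "London Build Expo",
    "emex - The Energy Management + Smart Buildings Expo",
    "100% Optical 2027",
]

# Copied from the "Current Behavior" output in this issue.
CURRENT_OUTPUT = """\
elementalLONDON 2026
**Date:** 25–26 Nov 2026
**Date:** 2026-11-25
**Date:** 2026-11-26
London Build Expo
emex - The Energy Management + Smart Buildings Expo
100% Optical 2027
**Date:** 27 Feb–01 Mar 2027
**Date:** 2027-02-27
**Date:** 2027-03-01
"""


def events_missing_dates(markdown: str, event_names: list[str]) -> list[str]:
    """Return the event names whose next non-empty line is not a date line."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    missing = []
    for i, line in enumerate(lines):
        if line in event_names:
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if not following.startswith("**Date:**"):
                missing.append(line)
    return missing


if __name__ == "__main__":
    for name in events_missing_dates(CURRENT_OUTPUT, EVENT_NAMES):
        print("missing dates:", name)
```

Run against the current output, this flags the two middle events, which are the ones whose dates repeat those of the first event.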

Is this reproducible?

Yes

Inputs Causing the Bug

URL: https://www.excel.london/whats-on


<a href="/whats-on/elementallondon-2026" class="item s2 woverlay image_first afterhours " id="item1537" style="position: absolute; left: 0px; top: 5400px;">
					<img src="https://www.excel.london/cdn/w_528/h_258/crop/climate-solutions-theatre_58_3144.jpg" alt="" class="bgimg">
					<span class="overlay-opacity o20"></span>
					<span class="overlay" style="background:#003040;"></span>
					<span class="name">elementalLONDON 2026</span><span class="display-date">25–26 Nov 2026</span>
					<span class="display-date2">25–26 Nov 2026</span>
					<span class="read-more">Find out more</span>
					<span class="date">2026-11-25</span>
					<span class="date2">2026-11-26</span>
				</a>
				<a href="/whats-on/london-build-expo" class="item s2 woverlay image_first afterhours " id="item1597" style="position: absolute; left: 0px; top: 5670px;">
					<img src="https://www.excel.london/cdn/w_528/h_258/crop/img_1857.jpeg" alt="" class="bgimg">
					<span class="overlay-opacity o20"></span>
					<span class="overlay" style="background:#003040;"></span>
					<span class="name">London Build Expo</span><span class="display-date">25–26 Nov 2026</span>
					<span class="display-date2">25–26 Nov 2026</span>
					<span class="read-more">Find out more</span>
					<span class="date">2026-11-25</span>
					<span class="date2">2026-11-26</span>
				</a>
				<a href="/whats-on/emex-the-energy-management-smart-buildings-expo" class="item s1 woverlay image_first afterhours " id="item1617" style="position: absolute; left: 540px; top: 5670px;">
					<img src="https://www.excel.london/cdn/w_258/h_258/crop/emex-25-lr-jamie-hodgskin-057.jpg" alt="" class="bgimg">
					<span class="overlay-opacity o0"></span>
					<span class="overlay" style="background:;"></span>
					<span class="name">emex - The Energy Management + Smart Buildings Expo</span><span class="display-date">25–26 Nov 2026</span>
					<span class="display-date2">25–26 Nov 2026</span>
					<span class="read-more">Find out more</span>
					<span class="date">2026-11-25</span>
					<span class="date2">2026-11-26</span>
				</a>
				<a href="/whats-on/100-optical-2027" class="item s1 woverlay image_first afterhours " id="item1614" style="position: absolute; left: 810px; top: 5670px;">
					<img src="https://www.excel.london/cdn/w_258/h_258/crop/100.jpg" alt="" class="bgimg">
					<span class="overlay-opacity o0"></span>
					<span class="overlay" style="background: rgb(0, 48, 64); display: inline; opacity: 0.0295596;"></span>
					<span class="name">100% Optical 2027</span><span class="display-date">27 Feb–01 Mar 2027</span>
					<span class="display-date2">27 Feb–01 Mar 2027</span>
					<span class="read-more">Find out more</span>
					<span class="date">2027-02-27</span>
					<span class="date2">2027-03-01</span>
				</a>
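As a possible workaround while the Markdown behaviour is investigated, the event cards could be read straight from the raw HTML instead of from the generated Markdown. The sketch below uses only the Python standard library; the class name `EventCardParser` and the trimmed `SAMPLE_HTML` are illustrative assumptions based on the snippet above, not crawl4ai APIs.

```python
from html.parser import HTMLParser

# Trimmed copy of two cards from the HTML above (images and overlays omitted).
SAMPLE_HTML = """
<a href="/whats-on/elementallondon-2026" class="item s2 woverlay image_first afterhours " id="item1537">
  <span class="name">elementalLONDON 2026</span><span class="display-date">25–26 Nov 2026</span>
  <span class="date">2026-11-25</span>
  <span class="date2">2026-11-26</span>
</a>
<a href="/whats-on/london-build-expo" class="item s2 woverlay image_first afterhours " id="item1597">
  <span class="name">London Build Expo</span><span class="display-date">25–26 Nov 2026</span>
  <span class="date">2026-11-25</span>
  <span class="date2">2026-11-26</span>
</a>
"""


class EventCardParser(HTMLParser):
    """Collect one dict per <a class="item ..."> card, keyed by span class."""

    FIELDS = {"name", "display-date", "date", "date2"}

    def __init__(self):
        super().__init__()
        self.events = []
        self._card = None   # dict for the card currently being parsed
        self._field = None  # span class currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "item" in classes:
            self._card = {"href": attr_map.get("href")}
        elif tag == "span" and self._card is not None:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._card is not None and self._field:
            self._card[self._field] = self._card.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "a" and self._card is not None:
            self.events.append(self._card)
            self._card = None


def parse_events(html: str) -> list[dict]:
    parser = EventCardParser()
    parser.feed(html)
    return parser.events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_HTML):
        print(event["name"], event["display-date"], event["date"], event["date2"])
```

Because each card is parsed independently, events that share the same dates keep their own copies of them.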

Steps to Reproduce

Crawl https://www.excel.london/whats-on with crawl4ai, using the default crawler config from the docs. The resulting Markdown output is shown in the screenshot.
Whenever a pattern recurs across entries, such as **Date:** 2026-11-25 **Date:** 2026-11-26, the Markdown output keeps it only once and drops the repeats, which is not expected.

How do I disable this?

Code snippets

OS

Linux

Python version

3.14.2

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

[Screenshot]


    Labels

    🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
