Commit 1debe5f

Merge pull request #1885 from unclecode/develop
docs: update version references to 0.8.6
2 parents: af648e1 + bcbccbe

4 files changed: 194 additions & 12 deletions

README.md: 16 additions & 5 deletions
```diff
@@ -37,15 +37,15 @@ Limited slots._
 
 Crawl4AI turns the web into clean, LLM ready Markdown for RAG, agents, and data pipelines. Fast, controllable, battle tested by a 50k+ star community.
 
-[✨ Check out latest update v0.8.5](#-recent-updates)
+[✨ Check out latest update v0.8.6](#-recent-updates)
 
-**New in v0.8.5**: Anti-Bot Detection, Shadow DOM & 60+ Bug Fixes! Automatic 3-tier anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, config defaults API, consent popup removal, and critical security patches. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.5.md)
+**New in v0.8.6**: Security hotfix — replaced `litellm` with `unclecode-litellm` due to a PyPI supply chain compromise. If you're on v0.8.5, please upgrade immediately.
 
-✨ Recent v0.8.0: Crash Recovery & Prefetch Mode! Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks for long-running crawls. New `prefetch=True` mode for 5-10x faster URL discovery. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.0.md)
+✨ Recent v0.8.5: Anti-Bot Detection, Shadow DOM & 60+ Bug Fixes! Automatic 3-tier anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, config defaults API, consent popup removal, and critical security patches. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.5.md)
 
-✨ Previous v0.7.8: Stability & Bug Fix Release! 11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md)
+✨ Previous v0.8.0: Crash Recovery & Prefetch Mode! Deep crawl crash recovery with `resume_state` and `on_state_change` callbacks for long-running crawls. New `prefetch=True` mode for 5-10x faster URL discovery. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.8.0.md)
 
-✨ Previous v0.7.7: Complete Self-Hosting Platform with Real-time Monitoring! Enterprise-grade monitoring dashboard, comprehensive REST API, WebSocket streaming, and smart browser pool management. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.7.md)
+✨ Previous v0.7.8: Stability & Bug Fix Release! 11 bug fixes addressing Docker API issues, LLM extraction improvements, URL handling fixes, and dependency updates. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md)
 
 <details>
 <summary>🤓 <strong>My Personal Story</strong></summary>
```
````diff
@@ -565,6 +565,17 @@ async def test_news_crawl():
 ## ✨ Recent Updates
 
 <details open>
+<summary><strong>Version 0.8.6 — Security Hotfix: litellm Supply Chain Fix</strong></summary>
+
+Replaced `litellm` dependency with `unclecode-litellm` due to a PyPI supply chain compromise affecting the original package. If you're on v0.8.5 or earlier, upgrade immediately.
+
+```bash
+pip install -U crawl4ai
+```
+
+</details>
+
+<details>
 <summary><strong>Version 0.8.5 Release Highlights - Anti-Bot Detection, Shadow DOM & 60+ Bug Fixes</strong></summary>
 
 Our biggest release since v0.8.0. Anti-bot detection with proxy escalation, Shadow DOM flattening, deep crawl cancellation, and over 60 bug fixes.
````
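Since the hotfix swaps one distribution for another, it can be useful to confirm what actually ended up installed after upgrading. A minimal check using only the standard library (the distribution names come from the release note above; on an unrelated machine neither may be present):

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# After a clean upgrade you would expect the fork to be present
# and the original litellm to be absent.
for name in ("unclecode-litellm", "litellm"):
    print(name, "->", installed_version(name))
```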

crawl4ai/async_configs.py: 13 additions & 0 deletions
```diff
@@ -1660,6 +1660,19 @@ def __init__(
             raise ValueError(
                 "chunking_strategy must be an instance of ChunkingStrategy"
             )
+        if self.markdown_generator is not None and not isinstance(
+            self.markdown_generator, MarkdownGenerationStrategy
+        ):
+            hint = ""
+            if isinstance(self.markdown_generator, dict):
+                hint = (
+                    ' The JSON format must be {"type": "<ClassName>", "params": {...}}.'
+                    ' Note: "params" is required — "options" or other keys are not recognized.'
+                )
+            raise ValueError(
+                "markdown_generator must be an instance of MarkdownGenerationStrategy, "
+                f"got {type(self.markdown_generator).__name__}.{hint}"
+            )
 
         # Set default chunking strategy if None
         if self.chunking_strategy is None:
```
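The new check can be reproduced in isolation to see the error ergonomics it buys. The sketch below uses a stand-in `MarkdownGenerationStrategy` class rather than crawl4ai's real one, and mirrors the branching in the diff:

```python
class MarkdownGenerationStrategy:
    """Stand-in for crawl4ai's real strategy base class (illustrative only)."""


def validate_markdown_generator(markdown_generator):
    """Reject non-strategy values, adding a targeted hint when the caller
    passed a raw serialization dict instead of a deserialized object."""
    if markdown_generator is not None and not isinstance(
        markdown_generator, MarkdownGenerationStrategy
    ):
        hint = ""
        if isinstance(markdown_generator, dict):
            hint = (
                ' The JSON format must be {"type": "<ClassName>", "params": {...}}.'
                ' Note: "params" is required.'
            )
        raise ValueError(
            "markdown_generator must be an instance of MarkdownGenerationStrategy, "
            f"got {type(markdown_generator).__name__}.{hint}"
        )
    return markdown_generator


# An instance (or None) passes through untouched.
validate_markdown_generator(MarkdownGenerationStrategy())
validate_markdown_generator(None)

# A raw dict now fails fast with an actionable message instead of
# surfacing later as a cryptic AttributeError.
try:
    validate_markdown_generator({"type": "DefaultMarkdownGenerator", "options": {}})
except ValueError as e:
    print("params" in str(e))  # True
```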

deploy/docker/README.md: 7 additions & 7 deletions
````diff
@@ -59,13 +59,13 @@ Pull and run images directly from Docker Hub without building locally.
 
 #### 1. Pull the Image
 
-Our latest stable release is `0.8.5`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.
+Our latest stable release is `0.8.6`. Images are built with multi-arch manifests, so Docker automatically pulls the correct version for your system.
 
 ```bash
-# Pull the latest stable version (0.8.5)
-docker pull unclecode/crawl4ai:0.8.5
+# Pull the latest stable version (0.8.6)
+docker pull unclecode/crawl4ai:0.8.6
 
-# Or use the latest tag (points to 0.8.0)
+# Or use the latest tag
 docker pull unclecode/crawl4ai:latest
 ```
 
@@ -100,7 +100,7 @@ EOL
   -p 11235:11235 \
   --name crawl4ai \
   --shm-size=1g \
-  unclecode/crawl4ai:0.8.5
+  unclecode/crawl4ai:0.8.6
 ```
 
 * **With LLM support:**
@@ -111,7 +111,7 @@ EOL
   --name crawl4ai \
   --env-file .llm.env \
   --shm-size=1g \
-  unclecode/crawl4ai:0.8.5
+  unclecode/crawl4ai:0.8.6
 ```
 
 > The server will be available at `http://localhost:11235`. Visit `/playground` to access the interactive testing interface.
@@ -184,7 +184,7 @@ The `docker-compose.yml` file in the project root provides a simplified approach
 ```bash
 # Pulls and runs the release candidate from Docker Hub
 # Automatically selects the correct architecture
-IMAGE=unclecode/crawl4ai:0.8.5 docker compose up -d
+IMAGE=unclecode/crawl4ai:0.8.6 docker compose up -d
 ```
````
New test file: 158 additions & 0 deletions

```python
"""
Tests for #1880: markdown_generator deserialization validation in CrawlerRunConfig

Ensures that:
1. Correct {"type": ..., "params": {...}} format deserializes properly
2. Wrong key names ("options") raise a clear ValueError, not a cryptic AttributeError
3. Nested content_filter deserializes correctly
"""
import pytest


class TestMarkdownGeneratorDeserialization:
    """Test CrawlerRunConfig.load() with markdown_generator configs."""

    def test_params_key_deserializes_correctly(self):
        """{"type": ..., "params": {...}} should produce a real object."""
        from crawl4ai.async_configs import CrawlerRunConfig

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "params": {},
            }
        }
        config = CrawlerRunConfig.load(data)
        from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
        assert isinstance(config.markdown_generator, DefaultMarkdownGenerator)

    def test_params_with_content_filter(self):
        """Nested BM25ContentFilter should deserialize inside markdown_generator."""
        from crawl4ai.async_configs import CrawlerRunConfig
        from crawl4ai.content_filter_strategy import BM25ContentFilter

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "params": {
                    "content_filter": {
                        "type": "BM25ContentFilter",
                        "params": {
                            "user_query": "example",
                            "bm25_threshold": 0.9,
                        },
                    }
                },
            }
        }
        config = CrawlerRunConfig.load(data)
        assert isinstance(config.markdown_generator.content_filter, BM25ContentFilter)
        assert config.markdown_generator.content_filter.user_query == "example"
        assert config.markdown_generator.content_filter.bm25_threshold == 0.9

    def test_params_with_pruning_filter(self):
        """PruningContentFilter should also work."""
        from crawl4ai.async_configs import CrawlerRunConfig
        from crawl4ai.content_filter_strategy import PruningContentFilter

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "params": {
                    "content_filter": {
                        "type": "PruningContentFilter",
                        "params": {},
                    }
                },
            }
        }
        config = CrawlerRunConfig.load(data)
        assert isinstance(config.markdown_generator.content_filter, PruningContentFilter)

    def test_options_key_raises_clear_error(self):
        """Using "options" instead of "params" should raise ValueError with hint."""
        from crawl4ai.async_configs import CrawlerRunConfig

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "options": {"content_filter": {}},
            }
        }
        with pytest.raises(ValueError, match="params.*required"):
            CrawlerRunConfig.load(data)

    def test_arbitrary_key_raises_clear_error(self):
        """Any non-"params" key should raise ValueError."""
        from crawl4ai.async_configs import CrawlerRunConfig

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "settings": {},
            }
        }
        with pytest.raises(ValueError, match="markdown_generator must be an instance"):
            CrawlerRunConfig.load(data)

    def test_plain_dict_raises_clear_error(self):
        """A dict without type/params structure should raise ValueError."""
        from crawl4ai.async_configs import CrawlerRunConfig

        data = {
            "markdown_generator": {"foo": "bar"}
        }
        with pytest.raises(ValueError, match="got dict"):
            CrawlerRunConfig.load(data)

    def test_error_message_mentions_params_key(self):
        """Error message should specifically mention that 'params' is required."""
        from crawl4ai.async_configs import CrawlerRunConfig

        data = {
            "markdown_generator": {
                "type": "DefaultMarkdownGenerator",
                "options": {},
            }
        }
        with pytest.raises(ValueError) as exc_info:
            CrawlerRunConfig.load(data)
        msg = str(exc_info.value)
        assert "params" in msg
        assert "options" in msg or "not recognized" in msg

    def test_none_markdown_generator_uses_default(self):
        """None should use the default (DefaultMarkdownGenerator)."""
        from crawl4ai.async_configs import CrawlerRunConfig
        from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

        config = CrawlerRunConfig(markdown_generator=None)
        # None is allowed — the crawler falls back to default behavior
        assert config.markdown_generator is None

    def test_valid_instance_passes_validation(self):
        """Passing an actual instance should work fine."""
        from crawl4ai.async_configs import CrawlerRunConfig
        from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
        from crawl4ai.content_filter_strategy import BM25ContentFilter

        gen = DefaultMarkdownGenerator(
            content_filter=BM25ContentFilter(user_query="test")
        )
        config = CrawlerRunConfig(markdown_generator=gen)
        assert config.markdown_generator is gen
        assert config.markdown_generator.content_filter.user_query == "test"


class TestExistingValidationStillWorks:
    """Ensure existing extraction_strategy/chunking_strategy validation unchanged."""

    def test_extraction_strategy_dict_raises(self):
        from crawl4ai.async_configs import CrawlerRunConfig
        with pytest.raises(ValueError, match="extraction_strategy"):
            CrawlerRunConfig(extraction_strategy={"type": "bad"})

    def test_chunking_strategy_dict_raises(self):
        from crawl4ai.async_configs import CrawlerRunConfig
        with pytest.raises(ValueError, match="chunking_strategy"):
            CrawlerRunConfig(chunking_strategy={"type": "bad"})
```
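The tests above all revolve around one serialization convention. A toy registry-based loader (not crawl4ai's actual implementation; the class names are reused only for illustration) shows how `{"type": ..., "params": {...}}` maps onto constructor calls, and why a stray `"options"` key would be silently dropped without the new validation:

```python
REGISTRY = {}


def register(cls):
    """Make a class loadable by its name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class BM25ContentFilter:
    def __init__(self, user_query=None, bm25_threshold=1.0):
        self.user_query = user_query
        self.bm25_threshold = bm25_threshold


@register
class DefaultMarkdownGenerator:
    def __init__(self, content_filter=None):
        self.content_filter = content_filter


def from_serialized(data):
    """Build an object from {"type": "<ClassName>", "params": {...}}.

    Nested dicts carrying a "type" key are deserialized recursively.
    Note that a payload keyed "options" would simply never reach the
    constructor, which is why validating the result matters.
    """
    cls = REGISTRY[data["type"]]
    params = {
        key: from_serialized(value)
        if isinstance(value, dict) and "type" in value
        else value
        for key, value in data.get("params", {}).items()
    }
    return cls(**params)


gen = from_serialized({
    "type": "DefaultMarkdownGenerator",
    "params": {
        "content_filter": {
            "type": "BM25ContentFilter",
            "params": {"user_query": "example", "bm25_threshold": 0.9},
        }
    },
})
print(type(gen.content_filter).__name__)  # BM25ContentFilter
```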
