|
27 | 27 |
|
28 | 28 | Crawl4AI turns the web into clean, LLM-ready Markdown for RAG, agents, and data pipelines. Fast, controllable, and battle-tested by a 50k+ star community. |
29 | 29 |
|
30 | | -[✨ Check out latest update v0.7.7](#-recent-updates) |
| 30 | +[✨ Check out latest update v0.7.8](#-recent-updates) |
31 | 31 |
|
32 | | -✨ **New in v0.7.7**: Complete Self-Hosting Platform with Real-time Monitoring! Enterprise-grade monitoring dashboard, comprehensive REST API, WebSocket streaming, smart browser pool management, and production-ready observability. Full visibility and control over your crawling infrastructure. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.7.md) |
| 32 | +✨ **New in v0.7.8**: Stability & Bug Fix Release! 11 bug fixes addressing Docker API issues (ContentRelevanceFilter, ProxyConfig, cache permissions), LLM extraction improvements (configurable backoff, HTML input format), URL handling fixes, and dependency updates (pypdf, Pydantic v2). [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md) |
33 | 33 |
|
34 | | -✨ Recent v0.7.6: Complete Webhook Infrastructure for Docker Job Queue API! Real-time notifications for both `/crawl/job` and `/llm/job` endpoints with exponential backoff retry, custom headers, and flexible delivery modes. No more polling! [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.6.md) |
| 34 | +✨ Recent v0.7.7: Complete Self-Hosting Platform with Real-time Monitoring! Enterprise-grade monitoring dashboard, comprehensive REST API, WebSocket streaming, smart browser pool management, and production-ready observability. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.7.md) |
35 | 35 |
|
36 | | -✨ Previous v0.7.5: Docker Hooks System with function-based API for pipeline customization, Enhanced LLM Integration with custom providers, HTTPS Preservation, and multiple community-reported bug fixes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.5.md) |
| 36 | +✨ Previous v0.7.6: Complete Webhook Infrastructure for Docker Job Queue API! Real-time notifications for both `/crawl/job` and `/llm/job` endpoints with exponential backoff retry, custom headers, and flexible delivery modes. [Release notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.6.md) |
37 | 37 |
|
38 | 38 | <details> |
39 | 39 | <summary>🤓 <strong>My Personal Story</strong></summary> |
@@ -552,6 +552,55 @@ async def test_news_crawl(): |
552 | 552 |
|
553 | 553 | ## ✨ Recent Updates |
554 | 554 |
|
| 555 | +<details> |
| 556 | +<summary><strong>Version 0.7.8 Release Highlights - Stability & Bug Fix Release</strong></summary> |
| 557 | + |
| 558 | +This release focuses on stability, with 11 bug fixes addressing issues reported by the community. It adds no major new features but significantly improves reliability. |
| 559 | + |
| 560 | +- **🐳 Docker API Fixes**: |
| 561 | + - Fixed `ContentRelevanceFilter` deserialization in deep crawl requests (#1642) |
| 562 | + - Fixed `ProxyConfig` JSON serialization in `BrowserConfig.to_dict()` (#1629) |
| 563 | + - Fixed `.cache` folder permissions in Docker image (#1638) |
| 564 | + |
| 565 | +- **🤖 LLM Extraction Improvements**: |
| 566 | + - Configurable rate limiter backoff with new `LLMConfig` parameters (#1269): |
| 567 | + ```python |
| 568 | + from crawl4ai import LLMConfig |
| 569 | + |
| 570 | + config = LLMConfig( |
| 571 | + provider="openai/gpt-4o-mini", |
| 572 | + backoff_base_delay=5, # Wait 5s on first retry |
| 573 | + backoff_max_attempts=5, # Try up to 5 times |
| 574 | + backoff_exponential_factor=3 # Multiply delay by 3 each attempt |
| 575 | + ) |
| 576 | + ``` |
| 577 | + - HTML input format support for `LLMExtractionStrategy` (#1178): |
| 578 | + ```python |
| 579 | + from crawl4ai import LLMExtractionStrategy |
| 580 | + |
| 581 | + strategy = LLMExtractionStrategy( |
| 582 | + llm_config=config, |
| 583 | + instruction="Extract table data", |
| 584 | + input_format="html" # Now supports: "html", "markdown", "fit_markdown" |
| 585 | + ) |
| 586 | + ``` |
| 587 | + - Fixed the URL variable for raw HTML input: extraction strategies now receive `"Raw HTML"` as the URL instead of the full HTML blob (#1116) |
| 588 | + |
| 589 | +- **🔗 URL Handling**: |
| 590 | + - Fixed relative URL resolution after JavaScript redirects (#1268) |
| 591 | + - Fixed import statement formatting in extracted code (#1181) |
| 592 | + |
| 593 | +- **📦 Dependency Updates**: |
| 594 | + - Replaced deprecated PyPDF2 with pypdf (#1412) |
| 595 | + - Pydantic v2 ConfigDict compatibility - no more deprecation warnings (#678) |
| 596 | + |
| 597 | +- **🧠 AdaptiveCrawler**: |
| 598 | + - Fixed query expansion to call the LLM instead of returning hardcoded mock data (#1621) |
| 599 | + |
| 600 | +[Full v0.7.8 Release Notes →](https://github.com/unclecode/crawl4ai/blob/main/docs/blog/release-v0.7.8.md) |
| 601 | + |
| 602 | +</details> |
| 603 | + |
555 | 604 | <details> |
556 | 605 | <summary><strong>Version 0.7.7 Release Highlights - The Self-Hosting & Monitoring Update</strong></summary> |
557 | 606 |
|
|
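The comments on the new `LLMConfig` backoff parameters in this diff ("Wait 5s on first retry", "Try up to 5 times", "Multiply delay by 3 each attempt") suggest a plain exponential schedule. The sketch below is only an illustration of that reading: the helper `backoff_delays` is hypothetical and not part of Crawl4AI, and it assumes no jitter, no upper cap, and that the first attempt runs without a delay. The library's actual rate limiter may differ.

```python
def backoff_delays(base_delay, max_attempts, exponential_factor):
    """Return the assumed wait (in seconds) before each retry.

    Hypothetical helper, not a Crawl4AI API. Assumes the delay before
    retry i (0-based) is base_delay * exponential_factor ** i, and that
    max_attempts counts the initial attempt, leaving max_attempts - 1 retries.
    """
    retries = max(max_attempts - 1, 0)
    return [base_delay * exponential_factor ** i for i in range(retries)]


# Values taken from the LLMConfig example in the diff.
delays = backoff_delays(base_delay=5, max_attempts=5, exponential_factor=3)
print(delays)       # waits before retries 1..4
print(sum(delays))  # worst-case total time spent waiting
```

Under these assumptions, the example values give waits of 5, 15, 45, and 135 seconds, so a request that fails every time would spend 200 seconds waiting before giving up. That total is worth checking against any overall request timeout when tuning the three parameters.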