Commit cb34dce

vdusek and claude committed
chore: add agent instruction files as symlinks to .rules
Add .rules file with project conventions and create AGENTS.md, CLAUDE.md, and GEMINI.md as symlinks pointing to it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a918a34 commit cb34dce

5 files changed: +133 -3 lines changed

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -15,9 +15,9 @@
 .serena
 .windsurf
 .zed-ai
-AGENTS.md
-CLAUDE.md
-GEMINI.md
+AGENTS.local.md
+CLAUDE.local.md
+GEMINI.local.md
 
 # Cache
 __pycache__

.rules

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
# Coding guidelines

This file provides guidance to programming agents when working with code in this repository.

## Development Commands

All commands use `uv` (package manager) and `poe` (task runner):

```bash
# Install all dependencies (dev + extras + pre-commit + playwright)
uv run poe install-dev

# Run full check suite (lint + type-check + unit tests)
uv run poe check-code

# Linting (ruff format check + ruff check)
uv run poe lint

# Auto-fix formatting
uv run poe format

# Type checking (ty)
uv run poe type-check

# Run all unit tests
uv run poe unit-tests

# Run a single test file
uv run pytest tests/unit/path/to/test_file.py

# Run a single test by name
uv run pytest tests/unit/path/to/test_file.py::test_name -v

# Run tests with coverage XML report
uv run poe unit-tests-cov

# Build package
uv run poe build

# Clean build artifacts
uv run poe clean
```

Note: `uv run poe unit-tests` first runs tests marked `@pytest.mark.run_alone` in isolation, then runs the rest with `-x` (fail-fast) and parallelism via `pytest-xdist`.
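
The two-phase run described in the note can be pictured with a small sketch. The exact flags used by the project's `poe` task are not shown in this commit, so the command lines below are an assumption built from standard pytest options (`-m`, `-x`) and the pytest-xdist option `-n`; the helper name `build_unit_test_commands` is hypothetical.

```python
from __future__ import annotations


def build_unit_test_commands(test_dir: str = 'tests/unit') -> list[list[str]]:
    """Build the two pytest invocations for a two-phase unit test run.

    Args:
        test_dir: Directory that holds the unit tests.

    Returns:
        Two argument lists: the isolated run first, then the parallel run.
    """
    # Phase 1 (assumed flags): only tests carrying the `run_alone` marker, no parallelism.
    isolated = ['pytest', '-m', 'run_alone', test_dir]
    # Phase 2 (assumed flags): everything else, fail-fast (-x) and parallel via pytest-xdist (-n auto).
    parallel = ['pytest', '-m', 'not run_alone', '-x', '-n', 'auto', test_dir]
    return [isolated, parallel]
```

Each list could be handed to `subprocess.run` in order; the sketch only builds the arguments and does not launch pytest.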

## Code Style

- **Linter/formatter**: Ruff with `select = ["ALL"]` and specific ignores
- **Line length**: 120 characters
- **Quotes**: Single quotes (double for docstrings)
- **Docstrings**: Google format (enforced by Ruff)
- **Type checker**: ty (Astral's type checker), target Python 3.10
- **Async mode**: pytest-asyncio in `auto` mode (no need for `@pytest.mark.asyncio`)
- **Commit format**: Conventional Commits (`feat:`, `fix:`, `docs:`, `refactor:`, `test:`, etc.)
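
As an illustration of the style rules above, here is a small function written with single-quoted strings, a double-quoted Google-format docstring, and type hints that work on Python 3.10. The function `normalize_label` is hypothetical and does not exist in the repository.

```python
from __future__ import annotations


def normalize_label(label: str | None, *, default: str = 'default') -> str:
    """Return a cleaned-up handler label.

    Args:
        label: Raw label taken from a request, or `None` when it is missing.
        default: Value returned when `label` is `None` or blank.

    Returns:
        The stripped, lower-cased label, or `default` if nothing usable remains.
    """
    if label is None:
        return default
    cleaned = label.strip().lower()
    # An all-whitespace label collapses to an empty string, so fall back to the default.
    return cleaned or default
```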

## Architecture

### Crawler Hierarchy

```
BasicCrawler[TCrawlingContext, TStatisticsState]
├── AbstractHttpCrawler → HttpCrawler, BeautifulSoupCrawler, ParselCrawler
├── PlaywrightCrawler
└── AdaptivePlaywrightCrawler (extends PlaywrightCrawler)
```

- **BasicCrawler** (`src/crawlee/crawlers/_basic/`): Core request lifecycle, autoscaling pool, retries, session management, router dispatch. Generic over `TCrawlingContext`.
- **AbstractHttpCrawler** (`src/crawlee/crawlers/_abstract_http/`): Adds HTTP client integration, response parsing, pre-navigation hooks. Generic over parser result type.
- **PlaywrightCrawler** (`src/crawlee/crawlers/_playwright/`): Browser-based crawling with Playwright.

### Context Pipeline (Middleware Pattern)

Contexts are progressively enhanced through `ContextPipeline` middleware:

```
BasicCrawlingContext → HttpCrawlingContext → ParsedHttpCrawlingContext → BeautifulSoupCrawlingContext
```

Each middleware is an async generator that wraps the next handler, enabling setup/teardown around request processing.
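
The async-generator middleware idea can be sketched in a few lines. This is a simplified illustration, not the actual `ContextPipeline` implementation: the classes `SketchBasicContext` and `SketchHttpContext`, the function `run_pipeline`, and the error handling (teardown always resumes the generator normally) are all assumptions made for the example.

```python
import asyncio
from collections.abc import AsyncGenerator, Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class SketchBasicContext:
    url: str
    log: list = field(default_factory=list)


@dataclass
class SketchHttpContext(SketchBasicContext):
    body: str = ''


async def http_middleware(context: SketchBasicContext) -> AsyncGenerator[SketchHttpContext, None]:
    # Code before the yield is setup; code after it is teardown.
    context.log.append('setup')
    yield SketchHttpContext(url=context.url, log=context.log, body=f'<html>{context.url}</html>')
    context.log.append('teardown')


async def run_pipeline(
    context: SketchBasicContext,
    middlewares: list,
    handler: Callable[[SketchBasicContext], Awaitable[None]],
) -> None:
    opened = []
    current = context
    try:
        for middleware in middlewares:
            generator = middleware(current)
            # Advance to the yield: runs setup and returns the enhanced context.
            current = await generator.__anext__()
            opened.append(generator)
        await handler(current)
    finally:
        # Resume each generator past its yield, innermost first, to run teardown.
        for generator in reversed(opened):
            try:
                await generator.__anext__()
            except StopAsyncIteration:
                pass


def demo() -> list:
    context = SketchBasicContext(url='https://example.com')

    async def handler(ctx: SketchBasicContext) -> None:
        ctx.log.append(f'handler:{type(ctx).__name__}')

    asyncio.run(run_pipeline(context, [http_middleware], handler))
    return context.log
```

Calling `demo()` shows the handler running between the middleware's setup and teardown steps, with the handler receiving the enhanced context type.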

### Storage Layer

Three-tier design:
- **High-level**: `Dataset`, `KeyValueStore`, `RequestQueue` in `src/crawlee/storages/`
- **Storage clients** (`src/crawlee/storage_clients/`): `FileSystemStorageClient` (default), `MemoryStorageClient`, `SqlStorageClient`, `RedisStorageClient`
- **Instance caching**: `StorageInstanceManager` is a global singleton that caches storage instances by ID/name
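
Caching by ID/name, as in the last bullet, might look roughly like the sketch below. The names `SketchDataset`, `SketchInstanceManager`, and `open_dataset` are hypothetical, and the real `StorageInstanceManager` may key its cache differently; the point is only that repeated opens with the same ID or name return the same object.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchDataset:
    storage_id: str
    name: str | None


class SketchInstanceManager:
    """Cache storage instances so repeated opens return the same object."""

    def __init__(self) -> None:
        self._by_id: dict[str, SketchDataset] = {}
        self._by_name: dict[str, SketchDataset] = {}
        self._counter = 0

    def open_dataset(self, *, storage_id: str | None = None, name: str | None = None) -> SketchDataset:
        # Serve from the cache when either key is already known.
        if storage_id is not None and storage_id in self._by_id:
            return self._by_id[storage_id]
        if name is not None and name in self._by_name:
            return self._by_name[name]

        # Otherwise create a new instance and register it under both keys.
        self._counter += 1
        instance = SketchDataset(storage_id=storage_id or f'dataset-{self._counter}', name=name)
        self._by_id[instance.storage_id] = instance
        if name is not None:
            self._by_name[name] = instance
        return instance
```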

### Service Locator

`src/crawlee/_service_locator.py` is a global singleton managing `Configuration`, `EventManager`, `StorageClient`, and `StorageInstanceManager`. Prevents double-initialization with `ServiceConflictError`.
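
A minimal sketch of the double-initialization guard follows. Only the name `ServiceConflictError` comes from the text above; the classes `SketchConfiguration` and `SketchServiceLocator`, their method names, and the exact condition that triggers the error are assumptions for illustration and may differ from the real module.

```python
from __future__ import annotations


class ServiceConflictError(Exception):
    """Raised when a service is set after a different instance is already in use."""


class SketchConfiguration:
    """Stand-in for a configuration object."""


class SketchServiceLocator:
    def __init__(self) -> None:
        self._configuration: SketchConfiguration | None = None

    def get_configuration(self) -> SketchConfiguration:
        # Lazily create a default so callers always get a usable instance.
        if self._configuration is None:
            self._configuration = SketchConfiguration()
        return self._configuration

    def set_configuration(self, configuration: SketchConfiguration) -> None:
        # Re-setting the same instance is harmless; swapping in another one is a conflict.
        if self._configuration is not None and self._configuration is not configuration:
            raise ServiceConflictError('Configuration is already in use.')
        self._configuration = configuration


# Module-level instance standing in for the global singleton.
sketch_service_locator = SketchServiceLocator()
```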

### HTTP Clients

Pluggable via `HttpClient` interface in `src/crawlee/http_clients/`:
- `ImpitHttpClient` (default), `HttpxHttpClient`, `CurlImpersonateHttpClient`
- Each provides `crawl()` (for crawler pipeline) and `send_request()` (for in-handler use)
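
The pluggable-client idea can be sketched with an abstract base class. The method names `crawl()` and `send_request()` come from the list above, but the parameters, the return type `SketchResponse`, and the in-memory `CannedHttpClient` are hypothetical; the real `HttpClient` interface has its own signatures.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchResponse:
    status_code: int
    body: bytes


class SketchHttpClient(ABC):
    """Assumed shape of a pluggable client with two entry points."""

    @abstractmethod
    async def crawl(self, url: str) -> SketchResponse:
        """Fetch a page on behalf of the crawler pipeline."""

    @abstractmethod
    async def send_request(self, url: str, *, method: str = 'GET') -> SketchResponse:
        """Fetch a resource from inside a request handler."""


class CannedHttpClient(SketchHttpClient):
    """In-memory client that serves predefined pages, useful for tests."""

    def __init__(self, pages: dict[str, bytes]) -> None:
        self._pages = pages

    def _fetch(self, url: str) -> SketchResponse:
        body = self._pages.get(url)
        if body is None:
            return SketchResponse(status_code=404, body=b'')
        return SketchResponse(status_code=200, body=body)

    async def crawl(self, url: str) -> SketchResponse:
        return self._fetch(url)

    async def send_request(self, url: str, *, method: str = 'GET') -> SketchResponse:
        # The method is ignored here; a real client would pass it to the transport.
        return self._fetch(url)
```

Because both entry points live behind one interface, a crawler written against `SketchHttpClient` would not need to change when the concrete client is swapped.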

### Request Model

`Request` (`src/crawlee/_request.py`) uses `unique_key` for deduplication. Lifecycle states: `UNPROCESSED → DONE`. Crawlee-specific metadata stored in `user_data['__crawlee']`.
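
Deduplication by `unique_key` can be illustrated as follows. The normalization rule here (lower-case scheme and host, drop the fragment) is an assumption chosen for the example and is not necessarily how Crawlee computes the key; `SketchRequest`, `SketchQueue`, and `compute_unique_key` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlsplit, urlunsplit


def compute_unique_key(url: str) -> str:
    """Derive a deduplication key from a URL (assumed normalization)."""
    parts = urlsplit(url)
    # Lower-case the scheme and host and drop the fragment; keep path and query as-is.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ''))


@dataclass
class SketchRequest:
    url: str
    unique_key: str = ''
    user_data: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fall back to a key derived from the URL when none is given explicitly.
        if not self.unique_key:
            self.unique_key = compute_unique_key(self.url)


class SketchQueue:
    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.pending: list[SketchRequest] = []

    def add(self, request: SketchRequest) -> bool:
        """Enqueue the request unless its unique key was already seen."""
        if request.unique_key in self._seen:
            return False
        self._seen.add(request.unique_key)
        self.pending.append(request)
        return True
```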

### Router

```python
@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext): ...

@crawler.router.handler(label='detail')
async def detail(context: BeautifulSoupCrawlingContext): ...
```

Requests are routed by their `label` field; unmatched requests go to the default handler.
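
The dispatch rule in the sentence above can be sketched as a self-contained router. The decorator names mirror the snippet, but `SketchRouter` and `SketchContext` are hypothetical stand-ins and the real router may behave differently in edge cases (for example when no default handler is registered).

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class SketchContext:
    url: str
    label: str | None = None


Handler = Callable[[SketchContext], Awaitable[None]]


class SketchRouter:
    def __init__(self) -> None:
        self._default: Handler | None = None
        self._by_label: dict[str, Handler] = {}

    def default_handler(self, func: Handler) -> Handler:
        self._default = func
        return func

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        def decorator(func: Handler) -> Handler:
            self._by_label[label] = func
            return func

        return decorator

    async def __call__(self, context: SketchContext) -> None:
        # Missing or unmatched labels fall through to the default handler.
        func = self._by_label.get(context.label) if context.label is not None else None
        func = func or self._default
        if func is None:
            raise RuntimeError('No handler registered for this request.')
        await func(context)


def demo() -> list[str]:
    router = SketchRouter()
    calls: list[str] = []

    @router.default_handler
    async def handler(context: SketchContext) -> None:
        calls.append(f'default:{context.url}')

    @router.handler(label='detail')
    async def detail(context: SketchContext) -> None:
        calls.append(f'detail:{context.url}')

    async def main() -> None:
        await router(SketchContext('https://example.com'))
        await router(SketchContext('https://example.com/item', label='detail'))
        await router(SketchContext('https://example.com/other', label='unknown'))

    asyncio.run(main())
    return calls
```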

### Key Directories

- `src/crawlee/crawlers/` - All crawler implementations
- `src/crawlee/storages/` - Dataset, KVS, RequestQueue
- `src/crawlee/storage_clients/` - Backend implementations
- `src/crawlee/http_clients/` - HTTP client implementations
- `src/crawlee/browsers/` - Playwright browser pool and plugins
- `src/crawlee/sessions/` - Session management with cookie persistence
- `src/crawlee/events/` - Event system (persist state, progress, aborting)
- `src/crawlee/_autoscaling/` - Autoscaled pool for concurrency control
- `src/crawlee/fingerprint_suite/` - Anti-bot fingerprint generation
- `src/crawlee/project_template/` - CLI scaffolding template (excluded from linting)
- `tests/unit/` - Unit tests
- `tests/e2e/` - End-to-end tests (require `apify-cli` + API token)

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.rules

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.rules

GEMINI.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.rules
