Add REST API endpoints for crawler management #325
Terrtia merged 3 commits into ail-project:master
Conversation
Expose existing crawler management functions as Bearer-token REST API endpoints. These backend functions already exist in `bin/lib/crawlers.py` and are used by the web UI, but were not accessible programmatically.

New endpoints:
- `GET /api/v1/crawler/scheduler` - list scheduled tasks
- `DELETE /api/v1/crawler/schedule/<uuid>` - delete a scheduled task
- `GET /api/v1/crawler/captures/status` - capture statuses
- `GET /api/v1/crawler/stats` - queue/up/down stats
- `GET /api/v1/crawler/blacklist` - list blacklisted domains
- `POST /api/v1/crawler/blacklist` - blacklist a domain
- `DELETE /api/v1/crawler/blacklist/<domain>` - unblacklist a domain
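A client for the endpoints above might look like the following sketch. The host, port, and the exact Bearer-token header format are assumptions for illustration, not taken from this PR; the helper only builds the request and does not send it.

```python
import urllib.request

API_BASE = "https://localhost:7000/api/v1"  # hypothetical AIL host/port
TOKEN = "your-ail-api-key"                  # placeholder token

def build_request(method, path):
    """Build (but do not send) a Bearer-token request for a crawler endpoint."""
    req = urllib.request.Request(f"{API_BASE}{path}", method=method)
    req.add_header("Authorization", f"Bearer {TOKEN}")
    return req

# e.g. delete a scheduled task by UUID
req = build_request("DELETE", "/crawler/schedule/123e4567-e89b-12d3-a456-426614174000")
print(req.get_method(), req.full_url)
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` (or swapping in any HTTP client).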
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6db13bd262
This PR was largely authored with Claude Code.
- Convert tags sets to lists in scheduler endpoint response
- Stringify non-serializable objects (CrawlerSchedule) in delete error responses
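The two fixes above address the same underlying issue: Python's `json` module cannot serialize sets or arbitrary objects. A minimal sketch of the pattern, with a stand-in class instead of AIL's actual `CrawlerSchedule`:

```python
import json

class CrawlerSchedule:  # stand-in for AIL's CrawlerSchedule object
    def __str__(self):
        return "CrawlerSchedule<example>"

tags = {"crawler", "onion"}   # set coming from the backend
schedule = CrawlerSchedule()  # non-serializable object in an error path

# json.dumps({"tags": tags}) would raise:
#   TypeError: Object of type set is not JSON serializable
payload = {"tags": sorted(tags), "error": str(schedule)}
print(json.dumps(payload))
```

Converting the set with `sorted()` (or `list()`) and the object with `str()` keeps the response JSON-safe.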
💡 Codex Review
Reviewed commit: 2a0ed60cc6
Hey @cln-io! Thanks for the contribution! I renamed the following endpoints:
Summary
Expose 7 existing crawler backend functions as Bearer-token authenticated REST API endpoints. All backend logic already exists in `bin/lib/crawlers.py` and is used by the web UI — this PR simply wires them up as REST routes in `api_rest.py`.

Motivation: Currently, managing crawler schedules, checking capture status, and maintaining the domain blocklist can only be done through the session-authenticated web UI. This makes it difficult to build automated tooling on top of AIL's crawler (e.g. programmatically pruning dead schedules, monitoring crawl health, or integrating with external feeders).
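As an example of the kind of tooling the motivation describes, pruning dead schedules could combine `GET /api/v1/crawler/scheduler` with `DELETE /api/v1/crawler/schedule/<uuid>`. The field names `uuid` and `status` below are assumptions about the response schema, not taken from this PR:

```python
def dead_schedule_uuids(schedules):
    """Select the UUIDs of schedules marked dead, ready to be DELETEd."""
    # "uuid" and "status" are hypothetical field names for illustration
    return [s["uuid"] for s in schedules if s.get("status") == "dead"]

sample = [
    {"uuid": "aaa-111", "status": "ok"},
    {"uuid": "bbb-222", "status": "dead"},
]
print(dead_schedule_uuids(sample))  # ['bbb-222']
```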
Endpoints Added
- `GET /api/v1/crawler/scheduler`
- `DELETE /api/v1/crawler/schedule/<uuid>`
- `GET /api/v1/crawler/captures/status`
- `GET /api/v1/crawler/stats` (supports `?domain_type=onion` filter)
- `GET /api/v1/crawler/blocklist`
- `POST /api/v1/crawler/blocklist` (body: `{"domain": "..."}`)
- `DELETE /api/v1/crawler/blocklist/<domain>`

Implementation Notes
- Routes added to `var/www/blueprints/api_rest.py`, where `crawlers` and `request` are already available
- Read endpoints require the `user` role; destructive/config operations require `admin`
- `get_blacklist()` returns a set, wrapped with `list()` for JSON serialization
- Routes use `<path:...>` to handle UUIDs and domains with special characters
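The route-wiring pattern these notes describe can be sketched as a minimal Flask blueprint. The blueprint name, handler names, and `get_blacklist()` stub below are stand-ins, not AIL's actual code, and the real routes also enforce token authentication and roles, which are omitted here:

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical blueprint illustrating the pattern; AIL's real blueprint
# lives in var/www/blueprints/api_rest.py.
api = Blueprint("api_rest_sketch", __name__, url_prefix="/api/v1")

def get_blacklist():
    # stand-in for crawlers.get_blacklist(), which returns a set
    return {"aaa.onion", "bbb.onion"}

@api.route("/crawler/blocklist", methods=["GET"])
def crawler_blocklist():
    # sets are not JSON serializable, so convert to a list first
    return jsonify({"blocklist": sorted(get_blacklist())}), 200

@api.route("/crawler/blocklist/<path:domain>", methods=["DELETE"])
def crawler_unblocklist(domain):
    # <path:...> lets domains with special characters reach the handler intact
    return jsonify({"unblocklisted": domain}), 200

app = Flask(__name__)
app.register_blueprint(api)
```

The `<path:domain>` converter is what allows slashes and other URL-significant characters in domains (and UUIDs) to be captured as a single route argument.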