
Add REST API endpoints for crawler management #325

Merged
Terrtia merged 3 commits into ail-project:master from cln-io:feat/crawler-rest-api-endpoints on Mar 18, 2026

@cln-io (Contributor) commented Mar 16, 2026

Summary

Expose 7 existing crawler backend functions as Bearer-token authenticated REST API endpoints. All backend logic already exists in bin/lib/crawlers.py and is used by the web UI — this PR simply wires them up as REST routes in api_rest.py.

Motivation: Currently, managing crawler schedules, checking capture status, and maintaining the domain blocklist can only be done through the session-authenticated web UI. This makes it difficult to build automated tooling on top of AIL's crawler (e.g. programmatically pruning dead schedules, monitoring crawl health, or integrating with external feeders).

Endpoints Added

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/v1/crawler/scheduler | user | List all scheduled crawl tasks |
| DELETE | /api/v1/crawler/schedule/<uuid> | admin | Delete a scheduled task |
| GET | /api/v1/crawler/captures/status | user | Get status of all active captures |
| GET | /api/v1/crawler/stats | user | Get crawler queue/up/down stats (optional ?domain_type=onion filter) |
| GET | /api/v1/crawler/blocklist | admin | List all blocklisted domains |
| POST | /api/v1/crawler/blocklist | admin | Add domain to blocklist ({"domain": "..."}) |
| DELETE | /api/v1/crawler/blocklist/<domain> | admin | Remove domain from blocklist |
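For illustration, here is a minimal client-side sketch that builds (but does not send) requests for a few of the endpoints above, using only the standard library. It uses the paths as proposed in this description; several were renamed before merge (see the maintainer's comment below). The host/port, the placeholder key, and the exact `Authorization` header format are assumptions: the sketch sends the key verbatim, so check how your AIL instance expects the token to be passed.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, api_key, method, path, query=None, body=None):
    """Build (but do not send) a request against the crawler REST API."""
    url = base_url.rstrip("/") + path
    if query:
        # e.g. {"domain_type": "onion"} -> ?domain_type=onion
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    # Assumption: the API key goes verbatim into the Authorization header.
    req.add_header("Authorization", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


BASE = "https://127.0.0.1:7000"  # hypothetical AIL host/port
KEY = "YOUR_API_KEY"             # placeholder, not a real key

stats = build_request(BASE, KEY, "GET", "/api/v1/crawler/stats",
                      query={"domain_type": "onion"})
add = build_request(BASE, KEY, "POST", "/api/v1/crawler/blocklist",
                    body={"domain": "example.onion"})
# Percent-encode the domain before placing it in the URL path.
remove = build_request(BASE, KEY, "DELETE", "/api/v1/crawler/blocklist/"
                       + urllib.parse.quote("example.onion", safe=""))
```

To actually send one of these, pass it to `urllib.request.urlopen()`; if the instance serves a self-signed certificate, an appropriate `ssl` context would also be needed.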

Implementation Notes

  • Single file change: var/www/blueprints/api_rest.py
  • No new imports — crawlers and request are already available
  • Auth levels match the web UI: read-only operations use user role, destructive/config operations require admin
  • get_blacklist() returns a set, wrapped with list() for JSON serialization
  • Route path converters use <path:...> to handle UUIDs and domains with special characters
  • Scheduler tags (sets) are converted to lists for JSON serialization
  • Non-serializable objects in error responses are stringified
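The two serialization notes above (sets converted to lists, other non-serializable objects stringified) can be sketched as a single fallback for the standard `json` module. This is an illustration, not the PR's code: `json_default` is a hypothetical helper, and `CrawlerSchedule` here is a stand-in class, not the real one from `crawlers.py`.

```python
import json


class CrawlerSchedule:
    """Hypothetical stand-in for a non-JSON-serializable backend object."""

    def __init__(self, uuid):
        self.uuid = uuid

    def __str__(self):
        return f"CrawlerSchedule({self.uuid})"


def json_default(obj):
    """Fallback for json.dumps: sets become sorted lists, anything else a string."""
    if isinstance(obj, (set, frozenset)):
        # Sorting (by str, to tolerate mixed types) gives stable output.
        return sorted(obj, key=str)
    return str(obj)


tags_json = json.dumps({"tags": {"b", "a"}}, default=json_default)
error_json = json.dumps({"error": CrawlerSchedule("1234")}, default=json_default)
```

Sorting is a design choice beyond the plain `list()` wrapping described above: set iteration order is arbitrary, so sorted output keeps responses stable across calls.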

Add REST API endpoints for crawler scheduler, captures, stats, and blacklist

Expose existing crawler management functions as Bearer-token REST API
endpoints. These backend functions already exist in bin/lib/crawlers.py
and are used by the web UI, but were not accessible programmatically.

New endpoints:
- GET  /api/v1/crawler/scheduler         - list scheduled tasks
- DELETE /api/v1/crawler/schedule/<uuid>  - delete a scheduled task
- GET  /api/v1/crawler/captures/status    - capture statuses
- GET  /api/v1/crawler/stats              - queue/up/down stats
- GET  /api/v1/crawler/blacklist          - list blacklisted domains
- POST /api/v1/crawler/blacklist          - blacklist a domain
- DELETE /api/v1/crawler/blacklist/<domain> - unblacklist a domain

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6db13bd262

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread var/www/blueprints/api_rest.py
Comment thread var/www/blueprints/api_rest.py Outdated

@cln-io (Contributor, Author) commented Mar 16, 2026

This PR was largely authored with Claude Code

- Convert tags sets to lists in scheduler endpoint response
- Stringify non-serializable objects (CrawlerSchedule) in delete error responses
@cln-io changed the title from "Add REST API endpoints for crawler scheduler, captures, stats, and blacklist" to "Add REST API endpoints for crawler management" on Mar 16, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a0ed60cc6

Comment thread var/www/blueprints/api_rest.py

@Terrtia (Member) commented Mar 18, 2026

Hey @cln-io !

Thanks for the contribution !

I renamed the following endpoints:

| Old endpoint | New endpoint | Method |
|---|---|---|
| /api/v1/crawler/schedule/ | /api/v1/crawler/schedule/delete/ | DELETE |
| /api/v1/crawler/captures/status | /api/v1/crawler/captures | GET |
| /api/v1/crawler/blocklist | /api/v1/crawler/blacklist | GET |
| /api/v1/crawler/blocklist | /api/v1/crawler/blacklist/add | POST |
| /api/v1/crawler/blocklist/ | /api/v1/crawler/blacklist/delete/ | DELETE |
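Client code written against the originally proposed paths could be migrated with a small lookup keyed on (method, path), since the old GET and POST blocklist routes share a path but map to different new routes. This is a hypothetical sketch (`RENAMES` and `migrate_path` are illustrative names). It assumes the rename entries that end in `/` are prefixes followed by the schedule UUID or domain, which is inferred from the routes in the PR description rather than stated in the rename table.

```python
# Hypothetical client-side helper mapping the PR's originally proposed paths
# to the names used in the merged version.
RENAMES = {
    ("DELETE", "/api/v1/crawler/schedule/"): "/api/v1/crawler/schedule/delete/",
    ("GET", "/api/v1/crawler/captures/status"): "/api/v1/crawler/captures",
    ("GET", "/api/v1/crawler/blocklist"): "/api/v1/crawler/blacklist",
    ("POST", "/api/v1/crawler/blocklist"): "/api/v1/crawler/blacklist/add",
    ("DELETE", "/api/v1/crawler/blocklist/"): "/api/v1/crawler/blacklist/delete/",
}


def migrate_path(method: str, path: str) -> str:
    """Return the renamed path for (method, path); unknown paths pass through."""
    method = method.upper()
    exact = RENAMES.get((method, path))
    if exact is not None:
        return exact
    # Assumption: old paths ending in "/" were followed by an identifier
    # (schedule UUID or domain), which is carried over unchanged.
    for (m, old), new in RENAMES.items():
        if m == method and old.endswith("/") and path.startswith(old):
            return new + path[len(old):]
    return path
```

Unmapped routes such as `/api/v1/crawler/scheduler` and `/api/v1/crawler/stats` are returned unchanged. The helper is meant for old paths only; feeding it an already-migrated DELETE path would add a second `delete/` segment.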

2 participants