mizcausevic-dev/aeo-crawler

aeo-crawler

A breadth-first crawler for the AEO Protocol v0.1.

Give it one seed origin. It fetches that origin's /.well-known/aeo.json, then treats every authority.primary_sources URI as a candidate origin to fetch next, up to a configurable depth and total fetch budget. The output is one JSON Lines record per origin attempted, suitable for piping into jq, a graph database, or any analytics pipeline.
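The traversal described above can be sketched as a breadth-first walk with a depth limit and a fetch budget. This is an illustration under stated assumptions, not the crawler's actual code: `crawl`, `visit`, and the `links` map are hypothetical names, and the map stands in for fetching /.well-known/aeo.json and reading authority.primary_sources.

```go
package main

import "fmt"

// visit records one origin attempted and its distance from the seed.
type visit struct {
	Origin string
	Depth  int
}

// crawl walks links breadth-first from seed. Origins at maxDepth are still
// fetched but not expanded, and the walk stops once maxFetches origins have
// been attempted. links is a hypothetical stand-in for "fetch the
// declaration and read authority.primary_sources".
func crawl(links map[string][]string, seed string, maxDepth, maxFetches int) []visit {
	seen := map[string]bool{seed: true}
	queue := []visit{{Origin: seed, Depth: 0}}
	var out []visit

	for len(queue) > 0 && len(out) < maxFetches {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur) // one fetch attempted

		if cur.Depth >= maxDepth {
			continue // do not follow links past the depth limit
		}
		for _, next := range links[cur.Origin] {
			if seen[next] {
				continue // already queued or fetched
			}
			seen[next] = true
			queue = append(queue, visit{Origin: next, Depth: cur.Depth + 1})
		}
	}
	return out
}

func main() {
	links := map[string][]string{
		"https://a.example": {"https://b.example", "https://c.example"},
		"https://b.example": {"https://d.example", "https://a.example"},
	}
	for _, v := range crawl(links, "https://a.example", 1, 100) {
		fmt.Printf("%d %s\n", v.Depth, v.Origin)
	}
}
```

With a depth of 1 the sketch attempts the seed and its two direct neighbours and never reaches https://d.example, which matches the flag semantics where a depth of 0 fetches only the seed.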

Built on top of aeo-sdk-go.

Install

go install github.com/mizcausevic-dev/aeo-crawler/cmd/aeo-crawler@latest

Usage

aeo-crawler --seed https://mizcausevic-dev.github.io

Output (one JSON object per line):

{"origin":"https://mizcausevic-dev.github.io","depth":0,"success":true,"entity_name":"Miz Causevic","entity_type":"Person","claims_count":6,"audit_mode":"none","fetched_at":"2026-05-12T04:00:00Z"}
{"origin":"https://github.com","depth":1,"success":false,"error":"HTTP 404","fetched_at":"2026-05-12T04:00:01Z"}
{"origin":"https://www.linkedin.com","depth":1,"success":false,"error":"HTTP 404","fetched_at":"2026-05-12T04:00:01Z"}
{"origin":"https://mizcausevic.com","depth":1,"success":false,"error":"HTTP 404","fetched_at":"2026-05-12T04:00:01Z"}
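For consumers written in Go rather than jq, the records can be decoded line by line. The sketch below infers field names from the sample output above; the `Record` type and `parseLine` helper are hypothetical and may not match the crawler's own types, and the real output may carry fields not shown here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// Record mirrors the fields visible in the sample output. It is an
// assumption based on that sample, not the crawler's exported type.
type Record struct {
	Origin      string    `json:"origin"`
	Depth       int       `json:"depth"`
	Success     bool      `json:"success"`
	EntityName  string    `json:"entity_name,omitempty"`
	EntityType  string    `json:"entity_type,omitempty"`
	ClaimsCount int       `json:"claims_count,omitempty"`
	AuditMode   string    `json:"audit_mode,omitempty"`
	Error       string    `json:"error,omitempty"`
	FetchedAt   time.Time `json:"fetched_at"`
}

// parseLine decodes a single JSON Lines record.
func parseLine(line string) (Record, error) {
	var rec Record
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		return Record{}, fmt.Errorf("decode record: %w", err)
	}
	return rec, nil
}

func main() {
	// Two lines from the sample output. Swap the reader for os.Stdin to
	// consume a live crawl.
	const sample = `{"origin":"https://mizcausevic-dev.github.io","depth":0,"success":true,"entity_name":"Miz Causevic","entity_type":"Person","claims_count":6,"audit_mode":"none","fetched_at":"2026-05-12T04:00:00Z"}
{"origin":"https://github.com","depth":1,"success":false,"error":"HTTP 404","fetched_at":"2026-05-12T04:00:01Z"}`

	ok, failed := 0, 0
	sc := bufio.NewScanner(strings.NewReader(sample))
	for sc.Scan() {
		rec, err := parseLine(sc.Text())
		if err != nil {
			panic(err)
		}
		if rec.Success {
			ok++
		} else {
			failed++
		}
	}
	fmt.Printf("success=%d failed=%d\n", ok, failed)
}
```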

Flags

| Flag | Default | Description |
| --- | --- | --- |
| --seed | required | Seed origin URL. |
| --depth | 2 | Maximum graph distance from the seed. 0 = only fetch the seed. |
| --max-fetches | 100 | Global cap on total fetches. |
| --concurrency | 4 | Maximum in-flight HTTP requests. |
| --timeout | 10 | Per-request timeout in seconds. |
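To illustrate what --concurrency and --timeout bound, here is one common way to cap in-flight requests in Go: a buffered channel used as a counting semaphore, plus an http.Client with a Timeout. This is a sketch of the general technique, not necessarily how aeo-crawler implements it; `runBounded` is a hypothetical helper, and the local httptest server only exists so the example runs without network access.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// runBounded calls work once per item with at most limit calls in flight.
// Results come back in input order. limit must be at least 1.
func runBounded(items []string, limit int, work func(string) string) []string {
	results := make([]string, len(items))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already running
		go func(i int, item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = work(item)
		}(i, item)
	}
	wg.Wait()
	return results
}

func main() {
	// Local server that answers 404 to everything, so no network is needed.
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()

	// Values mirror the documented defaults: 10 second timeout, 4 in flight.
	client := &http.Client{Timeout: 10 * time.Second}
	fetch := func(origin string) string {
		resp, err := client.Get(origin + "/.well-known/aeo.json")
		if err != nil {
			return err.Error()
		}
		defer resp.Body.Close()
		return resp.Status
	}

	for _, status := range runBounded([]string{srv.URL, srv.URL, srv.URL}, 4, fetch) {
		fmt.Println(status)
	}
}
```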

Useful pipelines

Count successful AEO declarations:

aeo-crawler --seed https://mizcausevic-dev.github.io | jq -c 'select(.success==true)' | wc -l

List unique entity names:

aeo-crawler --seed https://mizcausevic-dev.github.io | jq -r 'select(.success==true) | .entity_name' | sort -u

Find origins that declare an audit_mode of signature:

aeo-crawler --seed https://example.com --depth 3 | jq -c 'select(.audit_mode=="signature")'

How discovery works

For each fetched declaration, authority.primary_sources is treated as the source of next-hop candidate origins. Each URI is normalized to its scheme + host (path stripped). Already-visited origins are not re-fetched. The crawler does not currently follow citation_preferences.canonical_links or claims[].evidence; both are on the roadmap for v0.2.
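The normalization and de-duplication steps can be sketched with net/url. The helper names `normalizeOrigin` and `nextHops` are hypothetical, and details such as how non-HTTP or relative URIs are handled are assumptions here (this sketch simply skips them); the crawler may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// normalizeOrigin reduces a URI to scheme + host, dropping path, query,
// and fragment. URIs without both a scheme and a host are rejected.
func normalizeOrigin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", raw)
	}
	return u.Scheme + "://" + u.Host, nil
}

// nextHops returns the not-yet-visited origins among sources and marks
// them as visited. Unusable URIs are skipped (an assumption of this sketch).
func nextHops(sources []string, visited map[string]bool) []string {
	var hops []string
	for _, s := range sources {
		origin, err := normalizeOrigin(s)
		if err != nil || visited[origin] {
			continue
		}
		visited[origin] = true
		hops = append(hops, origin)
	}
	return hops
}

func main() {
	visited := map[string]bool{"https://mizcausevic-dev.github.io": true}
	sources := []string{
		"https://github.com/mizcausevic-dev",
		"https://github.com/mizcausevic-dev/aeo-crawler",
		"https://mizcausevic-dev.github.io/about",
	}
	for _, origin := range nextHops(sources, visited) {
		fmt.Println(origin)
	}
}
```

Two URIs on the same host collapse to one candidate origin, which is why a declaration listing several profile pages on one site produces a single next-hop fetch.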

Conformance

Operates against AEO Protocol v0.1 declarations at conformance Level 1 (Declare). Signature verification (L2) and audit-report submission (L3) are not invoked; signed documents are recorded as audit_mode: "signature" but not verified.

Dependencies

Development

go vet ./...
go test -v ./...
go build ./cmd/aeo-crawler

Tests use httptest to serve fixture AEO documents — no network is required.
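A fixture server of this kind might look like the sketch below. It is not copied from the repository's tests: `newFixtureServer` and `fetchPrimarySources` are hypothetical helpers, the fetch uses only the standard library rather than aeo-sdk-go, and the fixture is deliberately minimal, containing only the authority.primary_sources field named earlier in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fixture is a minimal stand-in document. A real AEO declaration has more
// fields; only authority.primary_sources is taken from this README.
const fixture = `{"authority":{"primary_sources":["https://example.org/about"]}}`

// newFixtureServer serves body at /.well-known/aeo.json on a loopback port.
func newFixtureServer(body string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/.well-known/aeo.json", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	})
	return httptest.NewServer(mux)
}

// fetchPrimarySources fetches origin's declaration and returns its
// authority.primary_sources list.
func fetchPrimarySources(origin string) ([]string, error) {
	resp, err := http.Get(origin + "/.well-known/aeo.json")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("HTTP %d", resp.StatusCode)
	}
	var doc struct {
		Authority struct {
			PrimarySources []string `json:"primary_sources"`
		} `json:"authority"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return nil, err
	}
	return doc.Authority.PrimarySources, nil
}

func main() {
	srv := newFixtureServer(fixture)
	defer srv.Close()

	sources, err := fetchPrimarySources(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(sources)
}
```

Because httptest.NewServer binds to a loopback address, the same pattern works inside `go test` without reaching the public internet.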

Specification

Full spec at github.com/mizcausevic-dev/aeo-protocol-spec.

License

AGPL-3.0.

Kinetic Gain Protocol Suite

Spec Implementation
AEO Protocol aeo-sdk-python · aeo-sdk-typescript · aeo-sdk-rust · aeo-sdk-go · aeo-cli · aeo-crawler (this)
Prompt Provenance
Agent Cards
AI Evidence Format
MCP Tool Cards

Connect: LinkedIn · Kinetic Gain · Medium · Skills

About

BFS crawler for AEO Protocol v0.1 declaration graphs. Seed an origin, follow primary_source URIs, emit JSON Lines records of every fetch. Built on aeo-sdk-go. Concurrent, depth-limited, budget-capped, stdlib-only HTTP.
