The Ocean Node Indexer is built with the following design principles (see Architecture.md for details):
- Single-threaded, non-blocking design: Uses Node.js async/await for concurrent execution across multiple chains
- ChainIndexer instances: Each blockchain network is monitored by a dedicated ChainIndexer instance running concurrently via the event loop
- Event-driven communication: Components communicate through EventEmitter for clean separation of concerns
- Efficient I/O handling: All RPC calls, database operations, and network requests are non-blocking, allowing high concurrency without worker threads
The refactoring strategy below maintains the core single-threaded, non-blocking architecture while introducing key improvements:
- EventEmitter → Persistent Queues: Replace synchronous EventEmitter with persistent queue system for better reliability and observability
- Event-level retry: Move retry logic from embedded operations to event-level processing
- Enhanced error tracking: Introduce comprehensive logging via a `ddo_logs` index
- Per-chain resilience: Isolate chain failures to prevent cascading issues
These changes preserve the efficient I/O model and concurrent ChainIndexer execution while adding production-grade error handling and monitoring.
Status: Draft proposals from Jan 27, 2026 meeting
Goal: Improve reliability, decoupling, and error handling
Current Issue: Retry logic is deeply embedded in event processing steps (e.g., inside DDO decryption)
Proposed Change:
- Move retry logic to event level (not deep inside processing steps)
- Implement queue-based retry system for all 12 event types
- Decouple retry from specific operations (e.g., decrypt, p2p, HTTP)
Implementation:
┌─────────────────────────────────────────────────────────────┐
│ EVENT ERROR PROCESSING QUEUE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Event Detected │
│ ↓ │
│ Send to Processor │
│ ↓ │
│ Process Event │
│ ├─ Success → Mark complete, update DB │
│ └─ Failure → Add to Retry Queue with backoff │
│ │
│ Retry Queue (exponential backoff): │
│ - Retry 1: ~10 seconds │
│ - Retry 2: ~1 minute │
│ - Retry 3: ~10 minutes │
│ - Retry 4: ~1 hour │
│ - Retry 5: ~1 week (final attempt) │
│ │
│ Benefits: │
│ ✓ Non-blocking (doesn't halt chain indexing) │
│ ✓ Works for ALL error types (HTTP, P2P, RPC, DB) │
│ ✓ Visible retry state in monitoring │
└─────────────────────────────────────────────────────────────┘
Applies to: All event processors, especially METADATA_CREATED/UPDATED
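The backoff schedule in the diagram above can be sketched as a simple lookup; the constant and function names here are illustrative, not from the codebase:

```typescript
// Hypothetical sketch: map a 1-based retry attempt to a delay in milliseconds,
// following the schedule in the diagram above.
const RETRY_DELAYS_MS: number[] = [
  10 * 1000,               // Retry 1: ~10 seconds
  60 * 1000,               // Retry 2: ~1 minute
  10 * 60 * 1000,          // Retry 3: ~10 minutes
  60 * 60 * 1000,          // Retry 4: ~1 hour
  7 * 24 * 60 * 60 * 1000  // Retry 5: ~1 week (final attempt)
]

function retryDelayMs(attempt: number): number | null {
  // Returns null once the final attempt has been exhausted.
  if (attempt < 1 || attempt > RETRY_DELAYS_MS.length) return null
  return RETRY_DELAYS_MS[attempt - 1]
}
```

A fixed table rather than a computed formula keeps the schedule easy to audit and tune per deployment.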
Current Issue:
- `ddoState` only tracks metadata events
- Order and pricing events have no error tracking
- No unified view of all DDO-related events
Proposed Change:
- Create new DB index: `ddo_logs`
  - Store all events related to a DID (metadata, orders, pricing)
  - Similar structure to `ddoState` but broader scope
- Add handler and routes (HTTP + P2P) to query all information about a DID, transaction, or event
  - Similar to existing `ddo-state` handler but for comprehensive logs
- Enable querying by: `did`, `txHash`, `blockNumber`, `eventType`
- Support both HTTP API endpoints and P2P protocol for distributed querying
Schema:
```typescript
interface DdoLog {
  did: string                    // Indexed
  chainId: number                // Indexed
  eventType: string              // METADATA_CREATED, ORDER_STARTED, etc.
  eventHash: string              // Event signature hash
  txHash: string                 // Transaction hash
  blockNumber: number            // Block number
  timestamp: number              // Event timestamp
  status: 'success' | 'failed' | 'retrying'
  error?: string                 // Error message if failed
  retryCount: number             // Number of retry attempts (default 0)
  lastRetry?: number             // Timestamp of last retry
  metadata?: Record<string, any> // Event-specific data
}
```
Benefits:
- Single source of truth for all DDO events
- Easier debugging (see all events for a DID)
- Track pricing/order event errors (not just metadata)
- Audit trail for compliance
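A minimal sketch of the query-by-field behavior, shown as an in-memory filter; a real handler would query the database index, and `DdoLogQuery`/`filterDdoLogs` are hypothetical names:

```typescript
// Illustrative only: filter DdoLog-like records by the queryable fields
// listed above. Any combination of fields may be supplied; omitted fields
// match everything.
interface DdoLogQuery {
  did?: string
  txHash?: string
  blockNumber?: number
  eventType?: string
}

function filterDdoLogs<T extends DdoLogQuery>(logs: T[], q: DdoLogQuery): T[] {
  return logs.filter(
    (l) =>
      (q.did === undefined || l.did === q.did) &&
      (q.txHash === undefined || l.txHash === q.txHash) &&
      (q.blockNumber === undefined || l.blockNumber === q.blockNumber) &&
      (q.eventType === undefined || l.eventType === q.eventType)
  )
}
```

The same query shape could back both the HTTP route and the P2P protocol handler, keeping the two paths consistent.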
Current Issue:
- Using `EventEmitter` for communication
- Synchronous, blocking behavior
- No retry/replay capability
- Difficult to test
Proposed Change:
- Replace `EventEmitter` with a persistent queue system
- Use the queue for:
  - ✓ Newly indexed assets (instead of `eventEmitter.emit()`)
  - ✓ Reindex requests (block & transaction level)
  - ✓ Admin commands
Queue Types:
1. EVENT_PROCESSING_QUEUE (primary)
- New events from blockchain
- Priority: FIFO with retry backoff
2. REINDEX_QUEUE (existing, enhance)
- Block-level reindex
- Transaction-level reindex
- Priority: Admin requests > Auto-retry
3. ORDER_QUEUE (new)
- Store orders even if DDO not found
- Process when DDO becomes available
Benefits:
- Testable (can inject mock queue)
- Observable (queue depth, retry counts)
- Resilient (survives crashes)
- Decoupled (no tight coupling between components)
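A minimal in-memory sketch of the queue semantics that would replace `EventEmitter` (FIFO with retry backoff and an observable depth). Actual persistence is out of scope here, and all names are illustrative:

```typescript
// Sketch of the queue contract: tasks carry a retry count and an earliest
// run time, so failed work is re-queued with backoff instead of being lost.
interface QueueTask<T> {
  payload: T
  retryCount: number
  notBefore: number // earliest time (ms epoch) this task may run
}

class SimpleQueue<T> {
  private tasks: QueueTask<T>[] = []

  enqueue(payload: T, delayMs = 0): void {
    this.tasks.push({ payload, retryCount: 0, notBefore: Date.now() + delayMs })
  }

  // Pop the oldest task that is due, or null if none is ready yet.
  dequeue(now = Date.now()): QueueTask<T> | null {
    const idx = this.tasks.findIndex((t) => t.notBefore <= now)
    if (idx === -1) return null
    return this.tasks.splice(idx, 1)[0]
  }

  // Re-queue a failed task with backoff and an incremented retry count.
  retry(task: QueueTask<T>, backoffMs: number): void {
    this.tasks.push({
      payload: task.payload,
      retryCount: task.retryCount + 1,
      notBefore: Date.now() + backoffMs
    })
  }

  // Observable queue depth for monitoring dashboards.
  get depth(): number {
    return this.tasks.length
  }
}
```

Because producers only touch `enqueue` and consumers only touch `dequeue`/`retry`, either side can be replaced or mocked in tests, which is the decoupling the proposal is after.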
Current Issue:
- If DDO not found → skip order/pricing event
- Lost data if DDO indexed later
Proposed Change:
Unified Queue-Based Approach for Both Orders and Pricing Events:
IF DDO not found:
1. Check if DDO exists in database
2. If not found → add event to pending queue
3. Store event in ddo_logs with status: 'pending_ddo'
4. Link event to DID for future reconciliation
5. When DDO is successfully indexed:
→ Process all pending events for that DID (orders + pricing)
→ Update event status from 'pending_ddo' to 'success' or 'failed'
→ Maintain event order based on blockNumber and logIndex
Queue Structure:
```typescript
interface PendingEvent {
  did: string
  eventType: string // ORDER_STARTED, ORDER_REUSED, DISPENSER_*, EXCHANGE_*
  chainId: number
  txHash: string
  blockNumber: number
  timestamp: number
  retryCount: number
  queuedAt: number
}
```
Benefits:
- Consistent approach for all event types (orders + pricing)
- No data loss - all events queued and processed eventually
- Maintains event order using blockNumber and logIndex
- Automatic reconciliation when DDO becomes available
- Better observability - track pending events per DID
- Prevents orphaned records - only create records when DDO exists
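The reconciliation step could look roughly like the sketch below. `PendingEventLite` is a trimmed stand-in for the `PendingEvent` schema above (with `logIndex` added for ordering, as the steps require), and `processEvent` stands in for the real handler:

```typescript
// Illustrative sketch: when a DDO is finally indexed, drain all pending
// events queued for that DID, in on-chain order.
interface PendingEventLite {
  did: string
  eventType: string
  blockNumber: number
  logIndex: number
}

function drainPendingEvents(
  pending: PendingEventLite[],
  did: string,
  processEvent: (e: PendingEventLite) => 'success' | 'failed'
): { processed: PendingEventLite[]; remaining: PendingEventLite[] } {
  const mine = pending
    .filter((e) => e.did === did)
    // Maintain event order based on blockNumber, then logIndex.
    .sort((a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex)
  for (const e of mine) processEvent(e)
  // Events for other DIDs stay queued until their DDOs arrive.
  return { processed: mine, remaining: pending.filter((e) => e.did !== did) }
}
```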
Current Issue:
- Crawler startup retry lives in `OceanIndexer`
- A failure blocks the entire node (all chains)
Proposed Change:
- Move `retryCrawlerWithDelay()` → `ChainIndexer`
- Each chain fails independently
- Other chains continue indexing
Benefits:
- Resilient multi-chain indexing
- One bad RPC doesn't kill everything
- Easier debugging (per-chain logs)
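A sketch of the isolation this buys, under stated assumptions: `startChain`, the attempt count, and the delays are hypothetical, not the actual `retryCrawlerWithDelay()` signature:

```typescript
// Each chain retries its own startup; a dead RPC on one chain never
// blocks the others.
async function startWithRetry(
  chainId: number,
  startChain: (chainId: number) => Promise<void>,
  maxAttempts = 3,
  delayMs = 1000
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await startChain(chainId)
      return true
    } catch (err) {
      // Per-chain log line: failures stay scoped to this chainId.
      console.error(`chain ${chainId} start attempt ${attempt} failed:`, err)
      await new Promise((r) => setTimeout(r, delayMs * attempt))
    }
  }
  return false // this chain gave up; others keep indexing
}

// Start all chains concurrently. startWithRetry never rejects, so one
// failing chain cannot take the whole startup down.
async function startAllChains(
  chainIds: number[],
  startChain: (chainId: number) => Promise<void>
): Promise<boolean[]> {
  return Promise.all(chainIds.map((id) => startWithRetry(id, startChain)))
}
```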
Current Issue:
- Failed block is retried, but `lastIndexedBlock` is not updated
- Same block retried indefinitely
- No expiry/max retry limit
- No expiry/max retry limit
Proposed Change:
When block added to retry queue:
1. Update lastIndexedBlock (move forward)
2. Add block to retry queue with metadata:
- blockNumber
- retryCount (starts at 0)
- maxRetries (default: 5)
- lastError
- expiryDate (when to give up)
3. Process retry queue separately (exponential backoff)
4. If maxRetries exceeded → log to failed_blocks table
Retry Queue Schema:
```typescript
interface BlockRetryTask {
  chainId: number
  blockNumber: number
  retryCount: number
  maxRetries: number
  lastError: string
  lastRetryAt: number
}
```
Benefits:
- Indexer moves forward (doesn't get stuck)
- Failed blocks retried in background
- Clear failure tracking
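The give-up decision from steps 2–4 might be sketched as follows; this is illustrative, and the `expiryDate` parameter is assumed from the metadata list above rather than taken from the schema:

```typescript
// Sketch: decide whether a failed block gets another background retry or
// is written to the failed_blocks table.
interface BlockRetryTask {
  chainId: number
  blockNumber: number
  retryCount: number
  maxRetries: number
  lastError: string
  lastRetryAt: number
}

type RetryDecision = 'retry' | 'give_up'

function nextAction(
  task: BlockRetryTask,
  expiryDate?: number, // ms epoch after which we stop trying
  now = Date.now()
): RetryDecision {
  // Exhausted retries or an expired task → log to failed_blocks and stop.
  if (task.retryCount >= task.maxRetries) return 'give_up'
  if (expiryDate !== undefined && now >= expiryDate) return 'give_up'
  return 'retry'
}
```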
Current Issue:
- Retry happens only on `ECONNREFUSED` errors
- Other errors (timeout, 500, P2P failures) are not retried
Proposed Change:
- With event-level retry, retry ALL error types:
- ✓ RPC errors (timeout, 500, 429 rate limit)
- ✓ HTTP errors (decrypt service down)
- ✓ P2P errors (peer unreachable)
- ✓ Database errors (temp unavailable)
- ✓ Validation errors (maybe retryable)
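A sketch of a broadened retryability check covering the families above; the specific code and status sets are assumptions for illustration, not the project's actual policy:

```typescript
// Illustrative classifier: treat common transient network codes and
// rate-limit/server HTTP statuses as retryable, instead of ECONNREFUSED only.
const RETRYABLE_CODES = new Set(['ECONNREFUSED', 'ETIMEDOUT', 'ECONNRESET'])
const RETRYABLE_HTTP = new Set([429, 500, 502, 503, 504])

function isRetryable(err: { code?: string; status?: number }): boolean {
  if (err.code !== undefined && RETRYABLE_CODES.has(err.code)) return true
  if (err.status !== undefined && RETRYABLE_HTTP.has(err.status)) return true
  return false
}
```

With event-level retry, this check collapses into a single decision point instead of being re-implemented inside each operation.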
Required Test Updates:
- Remove tests checking `EventEmitter` behavior
- Add tests for queue-based processing
- Add tests for retry with exponential backoff
- Add tests for orphaned orders
- Add tests for per-chain failure isolation
- Add tests for the `ddo_logs` index
- Add tests for block retry with expiry
| # | Change | Current Pain | Benefit | Effort | Priority |
|---|---|---|---|---|---|
| 1 | Event-level retry + queues | Retry logic scattered, blocking | Unified, non-blocking, testable | High | 🔴 Critical |
| 2 | ddo_logs DB index | No order/pricing error tracking | Full audit trail, debugging | Medium | 🟡 High |
| 3 | Replace EventEmitter with queues | Blocking, not testable, no replay | Observable, resilient, testable | High | 🔴 Critical |
| 4 | Handle missing DDO (orphaned orders) | Lost orders/pricing data | No data loss, reconciliation | Medium | 🟡 High |
| 5 | Per-chain startup retry (ChainIndexer) | One failure kills entire node | Isolated failures, resilient | Low | 🔴 Critical |
| 6 | Block retry queue with expiry | Indexer stuck on bad blocks | Progress continues, background retry | Medium | 🟡 High |
| 7 | Retry ALL error types (not just ECONNREFUSED) | P2P/timeout/429 not retried | Comprehensive error handling | Low | 🟡 High |
| 8 | Update tests | Tests assume old architecture | Tests match new architecture | Medium | 🟢 Medium |