Ocean Node Indexer - Event Monitoring & Error Handling

Overview
🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)

Overview

Current Indexer Architecture

The Ocean Node Indexer is built with the following design principles (see Architecture.md for details):

Single-threaded, non-blocking design: Uses Node.js async/await for concurrent execution across multiple chains
ChainIndexer instances: Each blockchain network is monitored by a dedicated ChainIndexer instance running concurrently via the event loop
Event-driven communication: Components communicate through EventEmitter for clean separation of concerns
Efficient I/O handling: All RPC calls, database operations, and network requests are non-blocking, allowing high concurrency without worker threads

Proposed Architecture Evolution

The refactoring strategy below maintains the core single-threaded, non-blocking architecture while introducing key improvements:

EventEmitter → Persistent Queues: Replace synchronous EventEmitter with persistent queue system for better reliability and observability
Event-level retry: Move retry logic from embedded operations to event-level processing
Enhanced error tracking: Introduce comprehensive logging via ddo_logs index
Per-chain resilience: Isolate chain failures to prevent cascading issues

These changes preserve the efficient I/O model and concurrent ChainIndexer execution while adding production-grade error handling and monitoring.

🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)

Status: Draft proposals from Jan 27, 2026 meeting
Goal: Improve reliability, decoupling, and error handling

1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES

Current Issue: Retry logic is deeply embedded in event processing steps (e.g., inside DDO decryption)

Proposed Change:

Move retry logic to event level (not deep inside processing steps)
Implement queue-based retry system for all 12 event types
Decouple retry from specific operations (e.g., decrypt, p2p, HTTP)

Implementation:

┌─────────────────────────────────────────────────────────────┐
│                 EVENT ERROR PROCESSING QUEUE                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Event Detected                                              │
│       ↓                                                      │
│  Send to Processor                                           │
│       ↓                                                      │
│  Process Event                                               │
│       ├─ Success → Mark complete, update DB                  │
│       └─ Failure → Add to Retry Queue with backoff           │
│                                                              │
│  Retry Queue (exponential backoff):                          │
│    - Retry 1: ~10 seconds                                    │
│    - Retry 2: ~1 minute                                      │
│    - Retry 3: ~10 minutes                                    │
│    - Retry 4: ~1 hour                                        │
│    - Retry 5: ~1 week (final attempt)                        │
│                                                              │
│  Benefits:                                                   │
│    ✓ Non-blocking (doesn't halt chain indexing)              │
│    ✓ Works for ALL error types (HTTP, P2P, RPC, DB)         │
│    ✓ Visible retry state in monitoring                      │
└─────────────────────────────────────────────────────────────┘

Applies to: All event processors, especially METADATA_CREATED/UPDATED

2. 🗄️ NEW DATABASE INDEX: `ddo_logs`

Current Issue:

ddoState only tracks metadata events
Order and pricing events have no error tracking
No unified view of all DDO-related events

Proposed Change:

Create new DB index: ddo_logs
Store all events related to a DID (metadata, orders, pricing)
Similar structure to ddoState but broader scope
Add handler and routes (HTTP + P2P) to query all information about a DID, transaction, or event
- Similar to existing ddo-state handler but for comprehensive logs
- Enable querying by: did, txHash, blockNumber, eventType
- Support both HTTP API endpoints and P2P protocol for distributed querying

Schema:

interface DdoLog {
  did: string // Indexed
  chainId: number // Indexed
  eventType: string // METADATA_CREATED, ORDER_STARTED, etc.
  eventHash: string // Event signature hash
  txHash: string // Transaction hash
  blockNumber: number // Block number
  timestamp: number // Event timestamp
  status: 'success' | 'failed' | 'retrying'
  error?: string // Error message if failed
  retryCount: number // Number of retry attempts default 0
  lastRetry?: number // Timestamp of last retry
  metadata?: Record<string, any> // Event-specific data
}

Benefits:

Single source of truth for all DDO events
Easier debugging (see all events for a DID)
Track pricing/order event errors (not just metadata)
Audit trail for compliance

3. 🔄 REPLACE EventEmitter WITH QUEUES

Current Issue:

Using EventEmitter for communication
Synchronous, blocking behavior
No retry/replay capability
Difficult to test

Proposed Change:

Replace EventEmitter with persistent queue system
Use queue for:
- ✓ Newly indexed assets (instead of eventEmitter.emit())
- ✓ Reindex requests (block & transaction level)
- ✓ Admin commands

Queue Types:

1. EVENT_PROCESSING_QUEUE (primary)
   - New events from blockchain
   - Priority: FIFO with retry backoff

2. REINDEX_QUEUE (existing, enhance)
   - Block-level reindex
   - Transaction-level reindex
   - Priority: Admin requests > Auto-retry

3. ORDER_QUEUE (new)
   - Store orders even if DDO not found
   - Process when DDO becomes available

Benefits:

Testable (can inject mock queue)
Observable (queue depth, retry counts)
Resilient (survives crashes)
Decoupled (no tight coupling between components)

4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS

Current Issue:

If DDO not found → skip order/pricing event
Lost data if DDO indexed later

Proposed Change:

Unified Queue-Based Approach for Both Orders and Pricing Events:

IF DDO not found:
  1. Check if DDO exists in database
  2. If not found → add event to pending queue
  3. Store event in ddo_logs with status: 'pending_ddo'
  4. Link event to DID for future reconciliation
  5. When DDO is successfully indexed:
     → Process all pending events for that DID (orders + pricing)
     → Update event status from 'pending_ddo' to 'success' or 'failed'
     → Maintain event order based on blockNumber and logIndex

Queue Structure:

interface PendingEvent {
  did: string
  eventType: string // ORDER_STARTED, ORDER_REUSED, DISPENSER_*, EXCHANGE_*
  chainId: number
  txHash: string
  blockNumber: number
  timestamp: number
  retryCount: number
  queuedAt: number
}

Benefits:

Consistent approach for all event types (orders + pricing)
No data loss - all events queued and processed eventually
Maintains event order using blockNumber and logIndex
Automatic reconciliation when DDO becomes available
Better observability - track pending events per DID
Prevents orphaned records - only create records when DDO exists

5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain)

Current Issue:

Crawler startup retry in OceanIndexer
Failure blocks entire node (all chains)

Proposed Change:

Move retryCrawlerWithDelay() → ChainIndexer
Each chain fails independently
Other chains continue indexing

Benefits:

Resilient multi-chain indexing
One bad RPC doesn't kill everything
Easier debugging (per-chain logs)

6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS

Current Issue:

Failed block retried, but lastIndexedBlock not updated
Same block retried indefinitely
No expiry/max retry limit

Proposed Change:

When block added to retry queue:
  1. Update lastIndexedBlock (move forward)
  2. Add block to retry queue with metadata:
     - blockNumber
     - retryCount (starts at 0)
     - maxRetries (default: 5)
     - lastError
     - expiryDate (when to give up)
  3. Process retry queue separately (exponential backoff)
  4. If maxRetries exceeded → log to failed_blocks table

Retry Queue Schema:

interface BlockRetryTask {
  chainId: number
  blockNumber: number
  retryCount: number
  maxRetries: number
  lastError: string
  lastRetryAt: number
}

Benefits:

Indexer moves forward (doesn't get stuck)
Failed blocks retried in background
Clear failure tracking

7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION

Current Issue:

Retry only on ECONNREFUSED error
Other errors (timeout, 500, p2p failures) not retried

Proposed Change:

With event-level retry, retry ALL error types:
- ✓ RPC errors (timeout, 500, 429 rate limit)
- ✓ HTTP errors (decrypt service down)
- ✓ P2P errors (peer unreachable)
- ✓ Database errors (temp unavailable)
- ✓ Validation errors (maybe retryable)

8. ✅ UPDATE TESTS

Required Test Updates:

Remove tests checking EventEmitter behavior
Add tests for queue-based processing
Add tests for retry with exponential backoff
Add tests for orphaned orders
Add tests for per-chain failure isolation
Add tests for ddo_logs index
Add tests for block retry with expiry

Summary Table

#	Change	Current Pain	Benefit	Effort	Priority
1	Event-level retry + queues	Retry logic scattered, blocking	Unified, non-blocking, testable	High	🔴 Critical
2	`ddo_logs` DB index	No order/pricing error tracking	Full audit trail, debugging	Medium	🟡 High
3	Replace EventEmitter with queues	Blocking, not testable, no replay	Observable, resilient, testable	High	🔴 Critical
4	Handle missing DDO (orphaned orders)	Lost orders/pricing data	No data loss, reconciliation	Medium	🟡 High
5	Per-chain startup retry (ChainIndexer)	One failure kills entire node	Isolated failures, resilient	Low	🔴 Critical
6	Block retry queue with expiry	Indexer stuck on bad blocks	Progress continues, background retry	Medium	🟡 High
7	Retry ALL error types (not just ECONNREFUSED)	P2P/timeout/429 not retried	Comprehensive error handling	Low	🟡 High
8	Update tests	Tests assume old architecture	Tests match new architecture	Medium	🟢 Medium

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ocean Node Indexer - Event Monitoring & Error Handling

Table of Contents

Overview

Current Indexer Architecture

Proposed Architecture Evolution

🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)

1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES

2. 🗄️ NEW DATABASE INDEX: `ddo_logs`

3. 🔄 REPLACE EventEmitter WITH QUEUES

4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS

5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain)

6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS

7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION

8. ✅ UPDATE TESTS

Summary Table

FilesExpand file tree

IndexerRefactorStrategy.md

Latest commit

History

IndexerRefactorStrategy.md

File metadata and controls

Ocean Node Indexer - Event Monitoring & Error Handling

Table of Contents

Overview

Current Indexer Architecture

Proposed Architecture Evolution

🔴 PROPOSED IMPROVEMENTS (Post-Meeting Changes)

1. 🎯 EVENT-LEVEL RETRY MECHANISM WITH QUEUES

2. 🗄️ NEW DATABASE INDEX: ddo_logs

3. 🔄 REPLACE EventEmitter WITH QUEUES

4. 📦 HANDLE MISSING DDO IN ORDER/PRICING EVENTS

5. 🚫 MOVE RETRY LOGIC TO ChainIndexer (Block Only That Chain)

6. 📍 BLOCK RETRY QUEUE IMPROVEMENTS

7. 🌐 REMOVE ECONNREFUSED-ONLY CONDITION

8. ✅ UPDATE TESTS

Summary Table

2. 🗄️ NEW DATABASE INDEX: `ddo_logs`