Skip to content

Add tx filters to indexer.GetSubscription#1074

Draft
bitcoin-coder-bob wants to merge 22 commits into
masterfrom
bob/indexer-tx-filters
Draft

Add tx filters to indexer.GetSubscription#1074
bitcoin-coder-bob wants to merge 22 commits into
masterfrom
bob/indexer-tx-filters

Conversation

@bitcoin-coder-bob
Copy link
Copy Markdown
Collaborator

@bitcoin-coder-bob bitcoin-coder-bob commented May 20, 2026

closes #1072

forked from bob/stream-single-connection which is up for review: #951


Adds CEL-based filtering of indexed transactions to IndexerService.GetSubscription / UpdateSubscription, complementing the existing script topic filter. A tx event is now dispatched to a listener when any of the listener's script topics match or when any of its CEL tx expressions evaluates to true.

Supported expression shapes (per the issue):

  • has(tx.extension) — true when the tx carries an ARK OP_RETURN extension
  • hasPacket(tx.extension, <packetType>) — true when a specific packet type is present
  • tx.extension[<packetType>].contains("<hex>") — substring match against the packet's hex-encoded body

Programs are compiled once at subscription-update time (invalid CEL → InvalidArgument, listener state untouched) and re-evaluated against each indexed tx envelope on the event hot path.

Changes

Area: api-spec/protobuf/ark/v1/indexer.proto
What changed: New messages TxFilter, ModifyTxFilters, OverwriteTxFilters, TxFilterResult. Extended SubscriptionFilter.filter oneof with txs = 2 and UpdateSubscriptionResponse.result oneof with txs = 2. Backward compatible: additive only.
────────────────────────────────────────
Area: pkg/ark-lib/indexer/celenv/ (new package)
What changed: tx CEL env declaring tx.extension as a dynamic map (runtime: map[int64]string of packet type → hex-encoded packet bytes), hasPacket custom function, Compile() (rejects non-bool expressions), Eval() building the activation from extension.NewExtensionFromTx. tx.extension is only populated when the tx actually carries an extension, so has(tx.extension) is a meaningful guard.
────────────────────────────────────────
Area: internal/interface/grpc/handlers/broker.go
What changed: Listener now holds txFilters map[string]cel.Program keyed by the original expression string. Broker methods addTxFilters / removeTxFilters / overwriteTxFilters / getTxFilters mirror the scripts API. compileTxFilters validates all expressions before mutating listener state (atomic). matchesTx takes a lazy getTx thunk — listeners with no filters skip the decode entirely.
────────────────────────────────────────
Area: internal/interface/grpc/handlers/indexer.go
What changed: applyTxFilter mirrors applyScriptsFilter. Both GetSubscription's initial-filter dispatch and UpdateSubscription route the new SubscriptionFilter_Txs case to it. listenToTxEvents lazily decodes event.Tx once per event (only if at least one listener has tx filters) and dispatches on scriptMatch || txMatch.
────────────────────────────────────────
Area: internal/interface/grpc/handlers/indexer_test.go
What changed: New TestTxFilter (17 passing subtests): initial filter via GetSubscription, Modify-Add / Modify-Remove / Overwrite via UpdateSubscription, invalid-CEL rejection at both entry points, atomic rollback on Modify-Add with invalid expression, idempotent add+remove, OR semantics fully exercised (script-only, tx-only, both → exactly once, neither → dropped), script path still works when event.Tx is unparseable, and the lazy-parse contract (matchesTx doesn't invoke getTx on a filterless listener).
────────────────────────────────────────
Area: pkg/ark-lib/indexer/celenv/tx_env_test.go (new)
What changed: 15 passing assertions covering has(tx.extension), hasPacket, .contains(), parser-rejects-invalid-CEL, parser-rejects-non-bool, and the issue's exact hasPacket(tx.extension, <type>) / tx.extension[<type>].contains("<hex>") patterns against fixture txs.

Acceptance criteria (#1072 (comment))

6/8 met as written, 2 with deviations worth flagging:

  1. Shape: filters are nested inside the SubscriptionFilter oneof (SubscriptionFilter.txs.{modify|overwrite}.expressions) rather than a flat repeated string tx_filters field on the request. This matches the unified-filter shape altafan asked for in single-connection GetSubscription flow #951's review (the criterion text predates that). Same client contract end-to-end (CEL strings in / matching events out).
  2. Map key type: tx.extension keys are CEL int (runtime int64), not uint32. Done so users can write tx.extension[0x00] directly (matching the issue's example literals); with uint keys they'd need 0x00u.

All other criteria are met and covered by tests.

Notes

  • Backward compatible vs master: all new proto fields are additive, no existing RPCs change shape.
  • Lazy tx decode: an event whose listeners have no tx filters incurs no hex.DecodeString / MsgTx.Deserialize cost.
  • CEL eval cost: programs are pre-compiled and reused; eval is dominated by extension.NewExtensionFromTx parsing per tx. Worth a benchmark before merge if event volume is high — flagging here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d3ba519-b388-4459-b4dc-4a7fe45ebacc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bob/indexer-tx-filters

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@arkanaai arkanaai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Aggressive Code Review — #1074

Reviewer: Arkana (automated, protocol-aware)


Summary

This PR adds CEL-based transaction filters to GetSubscription, introduces a new single-connection subscription flow (empty subscription_id → server creates inline), adds UpdateSubscription RPC, and adds a SubscriptionStartedEvent to the stream response. The proto changes, handler logic, CEL environment, and test coverage are generally well-structured. However, there are security and correctness issues that must be addressed before merge.


🔴 CRITICAL — CEL DoS: No limits on expression count or evaluation cost

Files: broker.go:102-177, indexer.go:532-706

The addTxFilters / overwriteTxFilters broker methods accept unbounded []string expression lists. A malicious client can:

  1. Memory exhaustion: Call UpdateSubscription repeatedly with Modify.AddExpressions, growing listener.txFilters without bound. Each compiled cel.Program is ~1-10KB.
  2. CPU exhaustion: matchesTx() (broker.go:148-166) evaluates ALL programs for EVERY transaction event for EVERY listener. With N listeners × M expressions × T transactions/sec, this is O(N·M·T) CEL evaluations.
  3. No evaluation timeout: celenv.Eval (tx_env.go:48-58) has no context/timeout. A pathological expression (e.g., deeply nested string operations on large packet hex data) blocks the single listenToTxEvents goroutine, stalling ALL subscriptions.

Required fixes:

  • Add a per-subscription expression limit (e.g., maxTxFiltersPerListener = 64).
  • Add a per-call limit on expressions in a single UpdateSubscription request.
  • Consider adding cel.EvalOption(cel.OptTrackCost) + cel.CostLimit() to cap evaluation cost.

🔴 HIGH — activation() called once per program, not once per tx

File: broker.go:148-166, tx_env.go:47-58

matchesTx calls celenv.Eval(prg, tx) in a loop, and Eval calls activation(tx) every time. activation() parses the OP_RETURN extension and serializes every packet to hex — this is non-trivial work. For a listener with M expressions, this runs M times per event instead of once.

// broker.go:162 — this loop calls Eval → activation(tx) for each program
for _, prg := range programs {
    ok, err := celenv.Eval(prg, tx)

Fix: Compute the activation map once outside the loop and pass it to a new EvalWithActivation(prg, act) function. This changes O(M·E) to O(M + E) where E = extension parse cost.


🟡 MEDIUM — Silent error swallowing in matchesTx

File: broker.go:162-165

ok, err := celenv.Eval(prg, tx)
if err == nil && ok {
    return true
}

CEL evaluation errors are silently ignored. If an expression panics, returns a non-bool, or hits an unexpected type, this silently drops the match. At minimum, log at debug level so operators can detect misbehaving expressions.


🟡 MEDIUM — OpenAPI spec has duplicate parameter names

File: indexer.openapi.json:25-46, indexer.openapi.json:136-157

filter.scripts.modify.removeScripts appears twice in both endpoint definitions. Same for filter.txs.modify.removeExpressions. This is likely a protobuf→OpenAPI codegen quirk (the add_* and remove_* fields map to the same query param name due to the parent modify wrapper), but it will confuse REST clients and may cause undefined behavior in OpenAPI code generators.

Fix: Verify the codegen output. If this is a codegen bug, file upstream. If hand-maintained, deduplicate.


🟡 MEDIUM — ScriptsFilterResult.Added/Removed returns raw input, not canonical form

File: indexer.go:541-549

return &arkv1.UpdateSubscriptionResponse{
    Result: &arkv1.UpdateSubscriptionResponse_Scripts{
        Scripts: &arkv1.ScriptsFilterResult{
            Added:   modify.GetAddScripts(),    // raw input
            Removed: modify.GetRemoveScripts(),  // raw input
            All:     h.scriptSubsHandler.getTopics(subscriptionID), // formatted
        },
    },
}

Added and Removed return the raw user input, but All returns formatTopic()-processed scripts (from getTopics). A client comparing Added entries against All entries may find mismatches if the input format differs from the stored format. Return the parsed/canonical form for all three fields.


🟡 MEDIUM — Fragile error string matching

File: indexer.go:464-467, indexer.go:570-573

if strings.Contains(err.Error(), "listener not found") {
    return status.Error(codes.NotFound, err.Error())
}

And:

if strings.Contains(err.Error(), "not found") {
    return nil, status.Error(codes.NotFound, err.Error())
}

Error classification via substring matching is brittle. If broker error messages change, these will silently produce wrong gRPC status codes. Define sentinel errors in the broker package and use errors.Is().


🟢 LOW — parseTxOnce is safe but could use sync.OnceValue

File: indexer.go:800-817

The parseTxOnce closure with parsedTxAttempted bool works correctly because listenToTxEvents processes events sequentially. However, sync.OnceValue from Go 1.21+ would make the intent clearer and be future-proof if this ever becomes concurrent.


🟢 LOW — UpdateSubscription permission is write, SubscribeForScripts is write

File: permissions.go:258-262

Both are write on EntityIndexer, which is consistent. ✅


🟢 Cross-repo Impact

Two test repos (layerzero-usdt0-arkade-demo, introspector-review) use the old two-step subscription flow (SubscribeForScriptsGetSubscription(id)). The old flow is preserved — the behavior change only applies when subscription_id is empty. The new SubscriptionStartedEvent oneof variant in GetSubscriptionResponse is additive and won't break existing gRPC clients (they'll ignore unrecognized oneof values). REST/SSE clients that don't handle the new variant should be checked.

No SDK repos found with direct indexer subscription consumers, but the proto changes will propagate on next SDK codegen. No breaking change for existing consumers.


🟢 Protocol Safety

This PR does NOT touch VTXO handling, transaction signing, forfeit paths, round lifecycle, connector trees, or exit paths. The CEL filters are read-only event filtering in the indexer subscription layer — they cannot modify transaction processing. No protocol-critical flag needed.


✅ Test Coverage

Excellent test coverage: 1428 lines of new tests covering:

  • New flow (empty subscription_id): started event, script events, cleanup, heartbeat, invalid input
  • Old flow (existing subscription_id): preserved behavior, timeout, cleanup
  • UpdateSubscription: overwrite, modify, validation, NotFound, idempotency
  • TxFilter: CEL compilation, evaluation, OR semantics, error resilience, atomicity
  • celenv package: compile rejection, eval correctness, issue examples

Missing test: no test for max-expression-count limits (because the limit doesn't exist yet — see critical finding above).


Verdict: Request Changes

The CEL DoS vector (unbounded expressions, no eval cost limit, no timeout) is the blocking issue. The activation() per-program overhead is a significant perf bug. The other items are medium/low severity.

🤖 Reviewed by Arkana

Copy link
Copy Markdown
Contributor

@arkanaai arkanaai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Incremental Review — new commit 47c2f5c8 ("go mod tidy")

Only change since my last review: go.mod removes // indirect from github.com/google/cel-go v0.26.1, correctly marking it as a direct dependency now that pkg/ark-lib/indexer/celenv/ imports it. ✅ Trivial and correct.

All findings from my previous review still apply. The critical items remain unaddressed:

  1. 🔴 CEL DoS — no per-subscription expression limit, no eval cost cap
  2. 🔴 Redundant activation() call — parsed once per program instead of once per tx
  3. 🟡 Silent error swallowing in matchesTx
  4. 🟡 Fragile error string matching — use sentinel errors
  5. 🟡 OpenAPI duplicate parameters
  6. 🟡 Raw vs canonical form mismatch in filter results

Keeping my request-changes status until the critical items are resolved.

🤖 Reviewed by Arkana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tx filters to indexer.GetSubscription

1 participant