Remove chain actors (#5502, #5687) #5790

Merged
afck merged 13 commits into linera-io:testnet_conway from afck:conway-no-chain-actor
Apr 7, 2026

@afck afck commented Mar 24, 2026

Backport of #5502 and #5687.

Motivation

The chain actors are complicated and unnecessary, and they force even read-only requests to run sequentially.

Proposal

Remove the chain actors and use an RwLock instead.
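
To illustrate the idea, here is a minimal std-only sketch of guarding per-chain state with an `RwLock` so that read-only requests can overlap. It is not the code from this PR: the field `next_block_height` and the functions `query_height`, `advance_height` and `concurrent_reads` are hypothetical, and threads with `std::sync::RwLock` stand in for whatever async locking the real worker uses.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the per-chain state; the real type has many more fields.
#[derive(Debug, Default)]
struct ChainWorkerState {
    next_block_height: u64,
}

/// Read-only request: takes the shared lock, so many of these can run at once.
fn query_height(state: &Arc<RwLock<ChainWorkerState>>) -> u64 {
    state.read().expect("lock poisoned").next_block_height
}

/// Mutating request: takes the exclusive lock and waits for active readers to finish.
fn advance_height(state: &Arc<RwLock<ChainWorkerState>>) -> u64 {
    let mut guard = state.write().expect("lock poisoned");
    guard.next_block_height += 1;
    guard.next_block_height
}

/// Spawns `readers` threads that all query the same state concurrently.
fn concurrent_reads(state: &Arc<RwLock<ChainWorkerState>>, readers: usize) -> Vec<u64> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let state = Arc::clone(state);
            thread::spawn(move || query_height(&state))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("reader panicked"))
        .collect()
}
```

With an actor, each of these requests would be a message handled one at a time; with the lock, only `advance_height` needs exclusive access.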

Test Plan

CI should catch regressions. We should run benchmarks to check whether this improves performance.

Release Plan

  • Release a new SDK.
  • Hotfix the validators.

Links

@afck afck changed the title Remove chain actors Remove chain actors (#5502, #5687) Mar 24, 2026
@afck afck force-pushed the conway-no-chain-actor branch 3 times, most recently from d4740e1 to 8bf5f36 Compare March 25, 2026 11:01
, linera-io#5687)

Backport of linera-io#5502 (Remove chain actors; handle read-only calls
concurrently) and linera-io#5687 (Fix race conditions with getting and dropping
chain workers) to testnet_conway.

Replaces the channel-based ChainWorkerActor with a direct
Arc<RwLock<ChainWorkerState>> approach, enabling concurrent read-only
operations. Uses a lock-free papaya::HashMap with
Shared<oneshot::Receiver<Weak<_>>> for race-free worker creation,
and Arc::try_unwrap in the keep-alive task for safe worker cleanup.
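
A rough sketch of the `Weak`-based cache and the `Arc::try_unwrap` cleanup described above, using only the standard library. Everything named here (`WorkerCache`, `get_or_create`, `try_retire`, `ChainId` as a `u64`) is hypothetical. A `Mutex<HashMap>` is swapped in for the lock-free `papaya::HashMap`, and the `Shared<oneshot::Receiver<_>>` that lets concurrent callers wait for a worker being created is left out entirely.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock, Weak};

/// Hypothetical chain identifier; the real type is not a plain integer.
type ChainId = u64;

/// Hypothetical, heavily reduced worker state.
struct ChainWorkerState {
    chain_id: ChainId,
}

type Worker = Arc<RwLock<ChainWorkerState>>;

/// Cache that holds only weak references, so it never keeps a worker alive by itself.
#[derive(Default)]
struct WorkerCache {
    // Assumption: a Mutex<HashMap> replaces the lock-free map to keep this std-only.
    workers: Mutex<HashMap<ChainId, Weak<RwLock<ChainWorkerState>>>>,
}

impl WorkerCache {
    /// Returns the live worker for `chain_id`, or creates and registers a new one.
    fn get_or_create(&self, chain_id: ChainId) -> Worker {
        let mut workers = self.workers.lock().expect("cache lock poisoned");
        if let Some(worker) = workers.get(&chain_id).and_then(Weak::upgrade) {
            return worker;
        }
        let worker = Arc::new(RwLock::new(ChainWorkerState { chain_id }));
        workers.insert(chain_id, Arc::downgrade(&worker));
        worker
    }
}

/// What a keep-alive task might do when its timer expires: reclaim the state only
/// if this is the last strong reference, otherwise hand the worker back unchanged.
fn try_retire(worker: Worker) -> Result<ChainWorkerState, Worker> {
    Arc::try_unwrap(worker).map(|lock| lock.into_inner().expect("lock poisoned"))
}
```

Because `Arc::try_unwrap` succeeds only when no other strong reference exists, a worker that is still serving a request cannot be torn down underneath it.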

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@afck afck force-pushed the conway-no-chain-actor branch from 8bf5f36 to 2f3d401 Compare March 25, 2026 11:38
afck and others added 3 commits March 25, 2026 12:05
There should only be one stage_block_execution, taking a policy argument.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@afck afck marked this pull request as ready for review March 25, 2026 13:19
afck and others added 8 commits March 27, 2026 12:56
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port all new testnet_conway functionality into the no-chain-actor
architecture: RevertConfirm cross-chain requests, inbox gap detection,
outbox revert, message bundle chunking, reset-on-incorrect-outcome,
poisoned worker handling, and next_expected_events support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minimizes the diff by keeping methods in the same order as in the
testnet_conway branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a journal resolution failure poisons the chain worker, the view's
in-memory state is inconsistent. Rolling back would give a false sense
of consistency, so the RollbackGuard now skips rollback for poisoned
workers. Both chain_read and chain_write evict poisoned workers from
the cache so the next request reloads from storage.
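
The behavior described above could look roughly like the following drop guard. This is a simplified sketch, not the actual `RollbackGuard`: the `ChainState` struct, its `staged` and `poisoned` fields, and the `commit` and `needs_eviction` helpers are assumptions made for illustration.

```rust
/// Hypothetical, heavily simplified chain state with one staged (unsaved) change.
#[derive(Debug, Default)]
struct ChainState {
    committed: u64,
    staged: Option<u64>,
    poisoned: bool,
}

impl ChainState {
    /// Discards the staged change.
    fn rollback(&mut self) {
        self.staged = None;
    }

    /// Persists the staged change, if any.
    fn save(&mut self) {
        if let Some(value) = self.staged.take() {
            self.committed = value;
        }
    }
}

/// Rolls back staged changes on drop unless committed or the state is poisoned.
struct RollbackGuard<'a> {
    state: &'a mut ChainState,
    disarmed: bool,
}

impl<'a> RollbackGuard<'a> {
    fn new(state: &'a mut ChainState) -> Self {
        Self { state, disarmed: false }
    }

    fn state(&mut self) -> &mut ChainState {
        &mut *self.state
    }

    /// Saves the staged change and disarms the guard.
    fn commit(mut self) {
        self.state.save();
        self.disarmed = true;
    }
}

impl Drop for RollbackGuard<'_> {
    fn drop(&mut self) {
        // A poisoned state is left untouched: rolling it back would only make
        // inconsistent in-memory data look consistent.
        if self.disarmed || self.state.poisoned {
            return;
        }
        self.state.rollback();
    }
}

/// A poisoned worker must be evicted so the next request reloads from storage.
fn needs_eviction(state: &ChainState) -> bool {
    state.poisoned
}
```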

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port rename of missing_height to retransmit_from in RevertConfirm,
block value cache improvements, and other recent testnet_conway changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@afck afck merged commit 1ca6426 into linera-io:testnet_conway Apr 7, 2026
36 checks passed
@afck afck deleted the conway-no-chain-actor branch April 7, 2026 10:46
afck added a commit that referenced this pull request Apr 14, 2026
… `actor.rs`. (#6000)

## Motivation

In #5790 I must have
forgotten to remove (or brought back in a merge attempt) `actor.rs`.

The port of #5991 to testnet_conway (#5992) modified
`linera-core/src/chain_worker/actor.rs`, but actor.rs is an orphan file
with no `mod actor` declaration in `chain_worker/mod.rs`, so it is not
part of the build and the TTL inversion was never actually fixed on this
branch.

## Proposal

Apply the same swap to `handle.rs::create_chain_worker`, which is the
code path that is actually compiled, and delete the stale `actor.rs`
file so this cannot happen again.

## Test Plan

CI

## Release Plan

- Release SDK.
- Validator hotfix.

## Links

- Original TTL fix attempt: #5992 
- Removing chain actors but not `actor.rs`: #5790 
- [reviewer
checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ma2bd added a commit to ma2bd/linera-protocol that referenced this pull request Apr 17, 2026
The previous implementation wrote to the DB first, then updated the LRU
cache. If the future was cancelled between the two (e.g. by the
RollbackGuard introduced in linera-io#5790), the DB would have the new data but
the cache would retain stale entries. Subsequent reads would hit the
stale cache, and the next save would overwrite the DB with old data,
causing silent data loss.

Fix: invalidate cache entries BEFORE writing to the DB, then repopulate
after success. If cancelled at any point after invalidation, subsequent
reads go directly to the DB and see the correct state.
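
A small sketch of the invalidate-first ordering, with plain `HashMap`s standing in for both the database and the LRU cache (no eviction is modeled). `CachingStore` and the `StopAfter` enum are hypothetical; `StopAfter` simulates the write being abandoned partway through, which is how a cancelled future would behave between two await points.

```rust
use std::collections::HashMap;

/// Points at which the sketch can simulate the write being abandoned.
#[derive(Clone, Copy, PartialEq)]
enum StopAfter {
    Invalidate,
    DbWrite,
    Never,
}

/// Hypothetical store: two maps stand in for the database and the LRU cache.
#[derive(Default)]
struct CachingStore {
    db: HashMap<String, String>,
    cache: HashMap<String, String>,
}

impl CachingStore {
    fn write(&mut self, key: &str, value: &str, stop: StopAfter) {
        // 1. Invalidate first, so a stale entry can never outlive the DB write.
        self.cache.remove(key);
        if stop == StopAfter::Invalidate {
            return;
        }
        // 2. Write to the database.
        self.db.insert(key.to_string(), value.to_string());
        if stop == StopAfter::DbWrite {
            return;
        }
        // 3. Repopulate the cache only after the write succeeded.
        self.cache.insert(key.to_string(), value.to_string());
    }

    /// Serves from the cache when possible, otherwise falls through to the database.
    fn read(&mut self, key: &str) -> Option<String> {
        if let Some(value) = self.cache.get(key) {
            return Some(value.clone());
        }
        let value = self.db.get(key).cloned()?;
        self.cache.insert(key.to_string(), value.clone());
        Some(value)
    }
}
```

Whichever step the write stops after, a later `read` returns what the database actually holds, never a stale cached value.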

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ma2bd added a commit that referenced this pull request Apr 19, 2026
…ing task::spawn) (#6056)

## Motivation

The `LruCachingStore::write_batch` method was not cancellation-safe. It
wrote to the DB first, then updated the LRU cache. If the caller's
future was cancelled between the two steps (e.g. by a gRPC timeout or
runtime shutdown), the DB would have the new data but the cache would
retain stale entries. The subsequent `RollbackGuard` would then reset
the in-memory view state, and the next `save()` would overwrite the DB
with old data -- causing silent data loss.

This was identified as the likely root cause of the missing outbox
bucket data on validator 4 (`The front bucket is always loaded` panic).

## Proposal

Wrap the DB write and cache update in a `tokio::task::spawn` so they run
to completion even if the caller's future is cancelled. On web targets,
the task runs inline since cancellation safety is not a concern there
(no `RollbackGuard` / concurrent chain workers).

Adds `Clone + Send + 'static` bounds on the inner store type parameter,
which are already satisfied by all stores in the chain (ScyllaDB,
RocksDB, journaling, value-splitting).

## Test Plan

CI

## Links

- Alternative to #6051 (invalidate-first approach)
- Related: #5790 (Remove chain actors -- introduced `RollbackGuard`)
- Related: #6015 (Fix `BucketQueueView::delete_front` load failure
handling)
- Related: #6046 (Fix storage cache reads dropping Arc before use)
- [reviewer
checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
afck pushed a commit to afck/linera-protocol that referenced this pull request Apr 20, 2026
…ing task::spawn) (linera-io#6056)