
trap host function to avoid wrong atomically() behavior#3309

Open
vigoo wants to merge 3 commits intomainfrom
atomically-fix

Conversation

@vigoo

@vigoo vigoo commented Apr 29, 2026

Before this change, atomically { ... } worked in the following way:

  • emit a begin atomic region marker in the oplog
  • if the block succeeds, emit an end atomic region marker in the oplog
  • if it fails (= traps), the region is not closed; when Golem replays the oplog, it detects the missing end marker, invalidates the partial oplog region with a jump, switches back to live mode, and thereby retries the atomic region from scratch
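The replay detection in the last bullet can be sketched as follows. Note that `OplogEntry` and `findUnclosedRegion` are invented for illustration and do not match Golem's real oplog format or internals:

```typescript
// Hypothetical oplog entry shape (illustration only, not Golem's actual format)
type OplogEntry =
  | { kind: 'begin-atomic'; index: number }
  | { kind: 'end-atomic'; begin: number }
  | { kind: 'op'; payload: string };

// Returns the index replay should jump back to if the most recent atomic
// region was never closed, or null if every region is properly closed.
function findUnclosedRegion(oplog: OplogEntry[]): number | null {
  const open: number[] = [];
  for (const entry of oplog) {
    if (entry.kind === 'begin-atomic') open.push(entry.index);
    else if (entry.kind === 'end-atomic') open.pop();
  }
  return open.length > 0 ? open[open.length - 1] : null;
}
```

A region that traps before its end marker is emitted shows up as an unclosed region, so replay knows to invalidate it and retry from the begin marker.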

This was mostly tested in Rust with panics as the failure source, in which case it works well.

The problem is that in the TS SDK (and the Scala one), errors manifest as JS exceptions thrown out of the atomic region, and they only become a trap if uncaught at the top level.

Until recently (within the last 2 weeks), the TS atomically always closed the atomic region, even when the inner promise failed. This made it completely unusable, because during recovery the atomic region was considered successfully closed and the switch-to-live event never happened. So I changed it earlier to not close the region on failure, which matches the Rust SDK behavior.

The problem is that the exception can be caught outside of the atomic region and that's what Maxim's example was doing:

```typescript
export async function llmCall(
  client: OpenAI,
  params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming,
): Promise<string> {
  let lastError: unknown;

  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    if (attempt > 0) {
      await sleep(backoffMs(attempt - 1));
    }

    try {
      // Each attempt is its own atomic block so Golem records a clean
      // checkpoint rather than resuming from an incomplete HTTP write.
      const content = await atomically(async () => {
        const response = await client.chat.completions.create(params);
        return response.choices[0]?.message?.content ?? '';
      });
      return content;
    } catch (err) {
      lastError = err;
      if (!isRetryable(err)) {
        throw err;
      }
      console.error(
        `LLM call failed (attempt ${attempt + 1}/${MAX_ATTEMPTS}): ` +
        `${err instanceof APIError ? `HTTP ${err.status} ${err.message}` : String(err)}`
      );
    }
  }

  throw lastError;
}
```

Here, in case of a failure the atomic region is NOT closed, but we never trap because we catch the exception. That leaves the atomic region open, which leads to all kinds of unwanted behavior.

I investigated ways to make this work. It would probably be possible to introduce new oplog entries and host functions to record the exception and close the region with it, and then during replay re-emit the exception so our external retry path gets replayed. I decided this is too hard to do, especially because it requires reconstructing the exception completely and adding new oplog entries.

So what I did in this PR instead is to never allow exceptions out of the atomically block (this will need to be further documented in a follow-up docs/skills PR) and always trap. To be able to trap in JS without having to bubble the exception to the top level, I introduced a trap host function. This way the atomic regions work as intended in all languages.
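The new behavior can be sketched like this. The `markBeginAtomicRegion`, `markEndAtomicRegion`, and `trap` functions below are hypothetical stand-ins for the real host functions (names and signatures invented here; the stubs only model the behavior for illustration):

```typescript
// Hypothetical host-function stubs (not the real Golem host API)
function markBeginAtomicRegion(): number {
  return 0; // real impl: emits a begin marker into the oplog
}
function markEndAtomicRegion(_begin: number): void {
  // real impl: emits the end marker, closing the region
}
function trap(message: string): never {
  // real impl: aborts the whole worker so recovery retries the open region;
  // modeled here as a thrown error purely for illustration
  throw new Error(`trap: ${message}`);
}

// Sketch of the fixed atomically(): the region is closed only on success;
// any exception becomes a trap, so user code can never catch it and
// accidentally leave the region open.
async function atomically<T>(f: () => Promise<T>): Promise<T> {
  const begin = markBeginAtomicRegion();
  let result: T;
  try {
    result = await f();
  } catch (err) {
    trap(`atomic region failed: ${String(err)}`);
  }
  markEndAtomicRegion(begin);
  return result;
}
```

With this shape, a retry loop like the one in Maxim's example never sees the inner exception; the worker traps, the unclosed region is invalidated on replay, and the block is retried from scratch.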

@mschuwalow

Note that the outer retry was only needed in the example because retryPolicy did not work here (the region was closed automatically, making the error happen outside the retryPolicy region).

But I think the changes make sense either way

@mschuwalow

Does it make sense to apply the same "trap when an error bubbles up" to the retry policies? I think it would solve all issues we have there as well

@vigoo

vigoo commented Apr 30, 2026

> Does it make sense to apply the same "trap when an error bubbles up" to the retry policies? I think it would solve all issues we have there as well

maybe, but I'd like to think that through separately (there is another ticket about that problem)


Successfully merging this pull request may close these issues:

Check how http retry is falling to "non-idempotent non-retryable" case, may be too strict