Skip to content

fix(nodes): improve error handling in remote node execution#565

Merged
stepmikhaylov merged 2 commits intorocketride-org:developfrom
charliegillet:fix/remote-node-error-handling
Apr 6, 2026
Merged

fix(nodes): improve error handling in remote node execution#565
stepmikhaylov merged 2 commits intorocketride-org:developfrom
charliegillet:fix/remote-node-error-handling

Conversation

@charliegillet
Copy link
Copy Markdown
Contributor

Summary

  • Replace fragile string-based error origin detection ('RemoteException:' not in msg) with proper type checking (isinstance(e, APERR) and e.ec == Ec.RemoteException)
  • Always send error signals to the remote side before re-raising, preventing the remote pipeline from hanging when an already-wrapped RemoteException is caught
  • Narrow exception catch from BaseException to Exception to avoid intercepting KeyboardInterrupt and SystemExit

Type

fix

Testing

  • Verify remote node pipeline execution completes normally (no regression)
  • Trigger an error in local pipeline processing and confirm the remote side receives the error signal instead of hanging
  • Trigger a KeyboardInterrupt during remote execution and confirm it is no longer caught by the error handler

Changed files

  • nodes/src/nodes/remote/base/IInstance.pycallRemote() error handler
  • nodes/src/nodes/remote/server/IInstance.pyhandleWebSocket() error handler

🤖 Generated with Claude Code

charliegillet and others added 2 commits March 30, 2026 17:11
…screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace fragile string-based error origin detection with proper type
checking (isinstance + error code comparison). Always send error signals
to the remote side to prevent hanging connections. Narrow exception
catch from BaseException to Exception to avoid intercepting
KeyboardInterrupt and SystemExit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added module:nodes Python pipeline nodes module:vscode VS Code extension labels Mar 31, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 31, 2026

Warning

Rate limit exceeded

@charliegillet has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 2 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 69f8dc7b-b664-48bb-bdb9-b709f14d5bc8

📥 Commits

Reviewing files that changed from the base of the PR and between b6b2dc1 and 02adaef.

📒 Files selected for processing (4)
  • apps/vscode/src/providers/views/PageStatus/PageStatus.tsx
  • apps/vscode/src/providers/views/PageStatus/styles.css
  • nodes/src/nodes/remote/base/IInstance.py
  • nodes/src/nodes/remote/server/IInstance.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ryan-t-christensen
Copy link
Copy Markdown
Collaborator

ryan-t-christensen commented Mar 31, 2026

Looks good @charliegillet - I did test the STOPPING button state — it transitioned in ~0.6ms, so it's never visible in practice. The UI change is correct but the STOPPING state is set in _terminated() (task_engine.py:551) after the subprocess has already exited, not when the stop is requested. To make this useful the server would need to transition to STOPPING at the moment the stop command is issued. As-is the button will never render. That probably warrants a separate issue.

cc @stepmikhaylov

@charliegillet
Copy link
Copy Markdown
Contributor Author

@ryan-t-christensen thanks for the thorough testing and the approval! Good catch on the STOPPING state timing — you're right that since _terminated() sets it after the subprocess exits, the button state is effectively invisible. I'll open a separate issue to track moving the STOPPING transition to the moment the stop command is issued on the server side, so the UI can actually reflect it.

@stepmikhaylov stepmikhaylov merged commit 25b0c0d into rocketride-org:develop Apr 6, 2026
11 checks passed
shashidharbabu pushed a commit that referenced this pull request Apr 7, 2026
* feat(vscode): improve stop button feedback in Pipeline Observability screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): improve error handling in remote node execution

Replace fragile string-based error origin detection with proper type
checking (isinstance + error code comparison). Always send error signals
to the remote side to prevent hanging connections. Narrow exception
catch from BaseException to Exception to avoid intercepting
KeyboardInterrupt and SystemExit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anandray added a commit that referenced this pull request May 7, 2026
Add `head_branch == 'develop'` guard to the nightly workflow's init
job. The `branches: [develop]` filter on the trigger already enforces
this, but static analyzers (OSSF Scorecard's DangerousWorkflow rule)
can't trace that — they see workflow_run + head_sha checkout +
secrets: inherit and flag the pattern critical regardless of the
trigger filter.

The explicit branch check ties the trust boundary to develop's branch
protection (PR + code-owner review + required checks), which is the
actual safety guarantee. All downstream jobs depend on init via
needs:, so they inherit the gate without per-job changes.

Behavior is unchanged: workflow_run runs that wouldn't have fired
under the trigger filter still don't fire. Manual workflow_dispatch
remains unaffected.

Note: `workflow_run` triggers always execute the version of this
file from main (per GitHub's rules), so this fix takes effect for
runtime safety only after develop → main lands. The static-analysis
finding (#760, alert #565) closes once the Scorecard scan re-runs
against develop with this change.

Fixes #760
kwit75 pushed a commit that referenced this pull request May 7, 2026
…771)

* fix(security): explicit branch check on nightly workflow_run trigger

Add `head_branch == 'develop'` guard to the nightly workflow's init
job. The `branches: [develop]` filter on the trigger already enforces
this, but static analyzers (OSSF Scorecard's DangerousWorkflow rule)
can't trace that — they see workflow_run + head_sha checkout +
secrets: inherit and flag the pattern critical regardless of the
trigger filter.

The explicit branch check ties the trust boundary to develop's branch
protection (PR + code-owner review + required checks), which is the
actual safety guarantee. All downstream jobs depend on init via
needs:, so they inherit the gate without per-job changes.

Behavior is unchanged: workflow_run runs that wouldn't have fired
under the trigger filter still don't fire. Manual workflow_dispatch
remains unaffected.

Note: `workflow_run` triggers always execute the version of this
file from main (per GitHub's rules), so this fix takes effect for
runtime safety only after develop → main lands. The static-analysis
finding (#760, alert #565) closes once the Scorecard scan re-runs
against develop with this change.

Fixes #760

* fix(security): restrict workflow_dispatch to develop ref

Per CodeRabbit review on PR #771: the workflow_dispatch arm in
nightly.yaml accepted any ref, which combined with secrets: inherit
and write permissions creates a residual privilege-escalation path
even for trusted users. Tying manual dispatch to refs/heads/develop
makes the trust boundary uniform across both trigger paths
(workflow_run already gated to develop's branch protection chain).

Operational impact is negligible: hot-fix flows go through develop
already, and "republishing a known-good ref" practically always
means a develop commit. Anyone wanting to publish from a feature
branch must merge to develop first — which is the right policy
for SOC2 change-management evidence.

Refs #760

---------

Co-authored-by: Anand Ray <anand.ray@rocketride.ai>
anandray added a commit that referenced this pull request May 8, 2026
CodeRabbit flagged that the policy text said "admin permissions on the
repository" for the second-principal approver, which is the wrong
authorization scope. GitHub Delegated Alert Dismissal is an org-level
control: only organization owners, security managers, or holders of
explicitly delegated custom roles can approve dismissal requests.
Repo-admin alone is not sufficient.

This matches what actually worked for alert #565: kwit75 approved as a
rocketride-org owner, not via repo-admin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anandray added a commit that referenced this pull request May 9, 2026
…lerts

CodeRabbit flagged that Scorecard appeared in the scanning-tools list
but wasn't covered explicitly by any dismissal-flow section. In our
setup, Scorecard publishes SARIF that surfaces as code scanning alerts
in GitHub's UI (that's the channel by which alert #565 — Scorecard
DangerousWorkflowID — was dismissed via the two-person flow). Adds an
intro sentence making this coverage explicit so an auditor reading the
doc doesn't have to infer it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anandray added a commit that referenced this pull request May 10, 2026
* docs(security): document two-person control on alert dismissals

Adds a "Vulnerability & Alert Triage" section to SECURITY.md that codifies
the dismissal flow we already operate under and that GitHub's delegated
alert dismissal enforces:

- Request (any write maintainer, with justification)
- Approval (different admin maintainer; no self-approval)
- GitHub auto-applies dismissal after approval

Also documents:
- The four valid triage dispositions (Fix / Mitigated / False positive /
  Won't fix) and the evidence each requires
- Secret Scanning Push Protection bypass logging
- Dependabot dismissal owner + comment requirement

The flow was verified end-to-end on Scorecard alert #565 (anandray
requested with documented compensating controls, kwit75 approved as
second principal). This commit captures the policy that maps to that
audit trail; it's a documentation-only change with no code impact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(security): correct approver-role wording for delegated dismissal

CodeRabbit flagged that the policy text said "admin permissions on the
repository" for the second-principal approver, which is the wrong
authorization scope. GitHub Delegated Alert Dismissal is an org-level
control: only organization owners, security managers, or holders of
explicitly delegated custom roles can approve dismissal requests.
Repo-admin alone is not sufficient.

This matches what actually worked for alert #565: kwit75 approved as a
rocketride-org owner, not via repo-admin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(security): align Dependabot dismissal wording with delegated-dismissal model

Same fix as a42cf81 but for the Dependabot bullet on line 71.
CodeRabbit caught that "repository admins" was wrong for the same
reason: GitHub Delegated Alert Dismissal is an org-level control,
authorized by organization owners, security managers, or holders of
explicitly delegated custom roles — not repo-admin alone. The new
wording also explicitly points at the two-person request → approval
flow defined earlier in the section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:nodes Python pipeline nodes module:vscode VS Code extension

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants