Skip to content

Implement Standalone Nexus Operations#1461

Open
VegetarianOrc wants to merge 34 commits into
mainfrom
amazzeo/sano
Open

Implement Standalone Nexus Operations#1461
VegetarianOrc wants to merge 34 commits into
mainfrom
amazzeo/sano

Conversation

@VegetarianOrc
Copy link
Copy Markdown
Contributor

@VegetarianOrc VegetarianOrc commented Apr 17, 2026

I recommend viewing this diff with the histogram diff algorithm to reduce noise and see the additions more clearly. You can view that by checking out the branch and running git diff --histogram origin/main

Add Standalone Nexus Operation Support

API Changes

  • Client.create_nexus_client() - Create a typed NexusClient from an endpoint & nexusrpc.Service
  • Client.list_nexus_operations() - List operations via visibility query
  • Client.count_nexus_operations() - Count operations via visibility query
  • Client.get_nexus_operation_handle() - Create a typed NexusOperationHandle to an existing operation
  • NexusClient.start_operation() - Start a Nexus Operation and return a NexusOperationHandle.
  • NexusClient.execute_operation() - Same as start_operation, but poll the handle for the final result.
  • NexusOperationHandle.result() / .describe() / .cancel() / .terminate() / property accessors - interact with an existing operation.
  • Supporting types like input types, enums, and errors added as appropriate.

Interceptor Support

The following methods have been added to OutboundInterceptor

  • start_nexus_operation()
  • describe_nexus_operation()
  • cancel_nexus_operation()
  • terminate_nexus_operation()
  • list_nexus_operations()
  • count_nexus_operations()

Testing

Added integration test suite tests/nexus/test_standalone_operations.py (858 lines) covering start, get result, describe, cancel, terminate, list, count, ID conflict/reuse policies, and interceptor integration

Misc Changes

  • Update ruff target version to py310 to match the minimum support Python version.

Copy link
Copy Markdown

@Evanthx Evanthx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good overall!

Comment thread temporalio/api/compute/v1/config_pb2.pyi
Comment thread temporalio/client.py Outdated
Comment thread temporalio/client.py
…n defaults for id_reuse_policy and id_conflict_policy. Update integration tests with better typing and new assertions for the newly interceptable get_nexus_operation_result
Comment thread temporalio/client.py
…ch start/execute operation. Fix hardcoded retryable=True in fallback error serialization. Add type tests and rename test/nexus/test_type_errors.py to ensure that the type test file properly executes nexus tests.
Copy link
Copy Markdown
Contributor

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick passthrough

Comment thread temporalio/client.py
Comment on lines +4825 to +4828
_metadata_decoded: bool = field(
kw_only=True, default=False, compare=False, repr=False
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need a field for this. I know we did this for workflow description, but I think it's reasonable to just lazy decode on demand without mutating _static_details or _static_summary.

nbd either way, thought I'd mention it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove if we feel like it's an over optimization

Comment thread temporalio/client.py
@VegetarianOrc VegetarianOrc marked this pull request as ready for review May 12, 2026 17:45
@VegetarianOrc VegetarianOrc requested a review from a team as a code owner May 12, 2026 17:45
Copy link
Copy Markdown

@Evanthx Evanthx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments

Comment thread temporalio/client.py Outdated
Comment thread temporalio/client.py

def __init__(self, *, cause: BaseException) -> None:
"""Create nexus operation failure error."""
super().__init__("Nexus operation execution failed")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add "cause" to this to make a more helpful error? And I think we usually capitolize Nexus

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the existing patterns here where the message is passed to the super constructor but the cause is set via self.__cause__.

Will fix the capitalization though!

Comment thread temporalio/nexus/_operation_context.py Outdated
) -> list[temporalio.api.common.v1.Link.WorkflowEvent]:
event_links = []
for inbound_link in self.nexus_context.inbound_links:
if inbound_link.type != _WORKFLOW_EVENT_LINK_TYPE:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we only accepting _WORKFLOW_EVENT_LINK_TYPE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't change underlying behavior, but suppresses a warning log when a link that doesn't match the expected regex in nexus_link_to_workflow_event. In that function, invalid urls return None and are filtered out here.

This was unintentionally added to this PR from a standalone nexus operation -> standalone activity prototype I was experimenting with. I've removed this change here so we can address link support when support we cross that bridge.

# Task completion should never be dropped in case of cancellation.
# The Rust future in core must complete for shutdown to happen without
# hanging.
async def _complete_task(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed for stand alone nexus operations or is this fixing a different bug?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different bug, but implementing standalone nexus operations surfaced the race condition. Specifically starting an operation and then terminating it before the completion has been sent to Core. Previously, this wasn't caught as the cancellation propagation from a caller workflow was slow enough to not trigger the race.

Happy to move it to a separate PR to limit the scope if you'd like.

Copy link
Copy Markdown
Contributor

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client impl and tests for SANO lgtm, though you may want to get additional approval from nexus folks and @tconley1428 - i wouldn't merge with only my approval

Comment on lines +436 to +439
assert desc.blocked_reason is None or isinstance(desc.blocked_reason, str)
assert desc.last_attempt_failure is None or isinstance(
desc.last_attempt_failure, BaseException
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we are asserting on multiple conditions?

i.e. blocked_reason is None or str

I don't see why it would ever be blocked or why last_attempt_failure would be None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, I've narrowed these assertions now.

Both blocked_reason and last_attempt_failure are expected to be None since nothing should go wrong here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants