Attribute remote SSH session WSFS activity via command origin #5774
Merged
The SSH server bootstrap notebook writes "RemoteSshServer" to /Workspace/.proc/self/metadata/command_origin so that workspace-file activity from a remote SSH session is attributed to its own WSFS command origin instead of "PythonDriver". WSFS resolves each request to its leaf-most registered ancestor, so the SSH server subprocess and the shells it spawns inherit this origin. The write is best-effort: it never blocks server startup if .proc is unavailable. This pairs with the WsfsOperation.CommandOrigin enum value COMMAND_ORIGIN_REMOTE_SSH_SERVER added in databricks-eng/universe.
Re-homed from #5728 (fork PR by @sbauersfeld) so CI can run with OIDC.
Signed-off-by: Scott Bauersfeld <scott.bauersfeld@databricks.com>
Co-authored-by: Isaac
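The best-effort write described above can be sketched in Python, the language of the bootstrap notebook. This is a minimal illustration, not the code in ssh-server-bootstrap.py: the helper name set_command_origin, writing the value with no trailing newline, and catching OSError with a printed warning are all assumptions. Only the path and the "RemoteSshServer" value come from this PR.

```python
# Path and value as named in this PR; the helper itself is hypothetical.
COMMAND_ORIGIN_PATH = "/Workspace/.proc/self/metadata/command_origin"
REMOTE_SSH_SERVER_ORIGIN = "RemoteSshServer"


def set_command_origin(origin: str = REMOTE_SSH_SERVER_ORIGIN,
                       path: str = COMMAND_ORIGIN_PATH) -> bool:
    """Best-effort write of the WSFS command origin for the current process.

    Returns True if the write succeeded, False otherwise. Any OS-level failure
    (for example, .proc not being available) is swallowed so the caller can
    go on to start the SSH server regardless.
    """
    try:
        # Assumption: the origin is written as-is, without a trailing newline.
        with open(path, "w") as f:
            f.write(origin)
        return True
    except OSError as e:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        print(f"Could not set command origin ({e}); continuing without it")
        return False
```

In a bootstrap flow, a call like this would run just before the SSH server is launched, and its return value can simply be ignored, so a missing .proc never stops startup.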
Integration test report for commit 43ef425
21 interesting tests: 13 SKIP, 7 KNOWN, 1 RECOVERED
Re-home of #5728 (fork PR by @sbauersfeld) onto an in-repo branch so CI can run with OIDC/JFrog access. Fork pull_request runs can't obtain an OIDC token, so the required test-result and validate-generated checks fail at the setup-jfrog step and the PR can never enter the merge queue. The change and authorship are unchanged (Scott is the commit author).
Changes
The SSH server bootstrap notebook (experimental/ssh/internal/client/ssh-server-bootstrap.py) writes RemoteSshServer to /Workspace/.proc/self/metadata/command_origin just before launching the SSH server.
Why
The bootstrap runs as a notebook job on the cluster, so without this, all workspace-file (WSFS) activity from a remote SSH session is attributed to the generic PythonDriver command origin. WSFS resolves each request to its leaf-most registered ancestor, so the SSH server subprocess and the shells it spawns inherit this origin, making that activity attributable in WSFS logs.
Pairs with the WsfsOperation.CommandOrigin enum value COMMAND_ORIGIN_REMOTE_SSH_SERVER added in databricks-eng/universe#2127479.
Tests
experimental/ssh/... unit tests and TestAccept/ssh acceptance tests pass locally on this change merged onto main.
The .proc/.../command_origin write path is exercised server-side by the WSFS TestMetadataCommandOrigin unit test.
Closes #5728
Co-authored-by: Scott Bauersfeld <scott.bauersfeld@databricks.com>
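The inheritance argument in the Why section rests on WSFS resolving each request to its leaf-most registered ancestor. The toy model here is a hypothetical illustration of that rule, not WSFS code: the process tree, the PIDs, the resolve_origin helper, and the assumption that a process's own registration counts as its leaf-most ancestor are all invented for the example.

```python
from typing import Dict, Optional


# Hypothetical model: WSFS's real lookup is server-side and not part of this PR.
def resolve_origin(pid: int,
                   parents: Dict[int, int],
                   registered: Dict[int, str],
                   default: str = "Unknown") -> str:
    """Return the origin registered on the nearest process walking up from pid.

    The process itself is checked first, then its parent, and so on toward
    the root, so the leaf-most registered ancestor wins.
    """
    current: Optional[int] = pid
    seen = set()  # guards against cycles in malformed input
    while current is not None and current not in seen:
        if current in registered:
            return registered[current]
        seen.add(current)
        current = parents.get(current)
    return default


# Hypothetical tree: driver (100) -> bootstrap notebook (200)
#   -> SSH server (300) -> shell (400).
parents = {200: 100, 300: 200, 400: 300}

before = {100: "PythonDriver"}
after = {100: "PythonDriver", 200: "RemoteSshServer"}  # bootstrap wrote "self"

print(resolve_origin(400, parents, before))  # PythonDriver
print(resolve_origin(400, parents, after))   # RemoteSshServer
```

In this model, registering once on the bootstrap process is enough: the SSH server and every shell started under it pick up RemoteSshServer without any further writes, while the driver itself keeps PythonDriver.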