Commit a934f8c
experimental/ssh: surface connect failures instead of hanging (#5456)
## Why
This originated from a customer case: `databricks ssh connect` to a dedicated cluster whose
Docker container image was missing an OpenSSH **server** (`/usr/sbin/sshd`). The failure
surfaced poorly: either as a generic `server metadata error / metadata.json doesn't exist`,
or as a client that just **hung** (the local `ssh` waited on its 360s `ConnectTimeout`).
The root cause was buried in the cluster's job-run logs.
This PR improves the diagnostics for `ssh connect` failures.
## What
1. **Surface bootstrap job-run errors.** When the SSH server bootstrap job reaches a
   terminal/failed state, fetch the run's state message, notebook error/trace, and run-page
   URL and show them. This applies both when the task terminates before reaching RUNNING
   and when it dies afterwards, during metadata polling.
   (`experimental/ssh/internal/client/client.go`)
2. **Guard against hangs when the server is up but the handshake never completes.** If the
   container image has no `sshd`, the server can't launch `/usr/sbin/sshd` on connect and
   **holds the websocket open**, so both proxy loops block forever. The client now runs the
   proxy loops in the background and aborts after a handshake timeout (no server response)
   with an actionable hint. It also exits promptly when the server *does* close the
   connection. (`experimental/ssh/internal/proxy/client.go`)
3. **Add an openssh-server hint** when `ssh` exits with its connection-failure code (255).
   (`spawnSSHClient`)
## Tests
- `client_internal_test.go`: failed-run message formatting (state message + trace + run URL),
  truncation, terminal-state detection (SDK mocks).
- `proxy/client_server_test.go`: fast exit when the server closes the connection; abort on the
  handshake timeout when the server sends nothing.

All `experimental/ssh/...` tests pass; lint clean.
## Status / follow-ups (WIP)
- The missing-`sshd` path still incurs a ~30s handshake-timeout wait before failing. The
  cleaner fix is a **server-side pre-flight `sshd` check** (fail the bootstrap job immediately
  with a clear message), tracked separately. That would turn this case into an instant,
  clear job failure handled by improvement #1.
- The handshake timeout (30s) is conservative and currently a package constant; it could be
  shortened or made configurable.
- The proxy error and the outer 255 hint are slightly redundant; they may be consolidated.
This pull request and its description were written by Isaac.

1 parent 7e85efd commit a934f8c
6 files changed
Lines changed: 553 additions & 29 deletions
File tree
- experimental/ssh
  - internal
    - client
    - proxy