Ssh fix #949
@vrbagalkote, this is the initial patchset to enable running commands over SSH instead of the console. It also has a fallback mechanism for when the OS is not up (SSH is not working).
1de7a4a to a34f6c6
4759328 to 7b3a85e
Add new SSH-based communication layer for op-test framework:
- OpTestSSHConnection: Direct SSH connection management
- OpTestCommandExecutor: Command execution with timeout handling
- OpTestConnectionManager: Unified connection management
- Update Exceptions.py with SSH exception classes
- Integrate into OpTestHost for seamless operation
- Add paramiko dependency for SSH support

This provides the foundation for SSH-first architecture, enabling more reliable communication with HMC/LPAR systems.

Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
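For readers unfamiliar with the SSH-first idea, here is a minimal sketch of the fallback pattern, not the code from this patchset. The names `SSHUnavailable`, `run_with_fallback`, `ssh_run` and `console_run` are hypothetical, and the two transports are passed in as plain callables so the example stays independent of paramiko and pexpect.

```python
from typing import Callable, List, Tuple


class SSHUnavailable(Exception):
    """Hypothetical error: the SSH transport cannot be used (e.g. OS not up)."""


def run_with_fallback(
    command: str,
    ssh_run: Callable[[str], List[str]],
    console_run: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Try SSH first; fall back to the console only if SSH fails.

    Returns the transport that served the command and its output lines.
    """
    try:
        return "ssh", ssh_run(command)
    except (SSHUnavailable, OSError):
        # OSError also covers TimeoutError and ConnectionError in Python 3.
        return "console", console_run(command)
```

Returning the transport name alongside the output makes it easy for a caller to log which path was taken.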
Add migration script to help transition test files to new SSH architecture. Includes automated test file migration capabilities. Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
Enable direct SSH command execution for HMC operations. Implement SSH-first approach for system state management. Make HMC console connection lazy-loaded only when needed. Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
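A lazy-loaded console can be sketched as a property that opens the connection on first access. This is an illustration under assumptions, not the OpTestHMC implementation: `LazyConsoleHost` and `console_factory` are hypothetical names.

```python
from typing import Any, Callable, Optional


class LazyConsoleHost:
    """Hypothetical holder that opens the console only when it is first used."""

    def __init__(self, console_factory: Callable[[], Any]) -> None:
        self._console_factory = console_factory
        self._console: Optional[Any] = None

    @property
    def console(self) -> Any:
        # No connection is made at construction time; the factory runs
        # on the first access and the result is reused afterwards.
        if self._console is None:
            self._console = self._console_factory()
        return self._console
```

With this shape, code paths that only ever use SSH never pay for a console connection.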
Update OpTestSysinfo to use SSH for data collection. Improve sysinfo output formatting and command output display. Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
Change SSH prompt from console-expect to ssh-expect for clarity. Initialize SSH pty automatically when accessed. Handle SSH connection failures during HMC initialization gracefully. Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
Update test files to use new SSH architecture for improved reliability. Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
3b0baae to 0055d0e
@PraveenPenguin was there an existing issue with the pexpect way? Any reason to migrate to the SSH way?
Yes, this change is mainly intended to gradually reduce our dependency on pexpect. In this initial patch set, we are trying to limit console-based interactions wherever feasible and instead run commands over SSH, which should help in minimising potential breakages and improving overall stability.
@PraveenPenguin
@SACHIN-BAPPALIGE Can you please share the logs of the tests that were run? If they contain sensitive info, please share them via Slack.
@vrbagalkot Please find the attached logs for the Fadump case: [ssh-expect]#sh -c 'echo c > /proc/sysrq-trigger'
e37b442 to 4a1a59f
- Add 60s initial delay before first SSH attempt
- Increase per-attempt SSH timeout from 10s to 30s
- System needs more time to establish SSH after crash/reboot
- Improves reliability of SSH reconnection detection

Fix console thread logging for multiple kdump tests:

When running a test suite with multiple kdump tests, console thread logs (from 'echo c' crash trigger) were only captured for the first test case. Subsequent tests showed no console output, making crash/reboot debugging impossible.

Root cause: Console connections were not being properly cleaned up between tests, causing subsequent tests to attempt reusing stale/closed console connections.

Changes:
- OpTestKernelDump.py: Add explicit console deactivation before each test with 2-second delay to ensure HMC fully releases the console
- OpTestUtil.py: Add console cleanup at monitoring thread start to ensure fresh connections

This ensures each test in a suite gets a clean, active console connection and all crash/reboot sequences are properly captured in logs.

Fix SSH vs Console usage in kdump tests:

Tests were incorrectly using console (console-expect prompt) instead of SSH for pre-crash operations, causing unnecessary console connections and missing proper SSH logging.

Changes:
- OpTestKernelDump.py: Use host_run_command() and run_command_direct() for crash trigger instead of pty.sendline() to avoid console connections
- OpTestInstallUtil.py: Use SSH direct command for reboot instead of console pty to avoid triggering console-expect prompt
- OpTestSSH.py: Enhanced logging in run_command_direct() to INFO level for better visibility of SSH command execution and output

This ensures all pre-crash operations use SSH (ssh-expect) and console is only used for monitoring crash/reboot sequences, providing proper separation of concerns and complete logging.

Signed-off-by: Praveen K Pandey <praveen@linux.vnet.ibm.com>
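The reconnection timing described in this commit (60s initial delay, 30s per attempt) could look roughly like the sketch below. It is an assumption-laden illustration, not the patch itself: `port_open` and `wait_for_ssh` are hypothetical helpers, `max_attempts` and `retry_interval` are made-up parameters, and a TCP connect to port 22 is used as a stand-in for a full SSH login check.

```python
import socket
import time
from typing import Callable, Optional


def port_open(host: str, port: int = 22, timeout: float = 30.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ssh(
    host: str,
    initial_delay: float = 60.0,    # from the commit message
    attempt_timeout: float = 30.0,  # from the commit message
    max_attempts: int = 10,         # assumption, not from the source
    retry_interval: float = 5.0,    # assumption, not from the source
    probe: Callable[..., bool] = port_open,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Wait out the reboot, then poll; return the successful attempt number or None."""
    sleep(initial_delay)
    for attempt in range(1, max_attempts + 1):
        if probe(host, timeout=attempt_timeout):
            return attempt
        if attempt < max_attempts:
            sleep(retry_interval)
    return None
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a real host or real waiting.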
6e382f7 to 3cbe0ef
- Fix console connection caching issue in OpTestHMC.connect()
  * Validate file descriptor before returning cached connection
  * Force reconnection if cached pty has invalid fd or is not alive
  * Prevents reusing broken pexpect Spawn objects across retries
- Fix missing HMC dumprestart command execution
  * Change from console-based to direct SSH command execution
  * Use cv_HMC.ssh.run_command_direct() instead of cv_HMC.run_command()
  * Bypasses broken console connection entirely
  * Adds proper logging to show command execution
- Add vmcore directory cleanup at test start
  * Remove old crash directories before each test run
  * Ensures clean test environment and proper test isolation
  * Handles both local and remote (net) dump locations
- Include previous console monitoring and SSH improvements
  * Console retry logic with proper cleanup
  * Direct SSH command execution for reliability
  * Thread synchronization improvements

These fixes resolve the child_fd=-1 console connection failures and ensure HMC crash tests actually trigger kernel dumps.
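The cached-connection check in the first bullet can be sketched as follows. This is a hedged illustration rather than the actual OpTestHMC.connect() change: `is_usable` and `get_console` are hypothetical names, the cache is a plain dict, and the pty is duck-typed on the `child_fd`, `isalive()` and `close()` members that pexpect spawn objects expose.

```python
from typing import Any, Callable, Dict


def is_usable(pty: Any) -> bool:
    """A cached pty is usable only if its fd is valid and the child is alive."""
    if pty is None:
        return False
    if getattr(pty, "child_fd", -1) < 0:
        # child_fd == -1 is the failure signature named in the commit message.
        return False
    try:
        return bool(pty.isalive())
    except Exception:
        # Any error while probing is treated as a broken connection.
        return False


def get_console(cache: Dict[str, Any], connect: Callable[[], Any]) -> Any:
    """Return the cached pty if healthy; otherwise discard it and reconnect."""
    pty = cache.get("pty")
    if not is_usable(pty):
        if pty is not None:
            try:
                pty.close()
            except Exception:
                pass  # best-effort cleanup of an already broken object
        pty = connect()
        cache["pty"] = pty
    return pty
```

Closing the stale object before reconnecting avoids carrying a broken spawn across retries.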