Skip to content

feat: Enhanced CLI Watcher with automatic terminal process monitoring#363

Open
vrubezhny wants to merge 1 commit into
eclipse-che:mainfrom
vrubezhny:feat-idling-interative-cli-activity-tracking
Open

feat: Enhanced CLI Watcher with automatic terminal process monitoring#363
vrubezhny wants to merge 1 commit into
eclipse-che:mainfrom
vrubezhny:feat-idling-interative-cli-activity-tracking

Conversation

@vrubezhny

Copy link
Copy Markdown
Contributor

CLI Watcher now automatically detects and tracks terminal-based development activity, distinguishing between interactive CLIs (vim, REPLs) and work processes (builds, deploys) without manual configuration.

Key Features:

  • Fully automatic: Only requires 'enabled: true' - auto-detects ALL terminal processes
  • Smart defaults: Adapts activityWindow and gracePeriod to workspace idle timeout
  • Automatic classification: Interactive (vim, python REPL) vs work (builds, deploys)
  • Backward compatible: Old configs continue working with deprecation warnings
  • Safety mechanisms: Grace period (5min), max age limit (6h), activity monitoring
  • Flexible duration parsing: Accepts 6h, 25m, 60 (seconds), or duration strings

Configuration:

  • Minimal: enabled: true (that's it!)
  • Smart defaults: Auto-calculated from workspace idle timeout
  • Time settings: checkPeriod, activityWindow, gracePeriod, maxProcessAge
  • Override options: watchedCommands (fix misclassification), ignoredCommands (skip entirely)

Technical Implementation:

  • User process filtering: Parent TTY chain analysis (filters shells, system processes)
  • Interactive detection: Foreground process group + TTY read analysis
  • Activity tracking: TTY access time (Atime) monitoring for user input
  • Multicall binary support: Uses /proc/comm for proper command detection
  • Self-skip: che-machine-exec doesn't watch itself
  • Config validation: Warns about misconfigurations without breaking

Documentation:

  • timeout/CLI-WATCHER.md - Comprehensive guide with testing scenarios
  • timeout/.noidle.minimal - Minimal config example
  • timeout/.noidle.example - Full config with all options

Issue: https://redhat.atlassian.net/browse/CRW-11505

Comment thread timeout/.noidle.example Outdated
@vrubezhny vrubezhny force-pushed the feat-idling-interative-cli-activity-tracking branch from 43e8ae4 to 6513728 Compare June 29, 2026 09:53
@vrubezhny vrubezhny requested a review from ibuziuk June 29, 2026 09:55
@tolusha

tolusha commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Hi! I'm che-ai-assistant — I help with your pull requests.

Available commands:

  • /che-ai-assistant generate-che-doc — Generate a documentation PR based on this PR's changes
  • /che-ai-assistant ok-pr-review — Run a comprehensive PR review (summary, code review, deep review, impact analysis)
  • /che-ai-assistant help — Show this help message

@tolusha

tolusha commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

/che-ai-assistant ok-pr-review

Review is complete. Please check the review comments below.

@tolusha tolusha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Comprehensive review completed across standard code quality, deep design analysis, and system-level impact assessment.

Critical Issues (Must Fix Before Merge)

1. Hardcoded clock ticks causes incorrect process age on non-x86_64 systems

  • Location: timeout/cli-watcher.go line ~1112 in getProcessStartTime
  • Issue: clockTicks := 100 is hardcoded but can be 250 on ARM systems
  • Additionally, fmt.Sscanf errors are not checked - if parsing fails, startTicks remains 0, making processes appear ancient and incorrectly skipped
  • Impact: Grace period and max process age checks fail, potentially idling workspaces during active builds
  • Fix: Check fmt.Sscanf return value and use C.sysconf(C._SC_CLK_TCK) or read from /proc/self/auxv

2. Infinite loop risk in getMainUserProcess

  • Location: timeout/cli-watcher.go line ~1045
  • Issue: No cycle detection or maximum depth limit when walking parent chain
  • Impact: Corrupted /proc/[pid]/stat or race conditions cause goroutine hang
  • Fix: Add visited set or max depth (64) check

3. Duplicate policy logic in applyCommandPolicy vs applyDefaultPolicy

  • Location: timeout/cli-watcher.go lines 950-1011
  • Issue: Identical decision logic duplicated in two functions
  • Impact: Bug fixes must be applied twice; maintenance burden for code controlling workspace lifecycle
  • Fix: Merge into single function with InteractiveMode parameter

High Priority Warnings

4. Performance: getSystemBootTime re-reads /proc/stat hundreds of times

  • Location: timeout/cli-watcher.go line ~1119
  • Boot time never changes but is read once per process per scan
  • Fix: Cache with sync.Once

5. Documentation mismatch: per-command activityWindow not implemented

  • Location: timeout/CLI-WATCHER.md line ~561
  • Documentation shows activityWindow: 120 in watchedCommands but struct has no such field
  • Config would be silently ignored
  • Fix: Remove from docs or implement the feature

6. Breaking change: expanded alwaysIgnoredCommands

  • Location: timeout/cli-watcher.go line ~71
  • Adding watch, top, htop to exclusion list changes behavior for existing users
  • Users who configured watch will find it no longer prevents idling after upgrade
  • Fix: Warn at config load time when user explicitly lists a globally-excluded command

7. No unit tests for 698 lines of complex logic

  • Pure functions (parseDuration, applyDefaults, UnmarshalYAML, ignoreExclusions) are untested
  • Process tree walking, classification, and validation logic all untested
  • Fix: Add table-driven tests for at minimum the pure configuration functions

Minor Issues

8. Sscanf errors ignored in isInForegroundProcessGroup (safe but inconsistent)
9. Comment-code mismatch in isUserInitiatedProcess (line ~1071)

Positive Observations

✅ Excellent backward compatibility via UnmarshalYAML custom unmarshaler
✅ Smart defaults calculated from workspace idle timeout reduce config complexity
✅ Comprehensive CLI-WATCHER.md documentation with examples and test scenarios
✅ Configuration validation with warnings (doesn't override user values)
✅ Proper goroutine lifecycle management and resource cleanup

Recommendation

🔄 Request Changes - Critical issues #1 (clock ticks + Sscanf), #2 (infinite loop), and #3 (duplicate policy logic) should be fixed before merge. Issue #5 (documentation mismatch) and #6 (breaking change warning) should also be addressed to avoid user confusion. Tests (#7) would significantly improve long-term maintainability but could follow in a subsequent PR.

@vrubezhny vrubezhny force-pushed the feat-idling-interative-cli-activity-tracking branch from 6513728 to c954c98 Compare June 29, 2026 17:35
@vrubezhny

Copy link
Copy Markdown
Contributor Author

The fixes added to the issues pointed to in /che-ai-assistant 's review:

✅ Platform-portable clock ticks detection (ARM/x86 support)
✅ Infinite loop protection in getMainUserProcess
✅ Merged duplicate policy functions
✅ Boot time cached at startup (performance)
✅ Documentation fix (removed per-command activityWindow)
✅ Warning for globally-excluded commands
✅ Comprehensive unit tests (59 test cases)
✅ Error checking in isInForegroundProcessGroup
✅ Clarified comments in isUserInitiatedProcess

Files changed:

timeout/cli-watcher.go - All fixes implemented
timeout/cli-watcher_test.go - 59 new test cases (NEW)
timeout/CLI-WATCHER.md - Documentation updates + developer testing section
Plus existing: .noidle.example, .noidle.minimal

@vrubezhny

vrubezhny commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

/che-ai-assistant ok-pr-review

Review is complete. Please check the review comments below.

@vrubezhny vrubezhny requested a review from tolusha June 29, 2026 18:16

@tolusha tolusha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Critical Issues (must fix before merge):

  1. Big-endian byte order bug in detectClockTicks() - will silently corrupt process age calculations on s390x
  2. YAML parse error swallowing - breaks error handling contract, leaves watcher in silent no-op state on first-load failure

Warnings (should fix):

  • Duration parsing ambiguity (typos like "30min" silently become "30s")
  • Unnecessary unsafe package import
  • Dead code in hasRecentActivity()
  • Missing unit tests for core detection logic
  • init() side effects break portability (non-Linux CI)
  • Repeated /proc/[pid]/stat parsing (3x per process)
  • Verbose INFO logging (every tick logged, could flood logs)
  • Behavioral change needs upgrade documentation

Positive Highlights:

  • Excellent backward compatibility design with deprecation warnings
  • Smart defaults that adapt to workspace idle timeout
  • Comprehensive documentation and config parsing test coverage
  • Defensive programming with cycle detection and error handling

See inline comments for details and suggested fixes.

Comment thread timeout/cli-watcher.go
Comment thread timeout/cli-watcher.go Outdated
Comment thread timeout/cli-watcher.go
Comment thread timeout/cli-watcher.go
Comment thread timeout/cli-watcher.go Outdated
Comment thread timeout/cli-watcher.go Outdated
Comment thread timeout/cli-watcher.go
Comment thread timeout/CLI-WATCHER.md
@tolusha

tolusha commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Additional Observations

Missing Test Coverage for Core Logic

The test file has excellent coverage for configuration parsing (19.1% overall coverage), but the most critical functions have no tests:

  • applyPolicy() - the core decision function
  • isUserInitiatedProcess() - the key filter
  • isInteractiveProcess() - the classification logic
  • getMainUserProcess() - process tree walking

Consider adding at least tests for applyPolicy() since it is a pure function where the process-dependent parts could be abstracted behind an interface for testability.

Performance Optimization Opportunity

getParentPID(), isInForegroundProcessGroup(), and getProcessStartTime() each independently read and parse /proc/[pid]/stat. For a process that passes all checks, this file is read 3 times. Consider creating a single parseProcStat(pid) function that returns a struct with ppid, pgrp, tpgid, and startTicks, then refactor the individual functions to use this shared parser.

This would eliminate redundant I/O on the hot path and make the code cleaner.

@vrubezhny vrubezhny force-pushed the feat-idling-interative-cli-activity-tracking branch from c954c98 to 3f961ce Compare June 29, 2026 23:21
@tolusha

tolusha commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

/che-ai-assistant generate-che-doc

Created documentation PR: eclipse-che/che-docs#3137

@tolusha

tolusha commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

@vrubezhny
Is there image available for testing ?

@tolusha

tolusha commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

/che-ai-assistant check-pr-test-failures

Review is complete. Please check the review comments below.

@tolusha

tolusha commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

PR #363 -- eclipse-che/che-machine-exec -- feat: Enhanced CLI Watcher with automatic terminal process monitoring

Check Category Status Duration CI
build Build FAILED 11m9s GHA
DEPENDENCIES.md file validation Validation PASSED 46s GHA
License Validation PASSED 17s GHA
WIP External PASSED External
add-link External PASSED 3s GHA
eclipsefdn/eca External PASSED External
unit-tests Unit Tests PASSED 1m33s GHA
workspaces.openshift.com External PASSED External

FAILED build (GHA)

The build fails during the multi-platform Docker image build, specifically on the linux/arm64 platform during the yarn installation and build step. The error is caused by an illegal instruction (SIGILL) when running under QEMU emulation: qemu: uncaught target signal 4 (Illegal instruction) - core dumped. This typically occurs when QEMU ARM64 emulation encounters incompatible CPU instructions, causing the process to fail with exit code 132 during the execution of yarn install --immutable && yarn run build.

Log: build
Rerun: Re-run failed jobs


Analysis Summary:
The only failing check is the multi-platform Docker build. The linux/amd64 build appears to succeed, but the linux/arm64 build fails due to QEMU emulation issues. This is likely an infrastructure/compatibility issue rather than a code problem. All other checks including unit tests, validation, and license checks pass successfully.

@vrubezhny vrubezhny force-pushed the feat-idling-interative-cli-activity-tracking branch from d08a2d4 to 919ce97 Compare June 30, 2026 12:23
@vrubezhny

Copy link
Copy Markdown
Contributor Author

@tolusha I've made another validation pass and added some more tests.

I thought some GH Action had been creating an image for testing and providing an URL to it... Don't see it anymore... Or do I mixing the things up?

@vrubezhny

Copy link
Copy Markdown
Contributor Author

Building the docker image locally succeeds for me. And as far as I can see the build on GH is passing with no errors now - so it looks like a temporary glitch..

If you need I can build and publish an image from this PR.

But I'd still like to add some changes to the Dockerfile in some following PRs to fix the warnings and some cloud-shell related things, f.i.:

=> [cloud_shell_builder 5/5] RUN yarn install --immutable &&     yarn run build &&     mkdir /app &&     cp -rf index.html dist node_modules /app                                                           26.0s
  • we don't need copying node_modules for the cloud-shell as it's bundled and has all the required dependencies bundled together - so it just unnecessarily to have that folder on the image

Plus some more minor issues

@vrubezhny

Copy link
Copy Markdown
Contributor Author

/che-ai-assistant help

@tolusha

tolusha commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Hi! I'm che-ai-assistant — I help with your pull requests.

Available commands:

  • /che-ai-assistant generate-che-doc — Generate a documentation PR based on this PR's changes
  • /che-ai-assistant ok-pr-review — Run a comprehensive PR review (summary, code review, deep review, impact analysis)
  • /che-ai-assistant check-pr-test-failures — Analyze failing CI checks, identify root causes, and suggest fixes
  • /che-ai-assistant help — Show this help message

@vrubezhny

Copy link
Copy Markdown
Contributor Author

@vrubezhny Is there image available for testing ?

You can try it from https://quay.io/repository/vrubezhny/che-machine-exec:

# Pull the image
docker pull quay.io/vrubezhny/che-machine-exec:cli-watcher-test

@vrubezhny

vrubezhny commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

/che-ai-assistant ok-pr-review

Review is complete. Please check the review comments below.

@tolusha tolusha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Comprehensive review completed (standard + deep + impact analysis). The PR implements a valuable feature with good defensive programming and backward compatibility. However, there are critical issues that need to be addressed before merge.

Critical Issues

  1. ignoredCommands config broken - documented but non-functional (see inline comments)
  2. findUpward symlink asymmetry - config search fails if PROJECTS_ROOT is a symlink
  3. Atime dependency undocumented - interactive detection silently breaks with noatime mounts

Key Findings

  • Behavioral change (watch ALL processes) well-documented for users but needs platform admin awareness
  • Core decision pipeline lacks test coverage (isWatchedProcessRunning, isUserInitiatedProcess, etc.)
  • Configuration struct conflates concerns (YAML input + resolved state + bookkeeping)

Positive Highlights

  • Excellent backward compatibility via custom UnmarshalYAML
  • Smart adaptive defaults from workspace idle timeout
  • Defensive process tree traversal with cycle detection
  • All prior review feedback addressed

See inline comments for specific fixes and recommendations.

Verdict: 🔄 Request Changes - Fix critical bugs (#1-3) before merge.

Comment thread timeout/cli-watcher.go Outdated
Comment thread timeout/cli-watcher.go
Comment thread timeout/cli-watcher.go Outdated
Comment thread timeout/cli-watcher.go
@tolusha

tolusha commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@vrubezhny

docker pull quay.io/vrubezhny/che-machine-exec:cli-watcher-test

I'm not sure how to test using this image. I thought it would be a VS Code image.

@vrubezhny

Copy link
Copy Markdown
Contributor Author

@vrubezhny

docker pull quay.io/vrubezhny/che-machine-exec:cli-watcher-test

I'm not sure how to test using this image. I thought it would be a VS Code image.

No, I have no VSCode image built from this change (need to find instructions on how to create one).

I'm testing it by starting a WS from https://workspaces.openshift.com#https://github.com/vrubezhny/che-machine-exec/tree/feat-idling-interative-cli-activity-tracking, then executing in terminal (Run Task -> devfile):

  • build cloud-shell
  • compile

Then exporting LOG_LEVEL=debug and running ./che-machine-exec --url 0.0.0.0:5555 from /projects/che-machine-exec folder (it would be default folder in terminal).

then adding .noidle config - minimal or with some preferences, like setting up some minutes for time-values to make it to start/stop watching earlier than defults.

Then in other terminal tab (I split it to have both tabs visible) I execute interactive (like more or claude) commands and non-interactive (like sleep <timeout> or building something (./compile.sh) - and see reactions in che-machine-exec output.

@vrubezhny vrubezhny force-pushed the feat-idling-interative-cli-activity-tracking branch from 919ce97 to afaca0b Compare July 1, 2026 16:18
CLI Watcher now automatically detects and tracks terminal-based development
activity, distinguishing between interactive CLIs (vim, REPLs) and work
processes (builds, deploys) without manual configuration.

Key Features:
- Fully automatic: Only requires 'enabled: true' - auto-detects ALL terminal processes
- Smart defaults: Adapts activityWindow and gracePeriod to workspace idle timeout
- Automatic classification: Interactive (vim, python REPL) vs work (builds, deploys)
- Backward compatible: Old configs continue working with deprecation warnings
- Safety mechanisms: Grace period (5min), max age limit (6h), activity monitoring
- Flexible duration parsing: Accepts 6h, 25m, 60 (seconds), or duration strings

Configuration:
- Minimal: enabled: true (that's it!)
- Smart defaults: Auto-calculated from workspace idle timeout
- Time settings: checkPeriod, activityWindow, gracePeriod, maxProcessAge
- Override options: watchedCommands (fix misclassification), ignoredCommands (skip entirely)

Technical Implementation:
- User process filtering: Parent TTY chain analysis (filters shells, system processes)
- Interactive detection: Foreground process group + TTY read analysis
- Activity tracking: TTY access time (Atime) monitoring for user input
- Multicall binary support: Uses /proc/comm for proper command detection
- Self-skip: che-machine-exec doesn't watch itself
- Config validation: Warns about misconfigurations without breaking
- Platform portability: Detects clock ticks from /proc/self/auxv with platform-specific fallbacks
- Performance: Boot time and clock ticks cached at startup (not per process)
- Robustness: Infinite loop protection, cycle detection, error checking on all parsing

Code Quality Improvements:
- Added comprehensive unit tests (79 test cases covering pure functions and concurrency)
- Tests cover: duration parsing, YAML unmarshaling, defaults, validation, platform detection, applyPolicy, race conditions
- Merged duplicate policy logic into single unified function
- Added error checking to all fmt.Sscanf calls for consistency
- Fixed comment-code mismatches for clarity
- Warning on breaking changes (globally-excluded commands)

Test Coverage:
- Added concurrent Start/Stop tests (catches race conditions)
- Added clock skew tests (documents negative age handling)
- Enhanced /proc parsing integration tests
- All 79 tests pass with -race flag
- No data races detected

All issues verified with test-driven approach: wrote tests first to
detect bugs, then fixed implementation. Thread-safety verified with
Go race detector (-race flag).

Documentation:
- timeout/CLI-WATCHER.md - Comprehensive guide with testing scenarios
- timeout/.noidle.minimal - Minimal config example
- timeout/.noidle.example - Full config with all options
- Added auto-detection test scenario for manual testing
- Added comprehensive upgrade guide for breaking behavioral changes
- Developer testing section added

Issue: https://redhat.atlassian.net/browse/CRW-11505

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Victor Rubezhny <vrubezhny@redhat.com>
@vrubezhny

Copy link
Copy Markdown
Contributor Author

I'm not sure how to test using this image. I thought it would be a VS Code image.

CheCode Editor images for the PR are published ✨

Editor amd64: quay.io/che-incubator-pull-requests/che-code:pr-742-amd64
Editor arm64: quay.io/che-incubator-pull-requests/che-code:pr-742-arm64
Dev image: quay.io/che-incubator-pull-requests/che-code-dev:pr-742-dev-amd64

@vrubezhny vrubezhny requested a review from tolusha July 1, 2026 23:25
@sbouchet

sbouchet commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

same/similar error as #370 :
image

@tolusha

tolusha commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

I have the same error

Activity tracker: Failed to ping che-machine-exec: fetch failed
2026-07-02 09:12:38.591 [error] Activating extension eclipse-che.terminal failed due to an error:
2026-07-02 09:12:38.591 [error] Error: Failed to connect to machine-exec after 30 attempts: 
	at e.init (/checode/checode-linux-libc/ubi9/extensions/che-terminal/dist/extension.js:44:850)
	at process.processTicksAndRejections (node:internal/process/task_queues:103:5)
	at async jh (/checode/checode-linux-libc/ubi9/extensions/che-terminal/dist/extension.js:44:3959)
	at async dS.n (file:///checode/checode-linux-libc/ubi9/out/vs/workbench/api/node/extensionHostProcess.js:116:13398)
	at async dS.m (file:///checode/checode-linux-libc/ubi9/out/vs/workbench/api/node/extensionHostProcess.js:116:13361)
	at async dS.l (file:///checode/checode-linux-libc/ubi9/out/vs/workbench/api/node/extensionHostProcess.js:116:12817)
2026-07-02 09:12:38.592 [info] Eager extensions activated

@sbouchet

sbouchet commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

same comment than #370 (review) . test using devspace-code downstream editor definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants