Skip to content

Add resource monitor#58

Merged
jmcphers merged 13 commits into
mainfrom
feature/resource-usage
Jan 6, 2026
Merged

Add resource monitor#58
jmcphers merged 13 commits into
mainfrom
feature/resource-usage

Conversation

@jmcphers
Copy link
Copy Markdown
Contributor

@jmcphers jmcphers commented Jan 6, 2026

Adds resource monitoring. By default, resources are monitored and reported every 1 second. Resource usage information is emitted on the kernel's websocket as kernel messages (rather than Jupyter messages, but on the same protocol), and is also returned when querying the active session.

The interval at which resources are collected is configurable (both at the command line and in settings), and can be set to 0 to disable resource monitoring entirely if desired for performance reasons.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive resource monitoring capabilities to Kallichore, enabling real-time tracking of kernel resource usage. The implementation monitors CPU usage, memory consumption, and thread counts for kernel processes and their children. By default, resources are sampled every 1 second and can be configured via command-line arguments or runtime settings. Setting the interval to 0 disables monitoring entirely.

  • Adds a global resource monitor that periodically collects metrics for all active kernel sessions
  • Exposes resource usage data through both WebSocket messages and HTTP session queries
  • Provides configuration API to adjust sampling intervals dynamically (minimum 100ms when enabled)

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
kallichore.json Adds resourceUsage schema and resource_sample_interval_ms configuration field to OpenAPI spec
crates/kcshared/src/kernel_message.rs Defines ResourceUpdate struct and adds ResourceUsage variant to KernelMessage enum
crates/kcserver/tests/resource_usage_test.rs Adds integration tests verifying resource usage data in HTTP responses and WebSocket messages
crates/kcserver/src/server.rs Threads resource_sample_interval_ms parameter through server initialization and configuration updates
crates/kcserver/src/resource_monitor.rs Implements global resource monitoring task with process tree traversal and metric collection
crates/kcserver/src/main.rs Adds --resource-sample-interval CLI argument with 1000ms default
crates/kcserver/src/lib.rs Exports new resource_monitor module
crates/kcserver/src/kernel_state.rs Adds resource_usage field to store latest metrics
crates/kcserver/src/kernel_session/mod.rs Includes resource_usage in session info responses
crates/kcserver/Cargo.toml Bumps version to 0.1.62
crates/kallichore_api/src/models.rs Adds ResourceUsage model and updates ActiveSession and ServerConfiguration models
crates/kallichore_api/docs/* Adds documentation for new API models
crates/kallichore_api/api/openapi.yaml Updates OpenAPI specification with resource usage schemas
crates/kallichore_api/README.md Updates build date metadata
Cargo.lock Updates version lock for kcserver

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/kallichore_api/api/openapi.yaml Outdated
Comment thread crates/kallichore_api/src/models.rs Outdated
#[derive(Debug, Clone, PartialEq, serde::Serialize, serde::Deserialize, validator::Validate)]
#[cfg_attr(feature = "conversion", derive(frunk::LabelledGeneric))]
pub struct ResourceUsage {
/// The percentage of CPU used by the kernel process
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says "The percentage of CPU used by the kernel process" but should clarify that this includes child processes, consistent with the actual implementation.

Suggested change
/// The percentage of CPU used by the kernel process
/// The percentage of CPU used by the kernel process, including its child processes

Copilot uses AI. Check for mistakes.
Comment on lines +257 to +262
#[cfg(not(any(target_os = "linux")))]
{
// On macOS and Windows, tasks() is not available
// Assume 1 thread per process as a baseline
total_threads += 1;
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On non-Linux platforms (macOS, Windows), the thread count logic assumes 1 thread per process, which significantly underestimates the actual thread count. Modern Python kernels typically use multiple threads. Consider:

  1. Documenting this limitation in the API
  2. Using platform-specific APIs to get accurate thread counts
  3. Making the field optional when accurate data isn't available
    This could mislead users who rely on thread count metrics for capacity planning or debugging.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docs.

------------ | ------------- | ------------- | -------------
**cpu_percent** | **i64** | The percentage of CPU used by the kernel process |
**memory_bytes** | **i64** | The amount of memory used by the kernel process in bytes |
**thread_count** | **i64** | The number of threads used by the kernel process |
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says "The number of threads used by the kernel process" but should clarify that this includes child processes.

Suggested change
**thread_count** | **i64** | The number of threads used by the kernel process |
**thread_count** | **i64** | The number of threads used by the kernel process and its child processes |

Copilot uses AI. Check for mistakes.
Comment thread crates/kcserver/src/resource_monitor.rs Outdated
Comment thread kallichore.json Outdated
Comment thread crates/kallichore_api/api/openapi.yaml Outdated
Comment thread crates/kallichore_api/src/models.rs Outdated
#[serde(rename = "memory_bytes")]
pub memory_bytes: i64,

/// The number of threads used by the kernel process
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says "The number of threads used by the kernel process" but should clarify that this includes child processes.

Suggested change
/// The number of threads used by the kernel process
/// The number of threads used by the kernel process, including its child processes

Copilot uses AI. Check for mistakes.
Comment on lines +226 to +241
// Collect all PIDs in process tree using BFS
let mut pids_to_check = vec![pid];
let mut visited = HashSet::new();

while let Some(check_pid) = pids_to_check.pop() {
if !visited.insert(check_pid) {
continue; // Already visited
}

// Find children by checking parent_pid
for (child_pid, proc) in system.processes() {
if proc.parent() == Some(check_pid) && !visited.contains(child_pid) {
pids_to_check.push(*child_pid);
}
}
}
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BFS process tree collection doesn't actually accumulate metrics for visited processes. The code collects all PIDs using BFS, but then only sums metrics in the second loop. However, there's a subtle issue: the visited set is built without checking if processes exist in the system. Then when summing, some PIDs might not have corresponding processes. While this won't cause errors due to the if let Some check, the logic could be clearer. Consider combining the BFS traversal with metric collection to make it more efficient and clearer.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that the PIDs have corresponding processes because we just found them! I think this is OK as is.

Comment on lines +1973 to +1980
// Enforce a minimum interval of 100 ms so we don't try to sample
// too frequently
let new_interval_ms = if interval_ms > 0 && interval_ms < 100 {
log::warn!("Requested resource_sample_interval_ms is too low: {}. Setting to minimum of 100 ms.", interval_ms);
100
} else {
interval_ms as u64
};
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent validation logic: The code checks for negative values and enforces a minimum of 100ms, but the warning message at line 1976 says "Setting to minimum of 100 ms" while the actual enforcement allows 0 (disabled). The validation should either:

  1. Allow 0 or >= 100 (current behavior), and update the warning to clarify this
  2. Treat 0 specially before the minimum check
    The current logic is: if value is between 1-99, set to 100; if 0, keep as 0. This is correct but the warning message doesn't reflect this nuance.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 is the minimum when metrics collection is enabled, not 0, so I stand by the warning.

jmcphers and others added 2 commits January 6, 2026 09:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jmcphers jmcphers merged commit 6a425cb into main Jan 6, 2026
5 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 6, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants