Improve error reporting for Fluent launch timeouts and add structured context to LaunchFluentError #5130

@seanpearsonuk

Summary

When a Fluent launch times out, users see a chained traceback that ends with a LaunchFluentError whose message only contains the launch command. The original TimeoutError is chained but not surfaced in the top-level message, which makes quick diagnosis and programmatic handling harder.

Observed behavior

• The code raises TimeoutError inside _await_fluent_launch(...).
• standalone_launcher catches exceptions and ultimately raises LaunchFluentError(self._launch_cmd) from the original exception.
• LaunchFluentError.__init__ builds a message that contains only the launch string; it does not include cause type/message, timeout value, server info filename, pid, or other contextual metadata.
• Logging contains some warnings, but the final top-level exception message is not actionable.
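
The behavior above can be reproduced in isolation with a minimal sketch. Everything here is a stand-in: the class body, the stub functions, the message format, and the placeholder command `fluent 3ddp` are assumptions for illustration, not the actual PyFluent code.

```python
class LaunchFluentError(Exception):
    """Stand-in for the current behavior: the message carries only the launch string."""

    def __init__(self, launch_string):
        # Assumed message format; the real wording may differ.
        super().__init__(f"Fluent launch string: {launch_string}")


def await_fluent_launch_stub(timeout):
    """Hypothetical stand-in for _await_fluent_launch that always times out."""
    raise TimeoutError(f"The launch process has timed out after {timeout} s.")


def launch_stub(launch_cmd, timeout):
    """Mimics the catch-and-wrap pattern described for standalone_launcher."""
    try:
        await_fluent_launch_stub(timeout)
    except Exception as ex:
        raise LaunchFluentError(launch_cmd) from ex


def reproduce():
    """Return the top-level message and the chained cause's type name."""
    try:
        launch_stub("fluent 3ddp", timeout=60)  # placeholder command
    except LaunchFluentError as err:
        return str(err), type(err.__cause__).__name__
```

Calling `reproduce()` shows that the timeout is reachable only through `__cause__`; the text of the top-level exception says nothing about it.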

Recommendations

  1. Enrich LaunchFluentError with structured context
    • Add optional fields: cause, timeout, server_info_file, pid, and attempts (or list of attempted commands).
    • Include the cause type and message in the error message shown to users.
  2. Surface timeout as a distinct exception type
    • Add LaunchTimeoutError(LaunchFluentError) so callers can catch timeouts specifically and respond (e.g., retry with larger timeout).
  3. Improve logging at failure points
    • Log the full exception with exc_info=True.
    • Log the server_info file mtime/contents if present (or note permission issues).
    • Log both commands attempted (initial and fallback) and the final subprocess args/kwargs.
  4. Provide actionable hints in top-level message
    • Include brief suggestions: increase pyfluent.config.launch_fluent_timeout, verify Fluent install path/permissions, check server-info file and Fluent logs.
  5. Preserve exception chaining while making the top-level message more useful
    • Keep raise ... from ex so full traceback remains available, but ensure the LaunchFluentError message is informative.
  6. Add unit tests
    • Tests that simulate an _await_fluent_launch timeout and assert that:
      • LaunchTimeoutError (or LaunchFluentError) includes cause, timeout, and server_info_file.
      • Logs include attempted commands and exc_info.
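
Recommendations 1, 2, and 4 could look roughly like the sketch below. It is one possible shape, not the existing class: the keyword-only parameters, the `_build_message` helper, the `HINT` text, and the message layout are all assumptions.

```python
class LaunchFluentError(Exception):
    """Launch failure carrying structured context (sketch, not the current class)."""

    # Illustrative hint text; the config name is taken from the issue.
    HINT = (
        "Hints: increase pyfluent.config.launch_fluent_timeout, verify the "
        "Fluent install path and permissions, check the server-info file "
        "and Fluent logs."
    )

    def __init__(self, launch_string, *, cause=None, timeout=None,
                 server_info_file=None, pid=None, attempts=()):
        self.launch_string = launch_string
        self.cause = cause
        self.timeout = timeout
        self.server_info_file = server_info_file
        self.pid = pid
        self.attempts = tuple(attempts)
        super().__init__(self._build_message())

    def _build_message(self):
        lines = [f"Fluent launch failed. Launch string: {self.launch_string}"]
        if self.cause is not None:
            lines.append(f"Cause: {type(self.cause).__name__}: {self.cause}")
        context = {
            "timeout (s)": self.timeout,
            "server_info_file": self.server_info_file,
            "pid": self.pid,
        }
        # Only report context that was actually supplied.
        lines += [f"{key}: {value}" for key, value in context.items() if value is not None]
        if self.attempts:
            lines.append("Attempted commands: " + "; ".join(self.attempts))
        lines.append(self.HINT)
        return "\n".join(lines)


class LaunchTimeoutError(LaunchFluentError):
    """Raised when the launch fails specifically because of a timeout."""
```

Because the new parameters are optional and keyword-only, a call such as `LaunchFluentError(self._launch_cmd)` would keep working unchanged under this design.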

Minimal suggested change locations

• ansys/fluent/core/launcher/error_handler.py: update LaunchFluentError signature and message formatting; add LaunchTimeoutError.
• ansys/fluent/core/launcher/standalone_launcher.py: when catching exceptions, pass cause, pid, server_info_file_name, and timeout into LaunchFluentError and log exc_info=True.
• ansys/fluent/core/launcher/launcher_utils.py: where _await_fluent_launch raises TimeoutError, allow callers to map it to LaunchTimeoutError (or document the mapping in tests).
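
A rough sketch of how the catch site could map and log the failure follows. The function `launch_with_context`, the logger name, and the compact error classes are hypothetical stand-ins so the block runs on its own; the real launcher code is structured differently.

```python
import logging

logger = logging.getLogger("launch_sketch")  # hypothetical logger name


class LaunchFluentError(Exception):
    """Compact stand-in that accepts keyword context."""

    def __init__(self, launch_string, **context):
        self.launch_string = launch_string
        self.context = context
        cause = context.get("cause")
        detail = f" ({type(cause).__name__}: {cause})" if cause is not None else ""
        super().__init__(f"Fluent launch failed{detail}. Launch string: {launch_string}")


class LaunchTimeoutError(LaunchFluentError):
    """Timeout-specific subclass."""


def launch_with_context(await_launch, launch_cmd, *, timeout, server_info_file, pid=None):
    """Run await_launch(timeout); on failure, log and raise an enriched error."""
    try:
        return await_launch(timeout)
    except Exception as ex:
        # exc_info=True attaches the full traceback to the log record.
        logger.error("Fluent launch failed; command attempted: %s", launch_cmd, exc_info=True)
        # Map a plain TimeoutError to the more specific subclass.
        error_type = LaunchTimeoutError if isinstance(ex, TimeoutError) else LaunchFluentError
        raise error_type(
            launch_cmd,
            cause=ex,
            timeout=timeout,
            server_info_file=server_info_file,
            pid=pid,
        ) from ex
```

Keeping `raise ... from ex` preserves the chained traceback, while the mapping gives callers a specific type to catch.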

Rationale (why this matters)

• Faster diagnosis: Users see the real reason (TimeoutError) and relevant runtime context without digging into full tracebacks.
• Actionable guidance: Short hints reduce time-to-resolution for common issues (timeout too small, permissions, wrong path).
• Programmatic handling: Structured fields and a specific timeout exception let client code implement robust retry or fallback logic.
• Better telemetry & support: Logs with explicit attempted commands, pid, and server-info enable reproducible bug reports and easier remote troubleshooting.
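
To illustrate the programmatic-handling point, client code could retry with progressively larger timeouts once a dedicated exception exists. This is a sketch under assumptions: `launch_with_retry`, the `launch` callable, the timeout values, and the stand-in exception class are all hypothetical.

```python
class LaunchTimeoutError(Exception):
    """Stand-in for the proposed timeout exception with a `timeout` field."""

    def __init__(self, message, timeout=None):
        super().__init__(message)
        self.timeout = timeout


def launch_with_retry(launch, timeouts=(60, 120, 240)):
    """Call launch(timeout) with each timeout in turn until one succeeds."""
    last_error = None
    for timeout in timeouts:
        try:
            return launch(timeout)
        except LaunchTimeoutError as err:
            last_error = err  # only timeouts are retried; other errors propagate
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error
```

With only a generic LaunchFluentError, the same logic would have to inspect `__cause__` or parse the message, which is more fragile.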

Acceptance criteria

• LaunchFluentError messages include cause type/message, timeout, server_info_file, and pid.
• A distinct LaunchTimeoutError is available and raised on launch timeouts.
• Tests exist that cover timeout behavior and verify presence of context in exceptions.
• Logs produced on failure include attempted commands and exception info (exc_info=True).
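
A test along these lines could check the criteria above. It is a self-contained sketch using `unittest`: the `launch` helper, the logger name, and the error classes are simplified stand-ins rather than the project's real code or test layout.

```python
import logging
import unittest

logger = logging.getLogger("launch_sketch")  # hypothetical logger name


class LaunchFluentError(Exception):
    def __init__(self, launch_string, *, cause=None, timeout=None, server_info_file=None):
        self.cause, self.timeout, self.server_info_file = cause, timeout, server_info_file
        super().__init__(f"Launch failed ({type(cause).__name__}): {launch_string}")


class LaunchTimeoutError(LaunchFluentError):
    pass


def launch(await_launch, launch_cmd, timeout, server_info_file):
    """Simplified stand-in for the launcher's catch-and-wrap logic."""
    try:
        return await_launch(timeout)
    except TimeoutError as ex:
        logger.error("Launch timed out; command attempted: %s", launch_cmd, exc_info=True)
        raise LaunchTimeoutError(
            launch_cmd, cause=ex, timeout=timeout, server_info_file=server_info_file
        ) from ex


class TestLaunchTimeout(unittest.TestCase):
    def test_timeout_carries_context_and_is_logged(self):
        def always_times_out(timeout):
            raise TimeoutError("timed out")

        with self.assertLogs("launch_sketch", level="ERROR") as captured:
            with self.assertRaises(LaunchTimeoutError) as raised:
                launch(always_times_out, "fluent 3ddp", 5, "serverinfo.txt")

        err = raised.exception
        self.assertIsInstance(err.cause, TimeoutError)
        self.assertEqual(err.timeout, 5)
        self.assertEqual(err.server_info_file, "serverinfo.txt")

        record = captured.records[0]
        self.assertIn("fluent 3ddp", record.getMessage())  # attempted command logged
        self.assertIsNotNone(record.exc_info)  # traceback attached via exc_info=True
```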
