
Refactor Labeller.java: improve exception handling, input validation, and code maintainability#1326

Open
akm19117012 wants to merge 1 commit into zinggAI:main from akm19117012:main


Summary

This PR refactors Labeller.java to improve robustness, readability, exception safety, input validation, and maintainability while preserving the existing labelling workflow behavior.

The changes focus on eliminating unsafe patterns from the original implementation, improving dependency management, reducing mutable state, and making the CLI interaction flow safer and easier to maintain.


Key Improvements

1. Replaced Magic Numbers with LabelAction Enum

Introduced a strongly typed LabelAction enum for label actions:

  • MATCH
  • NO_MATCH
  • NOT_SURE
  • QUIT

Why

The original implementation used hardcoded numeric values (0, 1, 2, 9) throughout the codebase, which reduced readability and increased the chance of incorrect usage.

Benefits

  • Improves readability
  • Reduces magic numbers
  • Makes action handling more maintainable
  • Centralizes action definitions
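
As a rough illustration of the idea, an enum of this kind could look like the sketch below. The four constant names and the numeric values 0, 1, 2, and 9 come from this description; which code belongs to which action, and the `code()` and `fromCode()` helpers, are assumptions made for the example and may not match the actual change.

```java
import java.util.Optional;

// Hypothetical sketch. The pairing of codes to actions is an assumption.
enum LabelAction {
    NO_MATCH(0),
    MATCH(1),
    NOT_SURE(2),
    QUIT(9);

    private final int code;

    LabelAction(int code) {
        this.code = code;
    }

    // Numeric value the user types at the prompt.
    int code() {
        return code;
    }

    // Returns the action for a numeric code, or empty if the code is unknown.
    static Optional<LabelAction> fromCode(int code) {
        for (LabelAction action : values()) {
            if (action.code == code) {
                return Optional.of(action);
            }
        }
        return Optional.empty();
    }
}
```

Returning an `Optional` rather than throwing keeps the decision about how to handle an unknown code with the caller, which suits a prompt that simply asks again.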

2. Added Constructor-Based Dependency Injection

Refactored initialization of:

  • ITrainingDataModel
  • ILabelDataViewHelper

to constructor injection.

Why

The original implementation lazily initialized dependencies internally, which tightly coupled object creation with business logic.

Benefits

  • Improves testability
  • Supports dependency injection frameworks
  • Reduces hidden side effects
  • Makes dependencies explicit
  • Encourages immutability
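
A minimal sketch of the constructor-injection pattern follows. `TrainingDataModel`, `LabelDataViewHelper`, and `LabellerSketch` are simplified, hypothetical stand-ins for the project's generic `ITrainingDataModel` and `ILabelDataViewHelper` interfaces and for `Labeller` itself; their methods are invented for the example.

```java
import java.util.Objects;

// Hypothetical stand-ins for the real, much larger interfaces.
interface TrainingDataModel {
    void updateLabel(String clusterId, int label);
}

interface LabelDataViewHelper {
    void printMessage(String message);
}

class LabellerSketch {
    // final: each reference is assigned exactly once, in the constructor.
    private final TrainingDataModel trainingDataModel;
    private final LabelDataViewHelper labelDataViewHelper;

    // Dependencies are passed in explicitly instead of being created lazily inside getters.
    LabellerSketch(TrainingDataModel trainingDataModel, LabelDataViewHelper labelDataViewHelper) {
        this.trainingDataModel = Objects.requireNonNull(trainingDataModel, "trainingDataModel");
        this.labelDataViewHelper = Objects.requireNonNull(labelDataViewHelper, "labelDataViewHelper");
    }

    void label(String clusterId, int label) {
        trainingDataModel.updateLabel(clusterId, label);
        labelDataViewHelper.printMessage("Labelled " + clusterId + " as " + label);
    }
}
```

In a test, both collaborators can be replaced with lambdas or mocks, which is the testability benefit listed above.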

3. Converted Dependencies to final

Marked the core dependency fields as final so the references cannot be reassigned:

private final ITrainingDataModel<...>
private final ILabelDataViewHelper<...>

Why

These references should not change during the object's lifecycle.

Benefits

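  • Improves thread safety of the references (final fields are safely published; the referenced objects are not made thread-safe by this alone)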
  • Prevents accidental reassignment
  • Makes object state predictable

4. Improved Exception Handling

Refactored exception handling logic across:

  • execute()
  • processRecordsCli()
  • getUnmarkedRecords()

Changes

  • Replaced broad Exception handling with more specific exception handling
  • Added explicit RuntimeException wrapping
  • Improved logging messages
  • Re-threw exceptions instead of silently swallowing them

Why

The original implementation suppressed failures in several places, making debugging difficult.

Benefits

  • Better observability
  • Preserves stack traces
  • Improves failure diagnostics
  • Prevents silent execution failures
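
The log-and-rethrow pattern described here can be sketched as follows. The `runStep` helper and the class name are hypothetical, and `java.util.logging` is used only to keep the example self-contained; the project's own logger and exception types may differ.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class ExceptionHandlingSketch {
    private static final Logger LOG = Logger.getLogger(ExceptionHandlingSketch.class.getName());

    // Runs a step; on failure, logs with context and rethrows instead of swallowing.
    static <T> T runStep(String stepName, Callable<T> step) {
        try {
            return step.call();
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Step failed: " + stepName, e);
            throw e; // already unchecked, rethrow unchanged
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "Step failed: " + stepName, e);
            // Passing e as the cause preserves the original stack trace.
            throw new RuntimeException("Step failed: " + stepName, e);
        }
    }
}
```

Passing the original exception as the cause is what keeps the stack trace available to whoever handles the failure further up.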

5. Added Safe CLI Input Validation

Refactored CLI input validation logic.

Improvements

  • Added validation using enum values
  • Added handling for invalid numeric parsing
  • Added handling for missing user input
  • Made the input loop terminate when no further input is available instead of looping indefinitely

Why

The original implementation relied on regex matching and unsafe scanner usage.

Benefits

  • Safer user interaction
  • Prevents invalid state transitions
  • Improves CLI reliability
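
One way to implement this kind of validation is sketched below. The valid codes are the ones named in this description; the class and method names are hypothetical, and treating end-of-input as a quit (code 9) is an assumption made for the example, not necessarily what the PR does.

```java
import java.util.Optional;
import java.util.Scanner;

class LabelInputSketch {
    // Codes named in the description, treated here as the only valid choices.
    private static final int[] VALID_CODES = {0, 1, 2, 9};

    // Parses one line of input; empty for null, non-numeric, or unknown codes.
    static Optional<Integer> parseChoice(String line) {
        if (line == null) {
            return Optional.empty();
        }
        try {
            int code = Integer.parseInt(line.trim());
            for (int valid : VALID_CODES) {
                if (valid == code) {
                    return Optional.of(code);
                }
            }
            return Optional.empty();
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Re-prompts on invalid input; if input ends, returns 9 (assumed quit) rather than looping forever.
    static int readChoice(Scanner scanner) {
        while (scanner.hasNextLine()) {
            Optional<Integer> choice = parseChoice(scanner.nextLine());
            if (choice.isPresent()) {
                return choice.get();
            }
            System.out.println("Please enter one of 0, 1, 2 or 9.");
        }
        return 9;
    }
}
```

Checking `hasNextLine()` before each read is what prevents both a `NoSuchElementException` and an endless loop when standard input is closed or redirected from an empty file.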

6. Added Scanner Lifecycle Management

Added scanner cleanup in a finally block.

Why

The original implementation instantiated scanners repeatedly and never released resources.

Benefits

  • Prevents resource leaks
  • Improves lifecycle management
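
For comparison, the same cleanup can be expressed with try-with-resources, which is equivalent to closing the scanner in a finally block. This is a sketch with hypothetical names, not the code in this PR. Note that closing a `Scanner` also closes the underlying stream, so a scanner wrapping `System.in` should be created once and closed only when no further input will be read.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerLifecycleSketch {
    // Reads every line using a single Scanner that is closed on exit,
    // including when an exception is thrown inside the loop.
    static List<String> readAllLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(in)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }
}
```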

7. Replaced Hardcoded Join Type with Constant

Introduced:

private static final String LEFT_ANTI_JOIN = "left_anti";

Why

The original implementation used inline string literals for join types.

Benefits

  • Improves maintainability
  • Reduces typo risk
  • Centralizes configuration values
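
For readers unfamiliar with the join type, a left anti join keeps the rows of the left side that have no match on the right. The sketch below emulates that on plain lists so it runs without Spark; the class, the `leftAnti` helper, and the idea that the left side holds candidate records and the right side holds already-labelled ones are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class LeftAntiJoinSketch {
    // Same constant as in the PR; shown here only for context.
    static final String LEFT_ANTI_JOIN = "left_anti";

    // Keeps the keys from 'left' that do not appear in 'right'.
    static List<String> leftAnti(List<String> left, List<String> right) {
        Set<String> rightKeys = new HashSet<>(right);
        return left.stream()
                .filter(key -> !rightKeys.contains(key))
                .collect(Collectors.toList());
    }
}
```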

8. Improved Logging Messages

Updated multiple logging statements to:

  • improve clarity
  • fix grammar issues
  • provide better operational context

Examples

  • "occured" → "occurred"
  • added contextual failure messages

Benefits

  • Easier debugging
  • Cleaner operational logs

9. Added Null Validation Using Objects.requireNonNull

Added defensive validation for:

  • cluster IDs
  • current pair records

Why

These values are critical to the processing flow, so a null should fail immediately rather than surface later.

Benefits

  • Fail-fast behavior
  • Easier debugging
  • Prevents unexpected downstream failures
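
The fail-fast check can be as small as the sketch below. The method and parameter names are hypothetical; only the use of `Objects.requireNonNull` is taken from this description.

```java
import java.util.Objects;

class NullCheckSketch {
    // Throws NullPointerException with a descriptive message as soon as a required value is missing.
    static String describePair(String clusterId, Object currentPair) {
        Objects.requireNonNull(clusterId, "clusterId must not be null");
        Objects.requireNonNull(currentPair, "currentPair must not be null");
        return "cluster " + clusterId + ": " + currentPair;
    }
}
```

The message argument ends up in the exception, so the log shows which value was missing instead of a bare `NullPointerException` several calls later.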

10. Removed Unused Lazy Initialization Logic

Removed setter-based mutation and lazy initialization logic from:

  • getTrainingDataModel()
  • getLabelDataViewHelper()

Why

Dependencies are now constructor-injected and immutable.

Benefits

  • Simpler lifecycle management
  • Cleaner object ownership model

11. Improved Code Readability

Refactored:

  • variable naming
  • spacing
  • method organization
  • logging consistency

Benefits

  • Easier onboarding
  • Reduced cognitive complexity
  • Better long-term maintainability

Behavior Compatibility

This refactor preserves the existing CLI labelling workflow and does not change:

  • business logic
  • labelling semantics
  • training data flow
  • output generation behavior

The changes are primarily focused on code quality, stability, and maintainability improvements.


Areas for Future Improvement

The following improvements are intentionally left out of this PR to keep the refactor focused:

  • splitting processRecordsCli() into smaller methods
  • separating CLI logic from executor layer
  • replacing string join types with enums
  • introducing a dedicated input/output abstraction layer
  • improving caching lifecycle management with explicit unpersist()
  • adding JavaDocs and unit tests

Testing Notes

Validated:

  • successful compilation
  • CLI input handling
  • invalid input rejection
  • exception propagation behavior
  • record processing flow
  • quit workflow handling
  • statistics update flow
