Skip to content

Refactor Spark format_string integer conversion dispatch#22388

Open
kosiew wants to merge 3 commits into
apache:mainfrom
kosiew:refactor-duplication-02-22163
Open

Refactor Spark format_string integer conversion dispatch#22388
kosiew wants to merge 3 commits into
apache:mainfrom
kosiew:refactor-duplication-02-22163

Conversation

@kosiew
Copy link
Copy Markdown
Contributor

@kosiew kosiew commented May 20, 2026

Which issue does this PR close?

Rationale for this change

ConversionSpecifier::format contained substantial duplication across integer ScalarValue variants for %d, %x, %o, %s, and %c handling. Each integer width repeated nearly identical conversion logic, making the code harder to maintain and increasing the risk of inconsistent behavior across integer types.

This change consolidates integer formatting behavior into shared internal helpers while preserving existing Spark-compatible semantics.

What changes are included in this PR?

  • Introduced a local IntegerValue enum to normalize signed and unsigned integer handling while preserving width-specific unsigned bit behavior for %x and %o.

  • Replaced repeated per-variant integer dispatch branches in ConversionSpecifier::format with a shared format_integer helper.

  • Added shared helper methods for:

    • decimal formatting
    • unsigned bit formatting
    • %c conversion
    • decimal string conversion
  • Added small macro_rules! helpers to generate From<T> for IntegerValue implementations for signed and unsigned integer families, reducing repetitive conversion boilerplate while preserving width-specific unsigned formatting semantics.

  • Added invalid_integer_conversion helper to centralize integer conversion error generation.

  • Added table-driven regression coverage for integer formatting behavior across:

    • signed integer widths
    • unsigned integer widths
    • %d, %x, %o, %s, and %c
    • null handling behavior

Are these changes tested?

Yes.

Added test_integer_formatting_across_widths covering:

  • Signed integer formatting across Int8, Int16, Int32, and Int64
  • Unsigned integer formatting across UInt8, UInt16, UInt32, and UInt64
  • %d, %x, %o, %s, and %c formatting behavior
  • Null integer formatting behavior

Are there any user-facing changes?

No intended user-facing behavior changes. This PR is a structural refactor intended to preserve existing Spark-compatible integer formatting semantics.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

…ize match arms

- Added a local IntegerValue adapter to format_string functionality.
- Collapsed duplicated match arms for %d, %x, %o, %s, %c to improve code efficiency.
- Ensured preservation of signed width behavior for negative hex and octal values.
- Introduced a regression table test covering integer widths and null values.
@github-actions github-actions Bot added the spark label May 20, 2026
kosiew added 2 commits May 20, 2026 16:44
- Removed IntegerFormatValue.
- Added direct IntegerValue helpers.
- Introduced private From implementations via local macros.
- Shortened integer ScalarValue arms for clarity.
- Deduplicated invalid integer conversion error handling.
- Made test argument counts explicit for better readability.
@kosiew kosiew marked this pull request as ready for review May 20, 2026 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor: Centralize numeric %c formatting dispatch in format_string.rs

1 participant