Optimize show_text_non_python

codeflash-ai[bot] · web-flow · commit 61706164fa30 · 2026-02-21T01:59:10.000Z
This optimization achieves a **17% runtime improvement** (327ms → 277ms) by fundamentally restructuring the type detection logic in the hot path.

## Key Optimization: Early-Exit Type Detection

The original `_column_type()` function used `reduce()` to process every element in a column, calling `_type()` and `_more_generic()` repeatedly. The optimized version implements **early-exit logic** that stops processing as soon as a string type is detected:

```python
# Original: processes all elements regardless
types = [_type(s, has_invisible, numparse) for s in strings]
return reduce(_more_generic, types, bool)

# Optimized: exits early when string type found
for s in strings:
    # ... type checking ...
    if detected_string:
        return str  # immediate return - no more processing needed
```

## Why This Works

1. **String is the most generic type**: In Python's type hierarchy for tabular data, string can represent anything. Once we know a column contains strings, we don't need to check remaining values.

2. **Reduces function call overhead**: The original implementation called `_type()` for every element, plus `_more_generic()` for N-1 reductions. The optimized version eliminates these function calls by inlining the type checking logic.

3. **Profiler evidence**: The line `coltypes = [_column_type(col, numparse=np) for col, np in zip(cols, numparses)]` drops from **53% of runtime** (523ms) to **48.2%** (434ms) - an 89ms improvement that accounts for most of the overall speedup.

## Performance by Test Case

The optimization excels on tests with **mixed-type columns** or **large datasets**:
- `test_large_scale_many_rows_and_sorting_stability`: 20% faster (34.8ms → 29.0ms) - 1000 rows benefit from early exits
- `test_large_scale_many_lines_per_function`: 19% faster (38.0ms → 31.9ms) - columns with strings exit early
- `test_large_scale_complex_timings`: 18.1% faster (203ms → 172ms) - 5000 data points across 50 functions

For smaller datasets, the improvement is more modest (5-12%) but still measurable.

## Implementation Details

The optimized `_column_type()` also:
- Passes `has_invisible` parameter directly instead of via `_type()` calls
- Uses in-place type checking rather than intermediate list construction
- Maintains the same type priority (bool → int → float → str) while enabling early termination

This optimization is particularly valuable since `tabulate()` is called repeatedly when formatting profiling output, making the cumulative savings significant for typical workloads.
diff --git a/codeflash/code_utils/tabulate.py b/codeflash/code_utils/tabulate.py
@@ -705,7 +705,7 @@ def tabulate(
     # format rows and columns, convert numeric values to strings
     cols = list(izip_longest(*list_of_lists))
     numparses = _expand_numparse(disable_numparse, len(cols))
-    coltypes = [_column_type(col, numparse=np) for col, np in zip(cols, numparses)]
+    coltypes = [_column_type(col, has_invisible, numparse=np) for col, np in zip(cols, numparses)]
     if isinstance(floatfmt, str):  # old version
         float_formats = len(cols) * [floatfmt]  # just duplicate the string to use in each column
     else:  # if floatfmt is list, tuple etc we have one per column