⚡️ Speed up function convert_date_to_datetime by 52% #157

Open: codeflash-ai[bot] wants to merge 1 commit into branch-3.9 from codeflash/optimize-convert_date_to_datetime-mhwvekit
codeflash-ai bot commented on Nov 13, 2025

📄 52% (0.52x) speedup for convert_date_to_datetime in src/bokeh/util/serialization.py

⏱️ Runtime: 3.99 milliseconds → 2.63 milliseconds (best of 143 runs)

📝 Explanation and details

The optimization replaces the inefficient obj.timetuple()[:6] tuple construction and unpacking with direct attribute access (obj.year, obj.month, obj.day). Additionally, it moves the DT_EPOCH constant definition to module level to avoid recalculating it on every function call.

Key changes:

  • Eliminated tuple creation overhead: The original code calls timetuple(), which builds a 9-field time.struct_time, slices it to its first 6 elements, and unpacks them with *. For a plain date the hour, minute, and second fields are always zero, so the optimized version reads only the three attributes that carry information.
  • Module-level constant: DT_EPOCH is defined once at module level, so it is computed at import time rather than on each call. The original already appears to reference a DT_EPOCH defined elsewhere, so this change may contribute little to the measured speedup.
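
As a rough illustration of the two key changes, the sketch below reconstructs both variants from the description above. The function names convert_via_timetuple and convert_via_attributes are hypothetical, and the bodies are inferred from this description rather than copied from the Bokeh source. DT_EPOCH is defined the same way as in the generated regression tests in this PR.

```python
import datetime as dt

# Defined the same way as in the generated regression tests in this PR.
DT_EPOCH = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)


def convert_via_timetuple(obj: dt.date) -> float:
    """Hypothetical reconstruction of the original approach."""
    # timetuple() builds a 9-field struct_time; only the first six fields are used.
    parts = obj.timetuple()[:6]
    return (dt.datetime(*parts, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000


def convert_via_attributes(obj: dt.date) -> float:
    """Hypothetical reconstruction of the optimized approach."""
    # Read the three stored fields directly; the time of day defaults to midnight.
    midnight = dt.datetime(obj.year, obj.month, obj.day, tzinfo=dt.timezone.utc)
    return (midnight - DT_EPOCH).total_seconds() * 1000


if __name__ == "__main__":
    # For plain dates the two variants should agree exactly.
    for d in (dt.date(1970, 1, 1), dt.date(2000, 1, 1), dt.date.min, dt.date.max):
        assert convert_via_timetuple(d) == convert_via_attributes(d)
    print(convert_via_attributes(dt.date(2000, 1, 1)))
```

Both variants return milliseconds since the Unix epoch as a float, so they can be swapped without changing the caller's contract for plain date inputs.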

Why it's faster:

  • timetuple() is relatively expensive: it constructs a full time.struct_time with 9 fields, including the weekday, day of year, and DST flag, which this function discards
  • Tuple slicing and unpacking add additional overhead
  • Direct attribute access on date objects is much faster as it simply returns the stored integer values
  • The line profiler shows a 38% reduction in per-hit time (1625ns → 1001ns)
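
The cost difference described above can be checked locally with a small timeit micro-benchmark. This is only a sketch: the helper names fields_via_timetuple, fields_via_attributes, and time_call are hypothetical, it measures the field extraction alone rather than the whole function, and the absolute numbers depend on the machine and Python version, so it is not expected to reproduce the profiler figures quoted here.

```python
import datetime as dt
import timeit


def fields_via_timetuple(obj: dt.date) -> tuple:
    # Build the full struct_time, then keep the first six fields.
    return obj.timetuple()[:6]


def fields_via_attributes(obj: dt.date) -> tuple:
    # Read only the three fields a plain date actually stores.
    return (obj.year, obj.month, obj.day)


def time_call(func, obj: dt.date, number: int = 100_000) -> float:
    """Return the best observed average time per call, in seconds."""
    best_total = min(timeit.repeat(lambda: func(obj), number=number, repeat=5))
    return best_total / number


if __name__ == "__main__":
    sample = dt.date(2020, 2, 29)
    slow = time_call(fields_via_timetuple, sample)
    fast = time_call(fields_via_attributes, sample)
    print(f"timetuple slice: {slow * 1e9:.0f} ns per call")
    print(f"attribute reads: {fast * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes, which is the usual recommendation for timeit measurements.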

Impact on workloads:
Based on the function references, this function is called from:

  1. Property transformation in hot path: Used in bokeh/core/property/datetime.py during property value transformation, which happens frequently during plot creation and updates
  2. General datetime serialization: Called from convert_datetime_type() for all date objects being serialized
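
One thing worth checking against these call sites: dt.datetime is a subclass of dt.date, so a datetime instance passes an isinstance(obj, dt.date) check, and the two approaches only agree for plain dates. The sketch below uses hypothetical helper names and reconstructs both bodies from the description above. Whether a datetime can actually reach convert_date_to_datetime through either caller is not established by this PR and is an assumption to verify.

```python
import datetime as dt

DT_EPOCH = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)


def ms_via_timetuple(obj: dt.date) -> float:
    # Keeps hour, minute, and second when obj is a datetime.
    return (dt.datetime(*obj.timetuple()[:6], tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000


def ms_via_attributes(obj: dt.date) -> float:
    # Always resolves to midnight, even when obj is a datetime.
    return (dt.datetime(obj.year, obj.month, obj.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000


plain_date = dt.date(2020, 1, 1)
with_time = dt.datetime(2020, 1, 1, 12, 30, 0)

# Plain dates: both variants give the same value.
same_for_dates = ms_via_timetuple(plain_date) == ms_via_attributes(plain_date)

# datetime input: the variants differ by the time of day (12 h 30 min here).
difference_ms = ms_via_timetuple(with_time) - ms_via_attributes(with_time)

if __name__ == "__main__":
    print(same_for_dates, difference_ms)
```

If callers always route datetime values elsewhere before reaching this function, the difference is harmless; otherwise a regression test with a datetime input would pin the intended behavior.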

Test case performance:
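For valid date inputs, the generated tests show speedups of roughly 48-75%. The invalid-input tests, which raise AttributeError, are essentially unchanged (from about 3% slower to 10% faster), and the existing unit tests, which reach the function through its callers, improve by 38-62%.

  • Basic conversions: 50-75% faster
  • Edge cases (min/max dates): 48-64% faster
  • Large scale operations (500-1000 dates): 51-53% faster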

This optimization is particularly beneficial for applications that process many date objects, such as time series visualization in Bokeh, where this function could be called thousands of times during data preparation.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 33 Passed |
| 🌀 Generated Regression Tests | 3776 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 75.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| unit/bokeh/core/property/test_datetime.py::Test_Datetime.test_transform_date | 1.72μs | 1.10μs | 57.0%✅ |
| unit/bokeh/core/property/test_datetime.py::Test_Datetime.test_transform_str | 1.85μs | 1.34μs | 38.0%✅ |
| unit/bokeh/models/widgets/test_slider.py::TestDateRangeSlider.test_value_as_date_when_set_as_timestamp | 4.90μs | 3.28μs | 49.5%✅ |
| unit/bokeh/models/widgets/test_slider.py::TestDateRangeSlider.test_value_as_date_when_set_mixed | 6.36μs | 4.42μs | 43.9%✅ |
| unit/bokeh/models/widgets/test_slider.py::TestDateSlider.test_value_and_value_throttled | 4.14μs | 2.56μs | 62.1%✅ |
| unit/bokeh/models/widgets/test_slider.py::TestDateSlider.test_value_as_date_when_set_as_timestamp | 4.21μs | 2.84μs | 48.2%✅ |

🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import datetime as dt

# imports
import pytest  # used for our unit tests
from bokeh.util.serialization import convert_date_to_datetime

# Reference epoch used to compute the expected values below
DT_EPOCH = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)

#-----------------------------------------------------------------------------
# Code
#-----------------------------------------------------------------------------

# unit tests

# Basic Test Cases

def test_basic_today():
    # Test with today's date
    today = dt.date.today()
    codeflash_output = convert_date_to_datetime(today); result = codeflash_output # 6.54μs -> 4.35μs (50.5% faster)
    # Should match the expected milliseconds since epoch
    expected = (dt.datetime(today.year, today.month, today.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_specific_date():
    # Test with a specific known date
    date_obj = dt.date(2000, 1, 1)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.46μs -> 2.55μs (75.1% faster)
    expected = (dt.datetime(2000, 1, 1, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_leap_year():
    # Test with a leap year date
    date_obj = dt.date(2020, 2, 29)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.09μs -> 2.33μs (75.5% faster)
    expected = (dt.datetime(2020, 2, 29, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_epoch_date():
    # Test with the epoch date
    date_obj = dt.date(1970, 1, 1)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.89μs -> 2.37μs (64.3% faster)
    expected = 0.0

# Edge Test Cases

def test_edge_min_date():
    # Test with the minimum possible date
    date_obj = dt.date.min
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.47μs -> 2.92μs (53.2% faster)
    expected = (dt.datetime(dt.date.min.year, dt.date.min.month, dt.date.min.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_max_date():
    # Test with the maximum possible date
    date_obj = dt.date.max
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.11μs -> 2.51μs (63.7% faster)
    expected = (dt.datetime(dt.date.max.year, dt.date.max.month, dt.date.max.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_dst_transition():
    # Test with a date that is a DST transition in some timezones (should not affect UTC)
    date_obj = dt.date(2023, 3, 12)  # US DST start
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.75μs -> 2.35μs (60.1% faster)
    expected = (dt.datetime(2023, 3, 12, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_invalid_type_int():
    # Test with an integer input, should raise AttributeError
    with pytest.raises(AttributeError):
        convert_date_to_datetime(123456) # 1.29μs -> 1.19μs (7.96% faster)

def test_edge_invalid_type_str():
    # Test with a string input, should raise AttributeError
    with pytest.raises(AttributeError):
        convert_date_to_datetime("2022-01-01") # 1.21μs -> 1.12μs (7.97% faster)


def test_edge_none_input():
    # Test with None input, should raise AttributeError
    with pytest.raises(AttributeError):
        convert_date_to_datetime(None) # 1.45μs -> 1.46μs (0.889% slower)

def test_edge_date_with_time_attributes():
    # A dt.date carries no time fields, so the result should be midnight UTC of that day
    date_obj = dt.date(2022, 5, 17)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 7.54μs -> 4.48μs (68.6% faster)
    expected = (dt.datetime(2022, 5, 17, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

# Large Scale Test Cases

def test_large_scale_sequential_dates():
    # Test conversion of a sequence of 1000 consecutive dates
    start_date = dt.date(2000, 1, 1)
    for i in range(1000):
        d = start_date + dt.timedelta(days=i)
        codeflash_output = convert_date_to_datetime(d); result = codeflash_output # 1.01ms -> 662μs (51.8% faster)
        expected = (dt.datetime(d.year, d.month, d.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_large_scale_random_dates():
    # Test conversion of 1000 random dates between 1970 and 2100
    import random
    for _ in range(1000):
        year = random.randint(1970, 2100)
        month = random.randint(1, 12)
        # Handle month days correctly
        if month == 2:
            if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
                day = random.randint(1, 29)
            else:
                day = random.randint(1, 28)
        elif month in [4, 6, 9, 11]:
            day = random.randint(1, 30)
        else:
            day = random.randint(1, 31)
        d = dt.date(year, month, day)
        codeflash_output = convert_date_to_datetime(d); result = codeflash_output # 1.05ms -> 694μs (51.0% faster)
        expected = (dt.datetime(d.year, d.month, d.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000


def test_edge_leap_century():
    # Test with leap century (year 2000 is a leap year, 1900 is not)
    date_obj = dt.date(2000, 2, 29)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 7.33μs -> 4.27μs (71.8% faster)
    expected = (dt.datetime(2000, 2, 29, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_non_leap_century():
    # Test with non-leap century (year 1900 is not a leap year)
    date_obj = dt.date(1900, 2, 28)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.79μs -> 2.99μs (60.2% faster)
    expected = (dt.datetime(1900, 2, 28, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_far_past_date():
    # Test with a far past date
    date_obj = dt.date(1800, 1, 1)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.11μs -> 2.54μs (61.8% faster)
    expected = (dt.datetime(1800, 1, 1, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_far_future_date():
    # Test with a far future date
    date_obj = dt.date(2500, 12, 31)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.55μs -> 2.81μs (61.6% faster)
    expected = (dt.datetime(2500, 12, 31, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
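
The month-length branching in test_large_scale_random_dates can be avoided. A minimal sketch, assuming a hypothetical helper random_date that is not part of the generated tests: draw a random proleptic Gregorian ordinal between two bounds and convert it back with date.fromordinal, which always yields a valid calendar date. A seeded random.Random instance keeps the sample reproducible.

```python
import datetime as dt
import random


def random_date(rng: random.Random, start: dt.date, end: dt.date) -> dt.date:
    """Return a uniformly chosen date in [start, end] (hypothetical helper)."""
    # Every integer between the two ordinals maps to a valid calendar date,
    # so no leap-year or month-length handling is needed.
    return dt.date.fromordinal(rng.randint(start.toordinal(), end.toordinal()))


START = dt.date(1970, 1, 1)
END = dt.date(2100, 12, 31)

rng = random.Random(42)  # seeded so the sample is reproducible
dates = [random_date(rng, START, END) for _ in range(1000)]

if __name__ == "__main__":
    print(min(dates), max(dates))
```

Unlike the year/month/day approach, this samples days uniformly, so the 29th to 31st are not under-represented.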
from __future__ import annotations

import datetime as dt

# imports
import pytest  # used for our unit tests
from bokeh.util.serialization import convert_date_to_datetime

# Reference epoch used to compute the expected values below
DT_EPOCH = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)

#-----------------------------------------------------------------------------
# Code
#-----------------------------------------------------------------------------

# unit tests

# Basic Test Cases

def test_basic_today():
    # Test conversion of today's date
    today = dt.date.today()
    codeflash_output = convert_date_to_datetime(today); result = codeflash_output # 3.37μs -> 2.13μs (58.0% faster)
    # Should match the expected milliseconds since epoch for today at midnight UTC
    expected = (dt.datetime(today.year, today.month, today.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_specific_date():
    # Test conversion of a specific known date
    date_obj = dt.date(2020, 1, 1)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.70μs -> 2.31μs (60.0% faster)
    expected = (dt.datetime(2020, 1, 1, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_leap_year():
    # Test conversion of a leap day
    date_obj = dt.date(2016, 2, 29)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.42μs -> 2.21μs (54.5% faster)
    expected = (dt.datetime(2016, 2, 29, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_basic_epoch():
    # Test conversion of the Unix epoch date
    date_obj = dt.date(1970, 1, 1)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.41μs -> 2.10μs (62.4% faster)

# Edge Test Cases

def test_edge_min_date():
    # Test conversion of minimum possible date
    date_obj = dt.date.min
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.52μs -> 2.84μs (59.1% faster)
    expected = (dt.datetime(dt.date.min.year, dt.date.min.month, dt.date.min.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_max_date():
    # Test conversion of maximum possible date
    date_obj = dt.date.max
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.85μs -> 2.47μs (55.8% faster)
    expected = (dt.datetime(dt.date.max.year, dt.date.max.month, dt.date.max.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_year_2000_leap():
    # Test conversion for century leap year (divisible by 400)
    date_obj = dt.date(2000, 2, 29)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.64μs -> 2.17μs (67.3% faster)
    expected = (dt.datetime(2000, 2, 29, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_year_1900_non_leap():
    # Test conversion for century non-leap year (divisible by 100, not 400)
    date_obj = dt.date(1900, 2, 28)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 3.60μs -> 2.43μs (47.9% faster)
    expected = (dt.datetime(1900, 2, 28, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_invalid_type_int():
    # Test passing an int instead of a date
    with pytest.raises(AttributeError):
        convert_date_to_datetime(20200101) # 1.24μs -> 1.28μs (2.66% slower)

def test_edge_invalid_type_str():
    # Test passing a string instead of a date
    with pytest.raises(AttributeError):
        convert_date_to_datetime("2020-01-01") # 1.23μs -> 1.12μs (9.90% faster)


def test_edge_date_with_time_attributes():
    # A dt.date carries no time fields, so the result should be midnight UTC of that day
    date_obj = dt.date(2023, 7, 13)
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 7.61μs -> 4.37μs (74.2% faster)
    expected = (dt.datetime(2023, 7, 13, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_edge_dst_transition():
    # Test a date on a DST transition (should be unaffected, always UTC midnight)
    date_obj = dt.date(2022, 3, 13)  # US DST start
    codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 4.57μs -> 2.61μs (75.1% faster)
    expected = (dt.datetime(2022, 3, 13, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

# Large Scale Test Cases

def test_large_scale_sequential_dates():
    # Test conversion of 1000 sequential dates
    start_date = dt.date(2000, 1, 1)
    results = []
    for i in range(1000):
        date_obj = start_date + dt.timedelta(days=i)
        codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 1.01ms -> 662μs (52.6% faster)
        expected = (dt.datetime(date_obj.year, date_obj.month, date_obj.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000
        results.append(result)

def test_large_scale_leap_years():
    # Test conversion for all leap days from 1904 to 2096; every 4th year in this
    # range is a leap year (2000 is divisible by 400), so no filtering is needed
    leap_days = [dt.date(year, 2, 29) for year in range(1904, 2100, 4)]
    for date_obj in leap_days:
        codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 56.9μs -> 37.8μs (50.6% faster)
        expected = (dt.datetime(date_obj.year, date_obj.month, date_obj.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_large_scale_random_dates():
    # Test conversion for 500 random dates between 1970 and 2050
    import random
    random.seed(42)  # Deterministic
    dates = []
    for _ in range(500):
        year = random.randint(1970, 2050)
        month = random.randint(1, 12)
        # Pick a valid day for the month
        if month == 2:
            # Leap year check
            if (year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)):
                day = random.randint(1, 29)
            else:
                day = random.randint(1, 28)
        elif month in [4, 6, 9, 11]:
            day = random.randint(1, 30)
        else:
            day = random.randint(1, 31)
        date_obj = dt.date(year, month, day)
        dates.append(date_obj)
    for date_obj in dates:
        codeflash_output = convert_date_to_datetime(date_obj); result = codeflash_output # 520μs -> 342μs (52.0% faster)
        expected = (dt.datetime(date_obj.year, date_obj.month, date_obj.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000

def test_large_scale_min_max_dates():
    # Test conversion for both min and max date 100 times each
    for _ in range(100):
        codeflash_output = convert_date_to_datetime(dt.date.min); min_result = codeflash_output # 112μs -> 75.4μs (49.1% faster)
        codeflash_output = convert_date_to_datetime(dt.date.max); max_result = codeflash_output # 108μs -> 73.4μs (48.0% faster)
        min_expected = (dt.datetime(dt.date.min.year, dt.date.min.month, dt.date.min.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000
        max_expected = (dt.datetime(dt.date.max.year, dt.date.max.month, dt.date.max.day, tzinfo=dt.timezone.utc) - DT_EPOCH).total_seconds() * 1000
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import datetime

from bokeh.util.serialization import convert_date_to_datetime

def test_convert_date_to_datetime():
    # Smoke test: conversion of an early date should not raise
    convert_date_to_datetime(datetime.date(1, 2, 1))
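
The generated tests above compute expected but show no assertion, because Codeflash compares codeflash_output between the original and optimized runs. For a standalone check, a sketch along the following lines could assert the values directly. The names expected_ms, check_conversion, and stand_in_convert are hypothetical. The function under test is passed in as a parameter so that the snippet runs without Bokeh installed; in a real test it would be bokeh.util.serialization.convert_date_to_datetime.

```python
import datetime as dt
from typing import Callable, Iterable

EPOCH_ORDINAL = dt.date(1970, 1, 1).toordinal()
MS_PER_DAY = 86_400_000.0


def expected_ms(d: dt.date) -> float:
    """Independent reference: whole days since the epoch, in milliseconds."""
    return (d.toordinal() - EPOCH_ORDINAL) * MS_PER_DAY


def check_conversion(convert: Callable[[dt.date], float], dates: Iterable[dt.date]) -> int:
    """Assert that convert matches the reference for each date; return the count checked."""
    checked = 0
    for d in dates:
        result = convert(d)
        assert result == expected_ms(d), f"{d}: got {result}, expected {expected_ms(d)}"
        checked += 1
    return checked


def stand_in_convert(d: dt.date) -> float:
    # Stand-in for the real function, reconstructed from the PR description.
    epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
    return (dt.datetime(d.year, d.month, d.day, tzinfo=dt.timezone.utc) - epoch).total_seconds() * 1000


SAMPLE_DATES = [
    dt.date(1970, 1, 1),
    dt.date(2000, 2, 29),
    dt.date(1900, 2, 28),
    dt.date.min,
    dt.date.max,
]

if __name__ == "__main__":
    print(check_conversion(stand_in_convert, SAMPLE_DATES), "dates checked")
```

Computing the reference from date ordinals keeps it independent of the datetime subtraction used by the function under test.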

To edit these changes, run git checkout codeflash/optimize-convert_date_to_datetime-mhwvekit and push.

codeflash-ai bot requested a review from mashraf-222 on November 13, 2025 at 03:29
codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Nov 13, 2025