Hotfix for tiktoken removal #231
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Explore these optional code suggestions:
KRRT7 left a comment
This already broke once. We should add extensive test cases using a variety of code samples to ensure we don't get more regressions.
…tiktoken`) The bottleneck is minimal, since the computation is a single multiplication and a cast to int, which is already fast. A very minor optimization is to replace the float multiplication and the `int()` call with integer division. The suggestion also removes the `from __future__ import annotations` import; note that postponed evaluation of annotations is opt-in through that import (available since Python 3.7) rather than the default, so the removal is only safe if the module does not depend on it. Here is an optimized version. It avoids floating point multiplication and conversion overhead, and gives the same result as `int(len(s)*0.25)` for non-negative integer `len(s)`.
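The code from this comment is not preserved on the page, so the following is only a minimal sketch of the kind of change it describes. The function names `encoded_len_float` and `encoded_len_int` are hypothetical and are not taken from `code_utils.py`.

```python
def encoded_len_float(s: str) -> int:
    # Hypothetical "before": float multiply, then truncate toward zero.
    return int(len(s) * 0.25)


def encoded_len_int(s: str) -> int:
    # Hypothetical "after": floor division by 4. Because 0.25 is exactly
    # representable in binary floating point and len(s) is never negative,
    # this matches the float version.
    return len(s) // 4


# Quick agreement check on a few sample strings.
samples = ["", "x", "def f(): pass", "a" * 1001]
assert all(encoded_len_float(s) == encoded_len_int(s) for s in samples)
```

Both versions truncate, so a 7-character string maps to 1 and an 8-character string maps to 2.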
⚡️ Codeflash found optimizations for this PR
📄 70% (0.70x) speedup for
…tiktoken`) The multiplication and conversion to int are very fast, and `len()` on a Python string is a constant-time lookup of the stored length. To trim the remaining overhead, we can use integer arithmetic to avoid the float operations in `len(s)*0.3`. For the non-negative lengths that occur in practice, truncating `len(s)*0.3` gives the same result as multiplying by 3 and integer dividing by 10. Here's the optimized code. It avoids floating point multiplication and `int()` casting, and is slightly faster. All comments and signatures are preserved.
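Since 0.3 is not exactly representable as a binary float, the equivalence is less obvious than in the 0.25 case and is worth checking directly. The sketch below uses the hypothetical helpers `estimate_float` and `estimate_int` (not names from the PR) and sweeps a range of lengths, which is an arbitrary choice for illustration.

```python
def estimate_float(n: int) -> int:
    # Hypothetical "before": float multiply by 0.3, then truncate.
    return int(n * 0.3)


def estimate_int(n: int) -> int:
    # Hypothetical "after": pure integer arithmetic.
    return n * 3 // 10


# Sweep string lengths and collect any n where the two forms disagree.
mismatches = [n for n in range(100_000) if estimate_float(n) != estimate_int(n)]
assert mismatches == []
```

A sweep like this could also serve as a starting point for the regression tests requested in the review comment, alongside tests on real code samples.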
⚡️ Codeflash found optimizations for this PR
📄 39% (0.39x) speedup for
User description
We change the compression ratio from 0.5 to 0.3.
PR Type
Enhancement
Description
Changes walkthrough 📝
code_utils.py
Update token length estimation factor
codeflash/code_utils/code_utils.py