CHERRY_PICK_TRACKER.md
+6 -2 (6 additions & 2 deletions)
@@ -36,8 +36,8 @@ These fixes are already part of the fork's `main` branch:
| Status | PR | Title | Impact | Notes |
|--------|----|-------|--------|-------|
-|[]|[#350](https://github.com/google/langextract/pull/350)| Fix incorrect `char_interval` for non-ASCII text (Fixes #334) | Fixes `RegexTokenizer` merging Latin + CJK characters. Fork already has #284 which may overlap. |Draft. Check if #334 is still reproducible. |
-|[]|[#257](https://github.com/google/langextract/pull/257)| Add retry mechanism for transient API errors (503, 429, timeouts) | Exponential backoff for LLM API failures. Useful but large (XL, 997 lines), no reviews. | Consider implementing simpler retry in our worker instead. |
+|[x]|[#350](https://github.com/google/langextract/pull/350)| Fix incorrect `char_interval` for non-ASCII text (Fixes #334) | Fixes `RegexTokenizer` merging Latin + CJK characters. Uses regex V1 set subtraction to separate CJK scripts from Latin in token patterns. |Applied manually. Adds `_CJK_SCRIPTS`, `_CJK_PATTERN`, and modifies `_LETTERS_PATTERN` with V1 set subtraction. 142→421 tests pass (new retry tests included). |
+|[x]|[#257](https://github.com/google/langextract/pull/257)| Add retry mechanism for transient API errors (503, 429, timeouts) | Exponential backoff with jitter for transient LLM failures. Chunk-level retry in annotation pipeline preserves successful chunks. | Applied via `git apply --reject` + manual conflict resolution. New files: `retry_utils.py` (278 lines), `retry_utils_test.py` (300 lines). Modified: `annotation.py`, `extraction.py`, `gemini.py`, `annotation_test.py`. Complementary to litellm's provider-level `num_retries`. |
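
The #350 row describes keeping CJK characters out of the Latin letter pattern so that mixed-script text yields correct `char_interval` offsets. The sketch below illustrates that idea only; it is not the fork's code. It swaps in the standard-library `re` module with a negative lookahead in place of the `regex` module's V1 set subtraction, the Unicode ranges are an approximate, hypothetical stand-in for `_CJK_SCRIPTS`, and treating a run of CJK characters as a single token is an assumption.

```python
import re

# Approximate CJK ranges, chosen for illustration (hypothetical; not the
# fork's actual _CJK_SCRIPTS definition).
CJK_RANGES = (
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\uac00-\ud7af"  # Hangul Syllables
)
CJK_CHAR = f"[{CJK_RANGES}]"

# "Letters minus CJK". [^\W\d_] matches roughly any Unicode letter; the
# negative lookahead removes CJK characters from that set. With the
# third-party `regex` module the same subtraction can be written directly
# under the V1 flag, e.g. (?V1)[\p{L}--[\p{Han}]]+ .
NON_CJK_LETTERS = f"(?:(?!{CJK_CHAR})[^\\W\\d_])+"

# Token alternatives: non-CJK letter run, CJK run, digit run, single symbol.
TOKEN = re.compile(f"{NON_CJK_LETTERS}|{CJK_CHAR}+|\\d+|[^\\w\\s]")


def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]
```

For example, `tokenize("abc\u4e2d\u6587def")` splits the Latin and CJK runs into three tokens, each carrying its own character offsets, instead of merging them into one.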
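
The #257 row mentions exponential backoff with jitter and chunk-level retry that preserves successful chunks. A minimal sketch of that pattern follows. Every name in it (`TransientError`, `retry_with_backoff`, `annotate_chunks`) is hypothetical and does not reflect the actual API of `retry_utils.py`; the defaults are arbitrary, and "full jitter" is one common choice rather than necessarily the one the PR uses.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for 503, 429, and timeout failures."""


def retry_with_backoff(fn, *, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


def annotate_chunks(chunks, annotate, **retry_kwargs):
    """Retry each chunk on its own so finished chunks are never redone."""
    results = []
    for chunk in chunks:
        result = retry_with_backoff(lambda: annotate(chunk), **retry_kwargs)
        results.append(result)
    return results
```

Retrying per chunk rather than per document means a transient failure on one chunk costs only that chunk's calls; results already collected for earlier chunks are kept. A provider-level setting such as litellm's `num_retries` operates below this layer, which is why the two are described as complementary.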