/gemini review |
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request migrates the JavaScript execution engine used in Beam YAML from js2py to py-mini-racer. This change addresses compatibility issues with newer Python versions and improves the robustness of JavaScript UDF execution by implementing thread-safe context management and better type conversion between JavaScript and Python.
|
Code Review
This pull request replaces the js2py dependency with py_mini_racer for JavaScript mapping functions in Beam YAML, improving compatibility with Python 3.12. The implementation introduces a js_to_py conversion utility and a thread-based caching mechanism for JavaScript contexts. Review feedback suggests optimizing performance by moving local imports to the top level and using threading.local() for context management. Additionally, it is recommended to use the ctx.call API for better argument handling and to rely on standard collection protocols instead of internal library class names for type checking.
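The `threading.local()` suggestion above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `get_context`, `FakeContext`, and `demo` are hypothetical names, and `FakeContext` stands in for a real JavaScript engine context so the snippet runs without py_mini_racer installed.

```python
import threading

# One attribute namespace per thread; each thread sees only its own values.
_thread_state = threading.local()


def get_context(factory):
    """Return the calling thread's context, creating it on first use."""
    ctx = getattr(_thread_state, "ctx", None)
    if ctx is None:
        ctx = factory()
        _thread_state.ctx = ctx
    return ctx


class FakeContext:
    """Hypothetical stand-in for a JavaScript engine context."""

    def __init__(self):
        self.owner = threading.get_ident()


def demo():
    """Fetch the context twice on this thread and once on a worker thread."""
    first = get_context(FakeContext)
    second = get_context(FakeContext)
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_context(FakeContext)))
    worker.start()
    worker.join()
    return first, second, seen[0]
```

Compared with a dict keyed by thread id, this needs no explicit lock and no manual cleanup of entries for threads that have exited.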
|
/gemini review |
Code Review
This pull request replaces the js2py library with py_mini_racer for executing JavaScript mapping functions in Beam YAML, which improves compatibility with Python 3.12. The changes introduce a thread-local context for JavaScript execution and a new conversion utility, js_to_py, to handle the mapping of JavaScript objects to Python types. Feedback identifies several issues: missing imports for Mapping and Iterable that will cause runtime errors, an unused json import, and a logic flaw in js_to_py that would incorrectly process bytes objects as generic iterables. Additionally, a new test case requires an update to its assertions to match the datetime objects now returned by the implementation.
34a4b7a to 4db9046
|
Run Python PreCommit 3.10 |
|
Run Python PreCommit 3.12 |
|
Run Python_Coverage PreCommit 3.10 |
|
Run Python_ML PreCommit 3.12 |
|
/gemini review |
Code Review
This pull request replaces the js2py library with py_mini_racer to support JavaScript mapping functions in Beam YAML, addressing Python 3.12 compatibility. Key changes include a thread-local JavaScript context and a js_to_py utility for type conversion. Feedback highlights a critical issue with missing imports for Mapping and Iterable, which will cause runtime errors. Additionally, the automatic conversion of ISO-formatted strings to datetime objects in js_to_py is flagged as potentially too aggressive, risking unintended data type changes.
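To illustrate the concern about aggressive datetime conversion, here is a small sketch contrasting a convert-everything heuristic with an opt-in conversion. Both functions are hypothetical and are not taken from the PR; the `expect_timestamp` flag is an assumed way of saying "the output schema declares this field as a timestamp".

```python
import datetime


def naive_convert(value):
    """Convert any string that happens to parse as ISO 8601 (the flagged behavior)."""
    if isinstance(value, str):
        try:
            return datetime.datetime.fromisoformat(value)
        except ValueError:
            return value
    return value


def convert_with_hint(value, expect_timestamp):
    """Convert only when the caller explicitly expects a timestamp."""
    if expect_timestamp and isinstance(value, str):
        return datetime.datetime.fromisoformat(value)
    return value


# A string field that merely looks like a date silently changes type:
changed = naive_convert("2024-01-15")
# With an explicit hint, plain strings stay strings:
unchanged = convert_with_hint("2024-01-15", expect_timestamp=False)
```

With the heuristic, a user field such as a batch label of `"2024-01-15"` would leave the UDF as a `datetime`, which is the kind of unintended type change the review describes.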
|
Run Python PreCommit 3.12 |
|
Run Python_Coverage PreCommit 3.10 |
|
Run Python_Runners PreCommit 3.11 |
|
Run Python_Runners PreCommit 3.12 |
|
Run Python_Coverage PreCommit 3.10 |
|
Run Python PreCommit 3.12 |
|
Run Python PreCommit 3.14 |
|
Run Python PreCommit 3.14 |
|
Run Python_Coverage PreCommit 3.10 |
|
Run PythonDocker PreCommit 3.12 |
|
Run Python_Coverage PreCommit 3.10 |
|
Run PythonDocker PreCommit 3.12 |
b02460e to 5a91930
|
Run Python PreCommit 3.10 |
|
Run Python_Transforms PreCommit 3.11 |
|
Run PythonDocker PreCommit 3.13 |
|
Run Python PreCommit 3.13 |
|
Run Python PreCommit 3.10 |
|
Run Python_Examples PreCommit 3.14 |
|
Run Python_Runners PreCommit 3.10 |
|
/gemini review |
Code Review
This pull request replaces the 'js2py' library with 'py-mini-racer' to provide JavaScript UDF support in Beam YAML, which improves compatibility with Python 3.12. The changes introduce new DoFn implementations for filtering and mapping, along with a utility to convert JavaScript objects to Python types. Review feedback identifies several critical issues: missing imports for 'FileSystems' and 'schema_from_element_type', undefined references to 'Mapping' and 'Iterable' in the conversion logic, and the omission of error handling for the new JS transforms. Additionally, reading UDF files should be moved to the expansion phase for efficiency, and the package dependency name in 'setup.py' needs to be corrected to 'py-mini-racer'.
import datetime
import itertools
import re
import threading
import uuid
elif isinstance(obj, Mapping):
  return {k: js_to_py(v) for k, v in obj.items()}
elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
The names Mapping and Iterable are not defined in this scope. Since collections.abc is imported as abc on line 25, you should use abc.Mapping and abc.Iterable to avoid a NameError at runtime.
- elif isinstance(obj, Mapping):
-   return {k: js_to_py(v) for k, v in obj.items()}
- elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
+ elif isinstance(obj, abc.Mapping):
+   return {k: js_to_py(v) for k, v in obj.items()}
+ elif not isinstance(obj, (str, bytes)) and isinstance(obj, abc.Iterable):
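For reference, a self-contained sketch of the corrected checks. Only the two `elif` branches appear in the PR; the function wrapper, the list-building branch body, and the final fallthrough are assumptions made so the snippet runs on its own.

```python
from collections import abc


def js_to_py(obj):
    """Recursively convert mapping-like and iterable values to plain Python types."""
    if isinstance(obj, abc.Mapping):
        return {k: js_to_py(v) for k, v in obj.items()}
    elif not isinstance(obj, (str, bytes)) and isinstance(obj, abc.Iterable):
        # Assumed branch body: materialize any other iterable as a list.
        return [js_to_py(v) for v in obj]
    # Scalars, str, and bytes pass through unchanged.
    return obj


converted = js_to_py({"a": (1, 2), "b": "xy", "c": b"z"})
```

Excluding `str` and `bytes` before the `abc.Iterable` check matters because both are iterable and would otherwise be split into characters or integers.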
[f.name for f in schema_from_element_type(pcoll.element_type).fields],
**keep
)
return pcoll | beam.ParDo(JsFilterDoFn(udf_code, function_name))
The use of beam.ParDo(JsFilterDoFn(...)) directly bypasses the error_handling configuration. You should wrap this transform with maybe_with_exception_handling_transform_fn to ensure that errors in JavaScript execution can be caught and handled according to the YAML specification (e.g., redirected to a dead-letter queue).
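The dead-letter idea behind this comment can be shown in plain Python. This is a conceptual sketch only: `apply_with_dead_letter` is a hypothetical helper and does not reproduce how Beam's `maybe_with_exception_handling_transform_fn` is implemented.

```python
def apply_with_dead_letter(fn, elements):
    """Apply fn to each element, routing failures to a separate error list."""
    good, errors = [], []
    for element in elements:
        try:
            good.append(fn(element))
        except Exception as exn:  # broad on purpose: any UDF failure is dead-lettered
            errors.append((element, repr(exn)))
    return good, errors


# The element 0 raises ZeroDivisionError and lands in the error output.
good, errors = apply_with_dead_letter(lambda x: 10 // x, [1, 0, 5])
```

Without this kind of wrapping, one failing element aborts the whole transform instead of being recorded alongside the elements that succeeded.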
return pcoll | beam.ParDo(
    JsMapToFieldsDoFn(fields, original_fields, input_schema))
elif 'path' in expr and 'name' in expr:
  path = expr['path']
  func_name = expr['name']
  udf_code = FileSystems.open(path).read().decode()
  script.append(udf_code)
  self.field_funcs[name] = func_name
Reading the UDF file inside JsMapToFieldsDoFn.setup is inefficient as it happens on every worker/bundle initialization. It's also inconsistent with JsFilterDoFn where the code is read at expansion time. It is recommended to read the file content during pipeline construction (expansion) and pass the code string to the DoFn constructor.
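A rough sketch of the recommended shape: read the file once at construction time and hand the code string to the function object. `MapFn`, `build_map_fn`, and `demo` are hypothetical names, `MapFn` is a plain class standing in for a DoFn, and the built-in `open` stands in for `FileSystems.open` so the snippet runs without Beam.

```python
import os
import tempfile


class MapFn:
    """Hypothetical stand-in for a DoFn that receives code, not a path."""

    def __init__(self, udf_code, function_name):
        self.udf_code = udf_code
        self.function_name = function_name
        self.script = None

    def setup(self):
        # No file I/O here: the code string travels with the object.
        self.script = self.udf_code


def build_map_fn(path, function_name):
    """Read the UDF source during pipeline construction."""
    with open(path, encoding="utf-8") as f:
        udf_code = f.read()
    return MapFn(udf_code, function_name)


def demo():
    """Show that setup() still works after the source file is gone."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "udf.js")
        with open(path, "w", encoding="utf-8") as f:
            f.write("function double(x) { return 2 * x; }")
        fn = build_map_fn(path, "double")
    fn.setup()  # the temporary file has already been deleted
    return fn.script
```

A side benefit of this shape is that workers no longer need access to the path, since only the code string is carried by the object.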
'virtualenv-clone>=0.5,<1.0',
# https://github.com/PiotrDabkowski/Js2Py/issues/317
'js2py>=0.74,<1; python_version<"3.12"',
'mini-racer',