[wip] - js2py to mini-racer#38236

Draft
derrickaw wants to merge 7 commits into apache:master from derrickaw:20260417_miniracerJs2Py

@derrickaw
Collaborator

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@derrickaw
Collaborator Author

/gemini review

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request migrates the JavaScript execution engine used in Beam YAML from js2py to py-mini-racer. This change addresses compatibility issues with newer Python versions and improves the robustness of JavaScript UDF execution by implementing thread-safe context management and better type conversion between JavaScript and Python.

Highlights

  • Dependency Migration: Replaced the js2py dependency with py-mini-racer in setup.py to support modern Python versions.
  • Implementation Update: Updated yaml_mapping.py to use MiniRacer for JavaScript execution, including a new thread-safe context caching mechanism and a utility to convert JavaScript objects to Python types.
  • Test Suite Updates: Updated existing tests to use MiniRacer and added a new test case to verify correct handling of JavaScript Date objects.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the js2py dependency with py_mini_racer for JavaScript mapping functions in Beam YAML, improving compatibility with Python 3.12. The implementation introduces a js_to_py conversion utility and a thread-based caching mechanism for JavaScript contexts. Review feedback suggests optimizing performance by moving local imports to the top level and using threading.local() for context management. Additionally, it is recommended to use the ctx.call API for better argument handling and to rely on standard collection protocols instead of internal library class names for type checking.
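The `ctx.call` suggestion matters because splicing Python values into JavaScript source with `repr` or `str` yields invalid JavaScript for many inputs (`True`, `None`, single-quoted keys). A minimal sketch, assuming that `MiniRacer.call` JSON-encodes its arguments before invoking the function (our understanding of `py_mini_racer`; check its docs). `build_js_call` and `naive_js_call` are hypothetical helpers, and no V8 context is created here:

```python
import json


def build_js_call(function_name, *args):
    """Render a JS call expression with JSON-encoded arguments."""
    # JSON literals are valid JavaScript values (true/false/null, "quoted" keys),
    # so each encoded argument parses correctly on the JS side.
    encoded = ", ".join(json.dumps(arg) for arg in args)
    return f"{function_name}({encoded})"


def naive_js_call(function_name, *args):
    # Fragile: Python's repr emits True/None and single quotes, which
    # JavaScript does not read as boolean/null literals.
    return f"{function_name}({', '.join(repr(a) for a in args)})"


row = {"name": "a'b", "active": True, "score": None}
print(naive_js_call("f", row))  # f({'name': "a'b", 'active': True, 'score': None})
print(build_js_call("f", row))  # f({"name": "a'b", "active": true, "score": null})
```

Delegating this encoding to the library's call API, rather than formatting source strings by hand, is the design choice the review is pointing at.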

Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
@derrickaw
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the js2py library with py_mini_racer for executing JavaScript mapping functions in Beam YAML, which improves compatibility with Python 3.12. The changes introduce a thread-local context for JavaScript execution and a new conversion utility, js_to_py, to handle the mapping of JavaScript objects to Python types. Feedback identifies several issues: missing imports for Mapping and Iterable that will cause runtime errors, an unused json import, and a logic flaw in js_to_py that would incorrectly process bytes objects as generic iterables. Additionally, a new test case requires an update to its assertions to match the datetime objects now returned by the implementation.
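A thread-local context cache can be sketched with `threading.local()`, assuming the goal is one lazily built JavaScript context per worker thread so that contexts are never shared across threads. `ThreadLocalContexts` and `FakeContext` are hypothetical; `FakeContext` stands in for `MiniRacer` so the sketch runs without V8:

```python
import threading


class ThreadLocalContexts:
    """Lazily builds one context per thread and reuses it on that thread."""

    def __init__(self, factory, setup_code):
        self._factory = factory        # e.g. MiniRacer in the real code (assumption)
        self._setup_code = setup_code  # JS source evaluated once per new context
        self._local = threading.local()

    def get(self):
        ctx = getattr(self._local, "ctx", None)
        if ctx is None:
            # First call on this thread: create and initialize a context.
            ctx = self._factory()
            ctx.eval(self._setup_code)
            self._local.ctx = ctx
        return ctx


class FakeContext:
    """Stand-in for a JS context that only records what it evaluates."""

    created = 0

    def __init__(self):
        FakeContext.created += 1
        self.evaluated = []

    def eval(self, code):
        self.evaluated.append(code)


contexts = ThreadLocalContexts(FakeContext, "function f(x) { return x; }")
main_ctx = contexts.get()
from_worker = []
worker = threading.Thread(target=lambda: from_worker.append(contexts.get()))
worker.start()
worker.join()
# main_ctx is reused on this thread; the worker thread got its own context.
```

Compared with a dict keyed by thread id, `threading.local` needs no lock and drops a thread's context when that thread ends.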

Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
Comment thread sdks/python/apache_beam/yaml/yaml_udf_test.py Outdated
@derrickaw force-pushed the 20260417_miniracerJs2Py branch from 34a4b7a to 4db9046 on April 17, 2026 19:31
@derrickaw
Collaborator Author

Run Python PreCommit 3.10

@derrickaw
Collaborator Author

Run Python PreCommit 3.12

@derrickaw
Collaborator Author

Run Python_Coverage PreCommit 3.10

@derrickaw
Collaborator Author

Run Python_ML PreCommit 3.12

@derrickaw
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the js2py library with py_mini_racer to support JavaScript mapping functions in Beam YAML, addressing Python 3.12 compatibility. Key changes include a thread-local JavaScript context and a js_to_py utility for type conversion. Feedback highlights a critical issue with missing imports for Mapping and Iterable, which will cause runtime errors. Additionally, the automatic conversion of ISO-formatted strings to datetime objects in js_to_py is flagged as potentially too aggressive, risking unintended data type changes.
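The concern about converting ISO-formatted strings is easy to reproduce in plain Python. This sketch assumes a heuristic that tries `datetime.fromisoformat` on every string; `naive_convert` is hypothetical and not the PR's code. A string field that merely looks like a date silently changes type:

```python
import datetime


def naive_convert(value):
    """Aggressive rule: any string fromisoformat accepts becomes a datetime."""
    if isinstance(value, str):
        try:
            return datetime.datetime.fromisoformat(value)
        except ValueError:
            return value
    return value


record = {"release_label": "2024-01-01", "sku": "A-17"}
converted = {key: naive_convert(val) for key, val in record.items()}
# "2024-01-01" became datetime.datetime(2024, 1, 1, 0, 0) even if the UDF
# meant it as text; "A-17" stayed a string because fromisoformat rejected it.
```

Converting by type instead, touching only values the bridge already returns as `datetime` objects (which the earlier review of the Date test indicates the implementation now does), leaves string fields like this alone.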

Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py
Comment thread sdks/python/apache_beam/yaml/yaml_mapping.py Outdated
@derrickaw
Collaborator Author

Run Python PreCommit 3.12

@derrickaw
Collaborator Author

Run Python_Coverage PreCommit 3.10

@derrickaw
Collaborator Author

Run Python_Runners PreCommit 3.11

@derrickaw
Collaborator Author

Run Python_Runners PreCommit 3.12

@derrickaw
Collaborator Author

Run Python_Coverage PreCommit 3.10

@derrickaw
Collaborator Author

Run Python PreCommit 3.12

@derrickaw
Collaborator Author

Run Python PreCommit 3.14

1 similar comment
@derrickaw
Collaborator Author

Run Python PreCommit 3.14

@derrickaw
Collaborator Author

Run Python_Coverage PreCommit 3.10

@derrickaw
Collaborator Author

Run PythonDocker PreCommit 3.12

@derrickaw
Collaborator Author

Run Python_Coverage PreCommit 3.10

@derrickaw
Collaborator Author

Run PythonDocker PreCommit 3.12

@derrickaw force-pushed the 20260417_miniracerJs2Py branch from b02460e to 5a91930 on April 22, 2026 13:13
@derrickaw
Collaborator Author

Run Python PreCommit 3.10

@derrickaw
Collaborator Author

Run Python_Transforms PreCommit 3.11

@derrickaw
Collaborator Author

Run PythonDocker PreCommit 3.13

@derrickaw
Collaborator Author

Run Python PreCommit 3.13

@derrickaw
Collaborator Author

Run Python PreCommit 3.10

@derrickaw
Collaborator Author

Run Python_Examples PreCommit 3.14

@derrickaw
Collaborator Author

Run Python_Runners PreCommit 3.10

@derrickaw
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request replaces the 'js2py' library with 'py-mini-racer' to provide JavaScript UDF support in Beam YAML, which improves compatibility with Python 3.12. The changes introduce new DoFn implementations for filtering and mapping, along with a utility to convert JavaScript objects to Python types. Review feedback identifies several critical issues: missing imports for 'FileSystems' and 'schema_from_element_type', undefined references to 'Mapping' and 'Iterable' in the conversion logic, and the omission of error handling for the new JS transforms. Additionally, reading UDF files should be moved to the expansion phase for efficiency, and the package dependency name in 'setup.py' needs to be corrected to 'py-mini-racer'.

Comment on lines +20 to +24
import datetime
import itertools
import re
import threading
import uuid
Contributor

high

Several required imports are missing for the new functionality: FileSystems (from apache_beam.io.filesystems) and schema_from_element_type (from apache_beam.typehints.schemas). Additionally, threading and uuid appear to be unused in this file and should be removed to keep the code clean.

Comment on lines +209 to +211
elif isinstance(obj, Mapping):
  return {k: js_to_py(v) for k, v in obj.items()}
elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
Contributor

high

The names Mapping and Iterable are not defined in this scope. Since collections.abc is imported as abc on line 25, you should use abc.Mapping and abc.Iterable to avoid a NameError at runtime.

Suggested change
- elif isinstance(obj, Mapping):
-   return {k: js_to_py(v) for k, v in obj.items()}
- elif not isinstance(obj, (str, bytes)) and isinstance(obj, Iterable):
+ elif isinstance(obj, abc.Mapping):
+   return {k: js_to_py(v) for k, v in obj.items()}
+ elif not isinstance(obj, (str, bytes)) and isinstance(obj, abc.Iterable):
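The fix can be tried standalone. A minimal sketch, assuming `collections.abc` is imported as `abc` as the comment describes; only the two branches quoted above come from the PR, and the final pass-through `return obj` is an assumption about the rest of `js_to_py`:

```python
from collections import abc


def js_to_py(obj):
    """Recursively turn mapping/sequence-like values into dicts and lists."""
    if isinstance(obj, abc.Mapping):
        # Checked before Iterable: mappings are also iterable (over their keys).
        return {k: js_to_py(v) for k, v in obj.items()}
    elif not isinstance(obj, (str, bytes)) and isinstance(obj, abc.Iterable):
        # str/bytes are excluded: iterating bytes yields ints, and each
        # character of a str is itself a str, so recursion would never end.
        return [js_to_py(v) for v in obj]
    # Everything else (numbers, strings, bytes, None, datetimes) passes through.
    return obj


print(js_to_py({"ids": (1, 2), "raw": b"xy", "tags": ["a", {"b": None}]}))
# {'ids': [1, 2], 'raw': b'xy', 'tags': ['a', {'b': None}]}
```

Checking against the ABCs rather than library-specific class names keeps the converter working with any proxy type that implements the standard collection protocols.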

[f.name for f in schema_from_element_type(pcoll.element_type).fields],
**keep
)
return pcoll | beam.ParDo(JsFilterDoFn(udf_code, function_name))
Contributor

high

The use of beam.ParDo(JsFilterDoFn(...)) directly bypasses the error_handling configuration. You should wrap this transform with maybe_with_exception_handling_transform_fn to ensure that errors in JavaScript execution can be caught and handled according to the YAML specification (e.g., redirected to a dead-letter queue).

Comment on lines +764 to +765
return pcoll | beam.ParDo(
    JsMapToFieldsDoFn(fields, original_fields, input_schema))
Contributor

high

This ParDo also ignores the error_handling parameter. It should be wrapped with maybe_with_exception_handling_transform_fn to maintain consistency with other YAML transforms and support error redirection for failed JS evaluations.
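To illustrate what honoring `error_handling` buys, here is a plain-Python sketch of the dead-letter pattern: failures are collected alongside their inputs instead of failing the whole run. It is not `maybe_with_exception_handling_transform_fn` itself, whose signature is not shown in this thread, and `apply_with_dead_letter` is a hypothetical helper:

```python
def apply_with_dead_letter(fn, elements):
    """Apply fn to each element; collect failures instead of raising.

    Returns (good, bad). Each bad entry keeps the failing input plus the
    error text, the kind of record a dead-letter output would carry.
    """
    good, bad = [], []
    for element in elements:
        try:
            good.append(fn(element))
        except Exception as e:  # in the real transform: a failed JS evaluation
            bad.append({"element": element, "error": repr(e)})
    return good, bad


good, bad = apply_with_dead_letter(lambda x: 10 // x, [1, 0, 5])
```

Here the element `0` lands in `bad` with its `ZeroDivisionError`, while `1` and `5` produce normal outputs.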

Comment on lines +273 to +278
elif 'path' in expr and 'name' in expr:
  path = expr['path']
  func_name = expr['name']
  udf_code = FileSystems.open(path).read().decode()
  script.append(udf_code)
  self.field_funcs[name] = func_name
Contributor

medium

Reading the UDF file inside JsMapToFieldsDoFn.setup is inefficient as it happens on every worker/bundle initialization. It's also inconsistent with JsFilterDoFn where the code is read at expansion time. It is recommended to read the file content during pipeline construction (expansion) and pass the code string to the DoFn constructor.
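The recommendation amounts to resolving `path` to source text once, at pipeline construction, and handing only the string to the DoFn. A plain-Python sketch of that split, assuming built-in `open` as a stand-in for `FileSystems.open(path).read().decode()`; `JsUdfFn` and `load_udf` are hypothetical names, not classes from the PR:

```python
import os
import tempfile


def load_udf(path):
    # Stand-in for FileSystems.open(path).read().decode(), run at expansion time.
    with open(path, encoding="utf-8") as f:
        return f.read()


class JsUdfFn:
    """DoFn-like holder that receives UDF source text, never a file path."""

    def __init__(self, udf_code, function_name):
        self.udf_code = udf_code
        self.function_name = function_name
        self.script = None

    def setup(self):
        # A real DoFn would create its JS context here and eval udf_code;
        # no file access is needed on the worker.
        self.script = self.udf_code


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "udf.js")
    with open(path, "w", encoding="utf-8") as f:
        f.write("function double(x) { return 2 * x; }")
    udf_fn = JsUdfFn(load_udf(path), "double")  # file read once, here

udf_fn.setup()  # still works after the temporary file has been deleted
```

Because the code string is serialized with the DoFn, workers never need read access to the UDF file, and `JsMapToFieldsDoFn` would then behave like `JsFilterDoFn`.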

Comment thread sdks/python/setup.py
'virtualenv-clone>=0.5,<1.0',
# https://github.com/PiotrDabkowski/Js2Py/issues/317
'js2py>=0.74,<1; python_version<"3.12"',
'mini-racer',
Contributor

medium

The correct package name for the V8 bridge is py-mini-racer. The mini-racer package is often an older or different version. Given the import from py_mini_racer import MiniRacer used in the SDK, py-mini-racer should be specified as the dependency.

Suggested change
- 'mini-racer',
+ 'py-mini-racer',
