Skip to content

Upgrade Struct2Tensor to TensorFlow 2.21.0#59

Merged
vkarampudi merged 47 commits intogoogle:masterfrom
vkarampudi:testing1
May 7, 2026
Merged

Upgrade Struct2Tensor to TensorFlow 2.21.0#59
vkarampudi merged 47 commits intogoogle:masterfrom
vkarampudi:testing1

Conversation

@vkarampudi
Copy link
Copy Markdown
Collaborator

@vkarampudi vkarampudi commented Apr 10, 2026

This PR upgrades struct2tensor to support TensorFlow 2.21.0 and resolves various build system incompatibilities, dependency conflicts, and compilation errors encountered during the process.

Key Changes

1. TensorFlow 2.21.0 Dependency Resolution Conflicts

  • Updated .bazelversion to 7.7.0 to align with TensorFlow 2.21.0 requirements.
  • Updated WORKSPACE to fetch TensorFlow 2.21.0
  • Problem: The build failed with numerous errors regarding missing or conflicting external repositories (e.g., rules_java, local_config_cuda, local_config_tensorrt, llvm-raw, etc.).
  • Reason: Upgrading to TensorFlow 2.21.0 introduced new transitive dependencies and assumed the presence of toolchains (like Java and GPU-specific configurations) not directly needed by struct2tensor.
  • Fix: We created multiple dummy repositories in the WORKSPACE file to satisfy these unresolved references without adding unnecessary overhead. We also applied custom patches to TensorFlow’s source files to remove Java, CUDA, ROCm, and TensorRT dependencies that were causing failures.

2. Python 3.12, 3.13 Support and Python 3.9 Drop

  • Problem: The codebase needed to be updated to support Python 3.12, 3.13 and drop support for the deprecated Python 3.9, aligning with newer TensorFlow requirements.
  • Reason: Evolving ecosystem standards and dependency compatibility demanded a Python version upgrade.
  • Fix: We modified setup.py and environment.yml to declare support for Python 3.12, 3.13 dropped Python 3.9, and pinned compatible versions for critical libraries like pyarrow (>=14), h5py, and ml-dtypes.

3. Protobuf Version Mismatch (Compiler vs. Headers)

  • Problem: Build failed with fatal errors stating that generated Protobuf files (e.g., field_mask.pb.h) were generated by a newer version of protoc than the Protocol Buffer headers being included in compilation.
  • Reason: Bazel 7’s default Bzlmod behavior ignored the manual overrides specified in WORKSPACE, causing it to fetch mismatched Protobuf compilers and headers.
  • Fix: We explicitly disabled Bzlmod in .bazelrc using common --noenable_bzlmod and moved the register_toolchains call to the very top of the WORKSPACE file to enforce correct resolution priority. We also applied a specialized patch (protobuf_tensorflow.patch) to fix macro and namespace issues.

4. Test Sharding Validation Errors

  • Problem: Running benchmark tests threw errors stating that "Sharding was requested, but the test runner did not advertise support".
  • Reason: Bazel 7 enforces stricter checks on test sharding protocols by default.
  • Fix: Added test --noincompatible_check_sharding_support to .bazelrc to allow the tests to skip this check and execute successfully.

5. CI Assembler Error (--gsframe)

  • Problem: The CI build failed with an error from the assembler: as: option '--gsframe' doesn't allow an argument.
  • Reason: The flag --gsframe=no was added to fixed local build issues on GCC 15, but the CI runner uses an older assembler that does not support this flag.
  • Fix: We moved the flag to a specific --config=sframe_fix in .bazelrc. This allows default builds (including CI) to run without it, while local users with GCC 15 can use bazel build --config=sframe_fix to avoid linker errors.

6. Added CI Pre-Commit workflow

  • Problem: we don't have a workflow to check the lint Issues
  • Fix: We Implemented new pre-commit workflow that check lint Issue on each PR workflow run to main branch

7. Simplified Dependencies in setup.py

  • Problem: Conditional dependencies for protobuf based on Python version were present in setup.py.
  • Reason: Historical compatibility constraints that are no longer strictly necessary for the targeted versions.
  • Fix: Simplified the protobuf dependency to a single constraint protobuf>=4.25.2,<7.0.0 which satisfies Python 3.10, 3.11, and 3.12.

8. Updated CI Workflow for Multi-Version Testing

  • Problem: The CI workflow failed to resolve Python version conflicts and specific dependency versions (Keras) when using a matrix.
  • Reason: environment.yml had a fixed Python version (3.12) and fixed Keras version (3.13.2) which conflicted with matrix versions or Python 3.10 compatibility.
  • Fix: Added a python-version matrix ["3.10", "3.11", "3.12"] to build.yml. Updated environment.yml
    to use a Python version range (>=3.10,<3.13) and relaxed Keras constraint to keras>=3.0.0

9. Docker TF-Serving Build Infrastructure Refactoring

  • Problem: Struct2Tensor's Docker TF-Serving build environment was locked to Python 3.9 and Bazel 6, causing build failures and toolchain resolution mismatch issues during the TF 2.21.0 transition.
  • Reason: Brittle inline patches and hardcoded configurations in the Dockerfile could not fetch or adapt to modern dependencies like rules_apple and rules_cc under Bazel 7.
  • Fix: Completely refactored struct2tensor/tools/tf_serving_docker/Dockerfile:
    • Upgraded default system Python from 3.9 to 3.13.
    • Injected triple toolchain overrides—rules_apple-3.13.0, rules_swift-1.18.0, and apple_support-1.15.1—directly into the WORKSPACE during container build to resolve platform translation load failures.
    • Created an automated CI job docker-serving-build in .github/workflows/build.yml to build, compile, and verify the serving image on every pull request and push.

10. Modern Release Wheel Compilation (manylinux_2_28 & manylinux_2_35)

  • Problem: Compilation of portable release wheels was locked to the legacy manylinux2014 (glibc 2.17) environment, which is incompatible with modern high-performance runtime standards like tfx-bsl.
  • Reason: Hardcoded compilation parameters in the build scripts prevented compatibility with newer glibc systems.
  • Fix: Built a modern multi-platform release infrastructure:
    • Created Dockerfile.manylinux_2_28 utilizing devtoolsets (/dt10/usr/bin/gcc) and upgraded native auditwheel to 6.6.0.
    • refactored build_manylinux.sh to make the platform tags dynamic using an environment variable (AUDITWHEEL_PLAT_TAG).
    • Set and verified dynamic wheel stamping using auditwheel repair targeting manylinux_2_35 due to modern symbol requirements inside the compiled C++ operator files (e.g., _parquet_dataset_op.so), producing struct2tensor-0.49.0.dev0-cp312-cp312-manylinux_2_35_x86_64.whl.
    • Appended manylinux_2_28 services natively to docker-compose.yml.

Verification (Additional Results)

  • Local Serving Verification: Successfully verified serving compilation through a background container build, compiling all 11,907 targets cleanly under Bazel 7 overrides and tagging the compiled production serving image as struct2tensor-serving:latest.
  • Manylinux_2_28 Wheel Compilation: Successfully compiled all 1,003 optimized targets under double sandboxing, automatically unbundled external references, repaired dependencies, and verified that the dynamic repacked wheel file contains all required _parquet_dataset_op.so and _decode_proto_map_op.so C++ operator assets.

Verification

All tests passed successfully locally:

bazel test //... --noincompatible_check_sharding_support ...

@vkarampudi vkarampudi changed the title Testing1 Upgrade Struct2Tensor to TensorFlow 2.21.0 Apr 11, 2026
Comment thread environment.yml
- h5py==3.14.0
- idna==3.11
- keras==3.13.2
- keras>=3.0.0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this lowered? The vulnerabilities ask for this to be 3.12.1+.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keras is lowered because 3.13.2 dosn't support python 3.10 version, since we have python 3.10 workflows, Instead of removing it, I loosen the dependency to satisfy the requirement.

Comment thread setup.py Outdated
'protobuf>=4.25.2,<6.0.0;python_version>="3.11"',
'protobuf>=4.21.6,<6.0.0;python_version<"3.11"',
'tensorflow>=2.17,<2.18',
'protobuf>=4.25.2,<7.0.0',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In environment.yml you set protobuf to 6.31. Why isn't this set to that level?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the protobuf version range to >=6.0.0,<7.0.0 in setup.py. Since 6.31.1 falls within this range, it should satisfy the requirement. As we generally avoid pinning exact versions in setup.py, I believe maintaining version ranges is better. If we pinned an exact version, anyone building S2T who requires a different protobuf version than 6.31.1 might encounter issues.

Comment thread environment.yml
- python>=3.10,<3.13

- pip
- pip:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this file include explicit specifications of component versions needed to address vulnerabilities? I don't see components like cryptography, jaraco-context, ipython, pyopenssl.

Also, I see some that have values that don't meet the "fix" level for the vulnerabilities

  • requests needs to be 2.33 (not 2.32.5)
  • pygments needs to be at 2.20 (not 2.19.2)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also updated the below dependencies to fix vulnerabilities, meanwhile these are transitive dependency's [cryptography, jaraco-context, ipython, pyopenssl.] that might be coming from diff components, I think we can't include them here..!

  • requests to 2.33
  • pygments to 2.20

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

Copy link
Copy Markdown

@rwitcher rwitcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few clarifications

@vkarampudi vkarampudi requested review from rwitcher April 14, 2026 17:49
Copy link
Copy Markdown

@rwitcher rwitcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown

@rwitcher rwitcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown

@rwitcher rwitcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vkarampudi vkarampudi merged commit 169c8bf into google:master May 7, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants