System Information:
- OS: macOS (Apple Silicon)
- Python Version: 3.12
- Environment: Local pytest execution
Bug Description:
When running the gemma/gm/text test suite locally on macOS Apple Silicon, the pytest-xdist workers crash with the following unhandled C++ exception during interpreter shutdown:
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
Root Cause:
The issue stems from gemma/gm/utils/_file_cache.py. Currently, maybe_get_from_cache does not actually download and save the tokenizer model to the local cache directory if it is missing. Instead, it returns the gs:// path, causing epath.Path('gs://...').read_bytes() to be executed.
This directly invokes TensorFlow's C++ GCS filesystem client to read the bytes over the network. On macOS Apple Silicon, there is a known issue where tearing down the SentencePieceProcessor and the TensorFlow C++ GCS client threads during interpreter shutdown triggers this mutex lock failure, and the uncaught exception aborts the process.
Proposed Fix:
We should update _file_cache.py to intercept gs:// paths, download them over standard HTTP (e.g., using urllib.request), and save them to the local ~/.gemma/tokenizer/ cache directory before returning the path.
This ensures epath only ever performs local disk I/O, completely avoiding the C++ GCS client and preventing the crash on macOS. As an added benefit, this introduces true local caching, meaning the test suite (and user code) won't have to download the tokenizer over the network on every single run.
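A minimal sketch of the proposed interception, assuming the tokenizer objects are publicly readable so that gs://bucket/object can be fetched from the public GCS endpoint https://storage.googleapis.com/bucket/object with plain urllib.request. The helper names gs_to_https and download_to_cache are hypothetical and are not the existing _file_cache.py API. The file is written to a temporary name and then renamed into place, so parallel pytest-xdist workers never read a half-written file.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.parse
import urllib.request

# Cache location taken from the proposal above.
DEFAULT_CACHE_DIR = pathlib.Path.home() / ".gemma" / "tokenizer"


def gs_to_https(gs_path: str) -> str:
  """Hypothetical helper: maps gs://bucket/obj to the public HTTPS endpoint.

  Assumes the object is publicly readable (no auth headers are sent).
  """
  if not gs_path.startswith("gs://"):
    raise ValueError(f"Not a gs:// path: {gs_path}")
  bucket, _, blob = gs_path[len("gs://"):].partition("/")
  if not bucket or not blob:
    raise ValueError(f"Malformed gs:// path: {gs_path}")
  return f"https://storage.googleapis.com/{bucket}/{urllib.parse.quote(blob)}"


def download_to_cache(
    path: str, cache_dir: pathlib.Path = DEFAULT_CACHE_DIR
) -> str:
  """Hypothetical helper: returns a local path, downloading gs:// files once."""
  if not path.startswith("gs://"):
    return path  # Non-GCS paths pass through unchanged.
  local_path = cache_dir / path.rsplit("/", 1)[-1]
  if local_path.exists():
    return os.fspath(local_path)  # Cache hit: no network I/O at all.
  cache_dir.mkdir(parents=True, exist_ok=True)
  # Download into a temp file in the same directory, then rename it into
  # place; os.replace is atomic on POSIX within one filesystem.
  fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
  try:
    with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(
        gs_to_https(path)
    ) as resp:
      shutil.copyfileobj(resp, tmp)
    os.replace(tmp_name, local_path)
  except BaseException:
    # Clean up the partial download before re-raising.
    if os.path.exists(tmp_name):
      os.unlink(tmp_name)
    raise
  return os.fspath(local_path)
```

Because the returned value is always a local path, a subsequent epath.Path(...).read_bytes() only touches the local disk. One design caveat: keying the cache by file basename alone means two different gs:// objects with the same filename would collide; mirroring the bucket and object path under the cache directory would avoid that.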
I will be raising a PR shortly with this fix!