fix: is_valid_module_name rejects names containing '.' #730
Open
kmosher wants to merge 1 commit into
get_model_name_parts synthesizes candidates like "parakeet_tdt_0.6b"
(joining "parakeet", "tdt", "0.6b" with underscores). The previous
implementation only validated the first character, so dotted candidates
passed through. get_model_category then handed them to
importlib.util.find_spec(f"mlx_audio.{cat}.models.{candidate}"), which
treats '.' as a package separator and raises ModuleNotFoundError on the
synthetic parent ('mlx_audio.tts.models.parakeet_tdt_0' for the example
above). The exception aborts model resolution before the loop can reach
the valid 'parakeet' / 'whisper' candidate.
Use str.isidentifier() so any non-identifier candidate (dots, hyphens,
leading digits) is rejected and the loops continue to the next hint.
Affects every HF repo whose name contains '.', including all parakeet
*-0.6b / *-1.1b variants and most whisper-large-v3 mlx-community
conversions.
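
The difference between the two checks can be sketched side by side. The function names `old_check` and `new_check` are labels made up for this comparison, and the candidate strings are illustrative rather than taken from the test suite.

```python
def old_check(name: str) -> bool:
    """First-character-only check, as described for the previous implementation."""
    if not name or not isinstance(name, str):
        return False
    return name[0].isalpha() or name[0] == "_"


def new_check(name: str) -> bool:
    """Identifier-based check, as proposed in this change."""
    if not name or not isinstance(name, str):
        return False
    return name.isidentifier()


# Illustrative candidates (assumed for this sketch).
CANDIDATES = ["parakeet", "parakeet_tdt_0.6b", "whisper-large", "_private", "0_6b"]

for candidate in CANDIDATES:
    print(f"{candidate!r}: old={old_check(candidate)} new={new_check(candidate)}")
```

Only the dotted and hyphenated candidates change outcome: the old check accepts them because their first character is a letter, while `str.isidentifier()` rejects them.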
Summary
is_valid_module_name currently only validates the first character of a candidate, so it accepts strings that contain '.' (e.g. parakeet_tdt_0.6b). get_model_category passes such candidates straight into importlib.util.find_spec(f"mlx_audio.{category}.models.{candidate}"), which interprets the dot as a package separator and tries to import a parent module that doesn't exist, raising ModuleNotFoundError and aborting model resolution.

The result: every model whose repo name contains a '.' (which is nearly every parakeet variant, *-0.6b-* and *-1.1b-*, and most Whisper variants like whisper-large-v3) crashes during load, even though the parakeet / whisper candidate would have resolved correctly on a later iteration.

Reproduction
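
The failure mode can be sketched without mlx_audio installed. This is a minimal illustration that assumes the stdlib `json` package as a stand-in for `mlx_audio.{category}.models`; the helper `probe` and the submodule names `no_such_module` and `decoder_0.6b` are made up for the example.

```python
import importlib.util


def probe(module_path: str) -> str:
    """Report how importlib.util.find_spec reacts to a module path."""
    try:
        spec = importlib.util.find_spec(module_path)
    except ModuleNotFoundError as exc:
        # Raised when a *parent* in the dotted path cannot be imported.
        return f"error: {exc}"
    return "found" if spec is not None else "missing"


# A real submodule of the stand-in package.
print(probe("json.decoder"))
# A missing submodule with a valid name: find_spec returns None, no exception.
print(probe("json.no_such_module"))
# A dotted candidate: find_spec first tries to import "json.decoder_0" and raises.
print(probe("json.decoder_0.6b"))
```

The second and third calls show the distinction that matters here: a missing module with a clean name yields `None`, so a loop can move on, while a dotted name raises before the loop gets the chance.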
The candidate parakeet_tdt_0.6b (synthesized by get_model_name_parts joining parakeet, tdt, and 0.6b with _) is passed to find_spec, which splits on '.' and looks for mlx_audio.tts.models.parakeet_tdt_0.

Fix
Replace the partial first-character check in is_valid_module_name with Python's canonical identifier check, str.isidentifier(). This correctly rejects any string containing '.' (or other non-identifier characters), so get_model_category's loops skip the dotted candidate and continue to the valid parakeet / whisper hint.

     def is_valid_module_name(name: str) -> bool:
         """Check if a string is a valid Python module name."""
         if not name or not isinstance(name, str):
             return False
    -    return name[0].isalpha() or name[0] == "_"
    +    # Must be a valid Python identifier: letter/underscore start, then
    +    # only letters/digits/underscores. Crucially rejects names containing
    +    # '.' (e.g. "parakeet_tdt_0.6b"), which importlib.util.find_spec would
    +    # otherwise interpret as a package path and crash with
    +    # ModuleNotFoundError on the synthetic parent.
    +    return name.isidentifier()

After the fix, the same load_model("mlx-community/parakeet-tdt-0.6b-v3") call resolves via the parakeet candidate → mlx_audio.stt.models.parakeet, downloads weights, and serves transcription requests normally.

Test plan
- mlx_audio/tests/test_utils.py::test_is_valid_module_name_rejects_dots covering the regression and a handful of valid/invalid identifier cases.
- mlx_audio.server running on :8890, POST /v1/audio/transcriptions with model=mlx-community/parakeet-tdt-0.6b-v3 and model=mlx-community/parakeet-tdt_ctc-110m both load and transcribe successfully.
- uv run pytest mlx_audio/tests/

Affected models (non-exhaustive)
Any HuggingFace repo whose name contains '.', including:

- mlx-community/parakeet-tdt-0.6b-v3, -v2
- mlx-community/parakeet-ctc-0.6b, parakeet-rnnt-0.6b, parakeet-tdt-1.1b, parakeet-rnnt-1.1b, parakeet-ctc-1.1b
- mlx-community/parakeet-tdt_ctc-0.6b-ja, parakeet-tdt_ctc-1.1b
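
For repos like these, the effect of the guard on a candidate loop can be sketched as follows. This is not the actual get_model_category code: `resolve_first` is a hypothetical helper, the stdlib `json` package again stands in for `mlx_audio.{category}.models`, and the candidate list is invented to mimic a dotted name followed by a valid one.

```python
import importlib.util
from typing import Iterable, Optional


def is_valid_module_name(name: str) -> bool:
    """Identifier-based check, matching the proposed fix."""
    if not name or not isinstance(name, str):
        return False
    return name.isidentifier()


def resolve_first(package: str, candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the first candidate that is a real submodule."""
    for candidate in candidates:
        if not is_valid_module_name(candidate):
            # Dotted or hyphenated candidates never reach find_spec.
            continue
        module_path = f"{package}.{candidate}"
        if importlib.util.find_spec(module_path) is not None:
            return module_path
    return None


# The dotted candidate is skipped, the unknown one yields None from find_spec,
# and the loop reaches the valid module.
print(resolve_first("json", ["decoder_0.6b", "decoder_0", "decoder"]))
```

With the first-character check in place of `str.isidentifier()`, the same loop would raise on the first candidate instead of reaching `decoder`.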