fix: is_valid_module_name rejects names containing '.' #730

Open
kmosher wants to merge 1 commit into Blaizzy:main from kmosher:fix/is-valid-module-name-rejects-dots

@kmosher kmosher commented May 19, 2026

Summary

is_valid_module_name currently only validates the first character of a candidate, so it accepts strings that contain . (e.g. parakeet_tdt_0.6b). get_model_category passes such candidates straight into importlib.util.find_spec(f"mlx_audio.{category}.models.{candidate}"), which interprets the dot as a package separator and tries to import a parent module that doesn't exist — raising ModuleNotFoundError and aborting model resolution.

The result: every model whose repo name contains a . (which is nearly every parakeet variant — *-0.6b-*, *-1.1b-* — and most Whisper variants like whisper-large-v3) crashes during load, even though the parakeet / whisper candidate would have resolved correctly on a later iteration.

Reproduction

from mlx_audio.utils import load_model
load_model("mlx-community/parakeet-tdt-0.6b-v3")

Traceback (most recent call last):
  File ".../mlx_audio/utils.py", line 778, in get_model_category
    if importlib.util.find_spec(module_path) is not None:
  File ".../importlib/util.py", line 94, in find_spec
    parent = __import__(parent_name, fromlist=['__path__'])
ModuleNotFoundError: No module named 'mlx_audio.tts.models.parakeet_tdt_0'

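The candidate parakeet_tdt_0.6b (synthesized by get_model_name_parts joining parakeet, tdt, and 0.6b with _) is passed to find_spec, which splits the name on . and first tries to import mlx_audio.tts.models.parakeet_tdt_0 as the parent package. That import fails, so find_spec raises instead of returning None.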
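The difference between "not found" and "raises" can be reproduced without mlx_audio. The snippet below is a minimal sketch that uses the standard-library json package purely as a stand-in for mlx_audio.tts.models; the submodule names are made up for illustration.

```python
import importlib.util

# "json" is only a stand-in for a real package such as mlx_audio.tts.models.
# A missing leaf under an importable parent: find_spec returns None.
missing = importlib.util.find_spec("json.no_such_module")
print(missing)

# A leaf containing '.': everything before the last dot is treated as the
# parent package and imported first, so the lookup raises instead.
dotted_error = None
try:
    importlib.util.find_spec("json.no_such_parent_0.6b")
except ModuleNotFoundError as exc:
    dotted_error = exc
print(dotted_error.name)
```

The second call fails on the synthetic parent (json.no_such_parent_0), the same shape as the mlx_audio.tts.models.parakeet_tdt_0 error in the traceback above.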

Fix

Replace the partial first-character check in is_valid_module_name with Python's canonical identifier check, str.isidentifier(). This correctly rejects any string containing . (or other non-identifier characters), so get_model_category's loops skip the dotted candidate and continue to the valid parakeet / whisper hint.

 def is_valid_module_name(name: str) -> bool:
     """Check if a string is a valid Python module name."""
     if not name or not isinstance(name, str):
         return False
 
-    return name[0].isalpha() or name[0] == "_"
+    # Must be a valid Python identifier: letter/underscore start, then
+    # only letters/digits/underscores. Crucially rejects names containing
+    # '.' (e.g. "parakeet_tdt_0.6b"), which importlib.util.find_spec would
+    # otherwise interpret as a package path and crash with
+    # ModuleNotFoundError on the synthetic parent.
+    return name.isidentifier()

After the fix, the same load_model("mlx-community/parakeet-tdt-0.6b-v3") call resolves via the parakeet candidate → mlx_audio.stt.models.parakeet, downloads weights, and serves transcription requests normally.
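To illustrate why rejecting the dotted candidate is enough, here is a rough sketch of a candidate loop. first_resolvable is a hypothetical stand-in, not the actual get_model_category code, and json / decoder stand in for mlx_audio.stt.models / parakeet so the snippet runs without mlx_audio installed.

```python
import importlib.util
from typing import Optional


def is_valid_module_name(name: str) -> bool:
    """Same check as the patched function in this PR."""
    if not name or not isinstance(name, str):
        return False
    return name.isidentifier()


def first_resolvable(package: str, candidates: list[str]) -> Optional[str]:
    """Hypothetical stand-in for the candidate loop in get_model_category."""
    for candidate in candidates:
        if not is_valid_module_name(candidate):
            continue  # dotted or hyphenated names never reach find_spec
        if importlib.util.find_spec(f"{package}.{candidate}") is not None:
            return candidate
    return None


# Candidates ordered from most to least specific, mirroring the PR's example.
resolved = first_resolvable("json", ["decoder_tdt_0.6b", "decoder_tdt", "decoder"])
print(resolved)  # decoder
```

With the old first-character check, the first candidate would have reached find_spec and raised before the loop got to the last one.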

Test plan

  • Added mlx_audio/tests/test_utils.py::test_is_valid_module_name_rejects_dots covering the regression and a handful of valid/invalid identifier cases.
  • Manually verified end-to-end: mlx_audio.server running on :8890, POST /v1/audio/transcriptions with model=mlx-community/parakeet-tdt-0.6b-v3 and model=mlx-community/parakeet-tdt_ctc-110m both load and transcribe successfully.
  • Existing tests pass (uv run pytest mlx_audio/tests/).
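For reviewers who want a quick feel for the regression coverage, the sketch below shows roughly what such a test could look like. The input cases are illustrative guesses, not the contents of the real test file, and old_check / new_check are local copies of the before/after bodies from the diff so the snippet runs standalone.

```python
def old_check(name):
    """First-character check removed by this PR."""
    if not name or not isinstance(name, str):
        return False
    return name[0].isalpha() or name[0] == "_"


def new_check(name):
    """Patched check from this PR."""
    if not name or not isinstance(name, str):
        return False
    return name.isidentifier()


# Illustrative inputs; the cases in the real test file are not shown in this PR.
CASES = {
    "parakeet": True,
    "_private": True,
    "parakeet_tdt": True,
    "parakeet_tdt_0.6b": False,  # the regression: contains '.'
    "parakeet-tdt": False,       # hyphen
    "0_6b": False,               # leading digit
    "": False,
}


def test_is_valid_module_name_rejects_dots():
    for name, expected in CASES.items():
        assert new_check(name) is expected, name


test_is_valid_module_name_rejects_dots()

# Names the old check let through but the new one rejects.
regressions = [n for n in CASES if old_check(n) and not new_check(n)]
print(regressions)  # ['parakeet_tdt_0.6b', 'parakeet-tdt']
```

One thing to keep in mind: str.isidentifier() also returns True for reserved words such as "class", so a candidate of that form would still be passed to find_spec. That seems harmless here, since find_spec would just return None for it.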

Affected models (non-exhaustive)

Any HuggingFace repo whose name contains ., including:

  • mlx-community/parakeet-tdt-0.6b-v3, -v2
  • mlx-community/parakeet-ctc-0.6b, parakeet-rnnt-0.6b, parakeet-tdt-1.1b, parakeet-rnnt-1.1b, parakeet-ctc-1.1b
  • mlx-community/parakeet-tdt_ctc-0.6b-ja, parakeet-tdt_ctc-1.1b

get_model_name_parts synthesizes candidates like "parakeet_tdt_0.6b"
(joining "parakeet", "tdt", "0.6b" with underscores). The previous
implementation only validated the first character, so dotted candidates
passed through. get_model_category then handed them to
importlib.util.find_spec(f"mlx_audio.{cat}.models.{candidate}"), which
treats '.' as a package separator and raises ModuleNotFoundError on the
synthetic parent ('mlx_audio.tts.models.parakeet_tdt_0' for the example
above). The exception aborts model resolution before the loop can reach
the valid 'parakeet' / 'whisper' candidate.

Use str.isidentifier() so any non-identifier candidate (dots, hyphens,
leading digits) is rejected and the loops continue to the next hint.

Affects every HF repo whose name contains '.', including all parakeet
*-0.6b / *-1.1b variants and most whisper-large-v3 mlx-community
conversions.