
Commit f482dfb

align media local path autodetection
1 parent 4d981fb commit f482dfb

6 files changed

Lines changed: 46 additions & 36 deletions


docs/colab_notebooks/4-providing-images-as-context.ipynb

Lines changed: 1 addition & 1 deletion
@@ -311,7 +311,7 @@
 "]\n",
 "```\n",
 "\n",
-"URL-backed media can use `data_type=dd.ModalityDataType.URL`, subject to the provider's URL support and file-size limits. Local audio/video paths in URL mode require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access."
+"URL-backed media can use `data_type=dd.ModalityDataType.URL`, subject to the provider's URL support and file-size limits. Local audio/video paths require explicit URL mode and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access."
 ]
 },
 {

docs/notebook_source/4-providing-images-as-context.py

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ def convert_image_to_chat_format(record, height: int) -> dict:
 # ]
 # ```
 #
-# URL-backed media can use `data_type=dd.ModalityDataType.URL`, subject to the provider's URL support and file-size limits. Local audio/video paths in URL mode require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
+# URL-backed media can use `data_type=dd.ModalityDataType.URL`, subject to the provider's URL support and file-size limits. Local audio/video paths require explicit URL mode and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
 
 # %%
 # Add a column to generate detailed image descriptions
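
When the model endpoint cannot read the caller's filesystem, the changed note leaves base64 data as the alternative to a local path. The following is a minimal sketch of that conversion using only the Python standard library. The helper name `encode_media_as_base64` is hypothetical and not part of Data Designer, and the sketch does not check provider payload-size limits.

```python
import base64
import tempfile
from pathlib import Path


def encode_media_as_base64(path: str) -> str:
    """Read a local media file and return its bytes as a base64 ASCII string.

    Hypothetical helper for illustration; not part of Data Designer.
    """
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


# Demonstration with a throwaway file standing in for real audio bytes.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "speech.wav"
    sample.write_bytes(b"RIFF-not-real-audio")
    encoded = encode_media_as_base64(str(sample))

# The encoded string round-trips back to the original bytes.
assert base64.b64decode(encoded) == b"RIFF-not-real-audio"
```

A string produced this way would presumably be paired with `data_type=ModalityDataType.BASE64` and an explicit `audio_format` or `video_format`, since the tests in this commit show that a format is required in base64 mode.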

fern/versions/latest/pages/concepts/models/default-model-settings.mdx

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ The following model configurations are automatically available when `OPENROUTER_
 | `openrouter-embedding` | `openai/text-embedding-3-large` | Text embeddings | `encoding_format="float"` |
 
 <Note title="Modality support depends on the model">
-The `multi_modal_context` field can include image, audio, and video contexts, but each model/provider combination has its own accepted input formats, media-size limits, and modality mix. Use an image-capable model for image-only workflows, and use an omni or otherwise multimodal model before sending audio or video context. Local audio/video paths in URL mode require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
+The `multi_modal_context` field can include image, audio, and video contexts, but each model/provider combination has its own accepted input formats, media-size limits, and modality mix. Use an image-capable model for image-only workflows, and use an omni or otherwise multimodal model before sending audio or video context. Local audio/video paths require explicit URL mode (`data_type=url`) and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
 </Note>
 

fern/versions/latest/pages/concepts/models/model-configs.mdx

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Model configurations define the specific models you use for synthetic data gener
 
 A `ModelConfig` specifies which LLM model to use and how it should behave during generation. When you create column configurations (like `LLMText`, `LLMCode`, or `LLMStructured`), you reference a model by its alias. Data Designer uses the model configuration to determine which model to call and with what parameters.
 
-When a column includes `multi_modal_context`, the `ModelConfig` alias must point to a model that supports the media types you send. Data Designer can serialize image, audio, and video context blocks, but model capability is still provider-specific. Local audio/video paths in URL mode require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
+When a column includes `multi_modal_context`, the `ModelConfig` alias must point to a model that supports the media types you send. Data Designer can serialize image, audio, and video context blocks, but model capability is still provider-specific. Local audio/video paths require explicit URL mode (`data_type=url`) and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
 
 ## ModelConfig Structure

packages/data-designer-config/src/data_designer/config/models.py

Lines changed: 14 additions & 16 deletions
@@ -180,10 +180,9 @@ def _image_formats_match(configured_format: ImageFormat, detected_format: ImageF
 class AudioContext(ModalityContext):
     """Configuration for providing audio context to multimodal models.
 
-    Audio context values are URL, local path, or base64 media values. Local
-    paths are passed through so colocated vLLM servers can read them directly.
-    ``audio_format`` is consulted only for base64 sources; URL and local-path
-    sources are passed through unchanged.
+    Audio context values are URL or base64 media values. Local paths may be
+    passed through only in explicit URL mode so colocated model endpoints can
+    read them directly. ``audio_format`` is consulted only for base64 sources.
     """
 
     modality: Literal[Modality.AUDIO] = Modality.AUDIO
@@ -193,7 +192,7 @@ def get_contexts(self, record: dict, *, base_path: str | None = None) -> list[di
         """Get audio contexts.
 
         ``base_path`` is accepted for signature compatibility with ``ImageContext``
-        but unused; local audio paths are passed through unchanged.
+        but unused; audio contexts do not resolve local files to base64.
         """
         return [self._build_context(value) for value in normalize_media_context_values(record[self.column_name])]

@@ -202,7 +201,7 @@ def _build_context(self, context_value: Any) -> dict[str, Any]:
             self._validate_url_context_value(context_value)
             return get_media_url_context(Modality.AUDIO.value, context_value)
 
-        if self.data_type is None and (is_audio_path(context_value) or is_media_url(context_value)):
+        if self.data_type is None and is_media_url(context_value):
             return get_media_url_context(Modality.AUDIO.value, context_value)
 
         media_type, data = self._resolve_base64_parts(context_value)
@@ -223,8 +222,8 @@ def _resolve_base64_parts(self, context_value: Any) -> tuple[str, Any]:
 
         if is_audio_path(context_value):
             raise ValueError(
-                "audio base64 context values must be base64 audio data; use data_type=url "
-                "or omit data_type to pass local audio paths through"
+                "audio context values that look like local paths must use data_type=url; "
+                "otherwise provide base64 audio data"
             )
 
         if self.audio_format is None:
@@ -245,10 +244,9 @@ def _validate_audio_format(self) -> Self:
 class VideoContext(ModalityContext):
     """Configuration for providing video context to multimodal models.
 
-    Video context values are URL, local path, or base64 media values. Local
-    paths are passed through so colocated vLLM servers can read them directly.
-    ``video_format`` is consulted only for base64 sources; URL and local-path
-    sources are passed through unchanged.
+    Video context values are URL or base64 media values. Local paths may be
+    passed through only in explicit URL mode so colocated model endpoints can
+    read them directly. ``video_format`` is consulted only for base64 sources.
     """
 
     modality: Literal[Modality.VIDEO] = Modality.VIDEO
@@ -258,7 +256,7 @@ def get_contexts(self, record: dict, *, base_path: str | None = None) -> list[di
         """Get video contexts.
 
         ``base_path`` is accepted for signature compatibility with ``ImageContext``
-        but unused; local video paths are passed through unchanged.
+        but unused; video contexts do not resolve local files to base64.
         """
         return [self._build_context(value) for value in normalize_media_context_values(record[self.column_name])]

@@ -267,7 +265,7 @@ def _build_context(self, context_value: Any) -> dict[str, Any]:
             self._validate_url_context_value(context_value)
             return get_media_url_context(Modality.VIDEO.value, context_value)
 
-        if self.data_type is None and (is_video_path(context_value) or is_media_url(context_value)):
+        if self.data_type is None and is_media_url(context_value):
             return get_media_url_context(Modality.VIDEO.value, context_value)
 
         media_type, data = self._resolve_base64_parts(context_value)
@@ -288,8 +286,8 @@ def _resolve_base64_parts(self, context_value: Any) -> tuple[str, Any]:
 
         if is_video_path(context_value):
             raise ValueError(
-                "video base64 context values must be base64 video data; use data_type=url "
-                "or omit data_type to pass local video paths through"
+                "video context values that look like local paths must use data_type=url; "
+                "otherwise provide base64 video data"
             )
 
         if self.video_format is None:
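
The branching these hunks leave behind can be sketched in isolation. This is a simplified stand-in under stated assumptions, not the library code: `looks_like_remote_url` and `looks_like_audio_path` are hypothetical replacements for `is_media_url` and `is_audio_path` (whose real implementations are not shown in this diff), the extension list is invented for illustration, and the `("url", value)` / `("base64", value)` return shape is a placeholder for the real context dictionaries.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumed extension list for illustration; the real detection helpers may differ.
AUDIO_EXTENSIONS = (".mp3", ".wav")


def looks_like_remote_url(value: str) -> bool:
    """Hypothetical stand-in for is_media_url: only http(s) URLs count as remote."""
    return urlparse(value).scheme in ("http", "https")


def looks_like_audio_path(value: str) -> bool:
    """Hypothetical stand-in for is_audio_path: file:// URIs or audio file names."""
    if looks_like_remote_url(value):
        return False
    return value.startswith("file://") or value.lower().endswith(AUDIO_EXTENSIONS)


def classify_audio_value(value: str, data_type: str | None = None) -> tuple[str, str]:
    """Return ("url", value) or ("base64", value), following the post-commit rules."""
    if data_type == "url":
        # Explicit URL mode: local paths and file:// URIs pass through unchanged.
        return ("url", value)
    if data_type is None and looks_like_remote_url(value):
        # Auto-detection accepts remote URLs only; local paths no longer qualify.
        return ("url", value)
    if looks_like_audio_path(value):
        # Message text copied from the diff above.
        raise ValueError(
            "audio context values that look like local paths must use data_type=url; "
            "otherwise provide base64 audio data"
        )
    return ("base64", value)
```

Under these assumptions, `classify_audio_value("recordings/speech.wav")` raises `ValueError`, while the same value with `data_type="url"` passes through, which is the behavior change the commit title describes.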

packages/data-designer-config/tests/config/test_models.py

Lines changed: 28 additions & 16 deletions
@@ -246,6 +246,9 @@ def test_audio_context_get_contexts_single_string() -> None:
     assert audio_context.get_contexts({"audio_url": "recordings/speech.mp3"}) == [
         get_media_url_context(Modality.AUDIO.value, "recordings/speech.mp3")
     ]
+    assert audio_context.get_contexts({"audio_url": "file:///data/recordings/speech.mp3"}) == [
+        get_media_url_context(Modality.AUDIO.value, "file:///data/recordings/speech.mp3")
+    ]
 
 
 def test_audio_context_get_contexts_list_json_and_numpy() -> None:
@@ -276,10 +279,6 @@ def test_audio_context_auto_detect_url_and_data_uri() -> None:
         get_media_url_context(Modality.AUDIO.value, "https://example.com/audio.mp3")
     ]
 
-    assert AudioContext(column_name="audio_col").get_contexts({"audio_col": "recordings/speech.wav"}) == [
-        get_media_url_context(Modality.AUDIO.value, "recordings/speech.wav")
-    ]
-
     assert AudioContext(column_name="audio_col").get_contexts({"audio_col": "https://example.com/download?id=123"}) == [
         get_media_url_context(Modality.AUDIO.value, "https://example.com/download?id=123")
     ]
@@ -289,6 +288,12 @@ def test_audio_context_auto_detect_url_and_data_uri() -> None:
     ]
 
 
+@pytest.mark.parametrize("audio_path", ["recordings/speech.wav", "file:///data/recordings/speech.mp3"])
+def test_audio_context_auto_detect_local_path_rejected(audio_path: str) -> None:
+    with pytest.raises(ValueError, match="audio context values that look like local paths must use data_type=url"):
+        AudioContext(column_name="audio_col").get_contexts({"audio_col": audio_path})
+
+
 def test_audio_context_validate_audio_format() -> None:
     with pytest.raises(ValueError, match="audio_format is required when data_type is base64"):
         AudioContext(column_name="audio_base64", data_type=ModalityDataType.BASE64)
@@ -304,11 +309,12 @@ def test_audio_context_validate_audio_format() -> None:
         {"audio_base64": "data:audio/mpeg;base64,audio1base64"}
     )
 
-    assert AudioContext(column_name="audio_base64", audio_format=AudioFormat.MP3).get_contexts(
-        {"audio_base64": "screen_recording.mp3"}
-    ) == [get_media_url_context(Modality.AUDIO.value, "screen_recording.mp3")]
+    with pytest.raises(ValueError, match="audio context values that look like local paths must use data_type=url"):
+        AudioContext(column_name="audio_base64", audio_format=AudioFormat.MP3).get_contexts(
+            {"audio_base64": "screen_recording.mp3"}
+        )
 
-    with pytest.raises(ValueError, match="audio base64 context values must be base64 audio data"):
+    with pytest.raises(ValueError, match="audio context values that look like local paths must use data_type=url"):
         AudioContext(
             column_name="audio_base64", data_type=ModalityDataType.BASE64, audio_format=AudioFormat.MP3
         ).get_contexts({"audio_base64": "screen_recording.mp3"})
@@ -329,6 +335,9 @@ def test_video_context_get_contexts_single_string() -> None:
     assert video_context.get_contexts({"video_url": "clips/screen_recording.mp4"}) == [
         get_media_url_context(Modality.VIDEO.value, "clips/screen_recording.mp4")
     ]
+    assert video_context.get_contexts({"video_url": "file:///data/clips/screen_recording.mp4"}) == [
+        get_media_url_context(Modality.VIDEO.value, "file:///data/clips/screen_recording.mp4")
+    ]
 
 
 def test_video_context_get_contexts_list_json_and_numpy() -> None:
@@ -359,10 +368,6 @@ def test_video_context_auto_detect_url_and_data_uri() -> None:
         get_media_url_context(Modality.VIDEO.value, "https://example.com/video.mp4")
     ]
 
-    assert VideoContext(column_name="video_col").get_contexts({"video_col": "clips/screen_recording.webm"}) == [
-        get_media_url_context(Modality.VIDEO.value, "clips/screen_recording.webm")
-    ]
-
     assert VideoContext(column_name="video_col").get_contexts({"video_col": "https://example.com/download?id=123"}) == [
         get_media_url_context(Modality.VIDEO.value, "https://example.com/download?id=123")
     ]
@@ -372,6 +377,12 @@ def test_video_context_auto_detect_url_and_data_uri() -> None:
     ]
 
 
+@pytest.mark.parametrize("video_path", ["clips/screen_recording.webm", "file:///data/clips/screen_recording.mp4"])
+def test_video_context_auto_detect_local_path_rejected(video_path: str) -> None:
+    with pytest.raises(ValueError, match="video context values that look like local paths must use data_type=url"):
+        VideoContext(column_name="video_col").get_contexts({"video_col": video_path})
+
+
 def test_video_context_validate_video_format() -> None:
     with pytest.raises(ValueError, match="video_format is required when data_type is base64"):
         VideoContext(column_name="video_base64", data_type=ModalityDataType.BASE64)
@@ -387,11 +398,12 @@ def test_video_context_validate_video_format() -> None:
         {"video_base64": "data:video/mp4;base64,video1base64"}
     )
 
-    assert VideoContext(column_name="video_base64", video_format=VideoFormat.MP4).get_contexts(
-        {"video_base64": "screen_recording.mp4"}
-    ) == [get_media_url_context(Modality.VIDEO.value, "screen_recording.mp4")]
+    with pytest.raises(ValueError, match="video context values that look like local paths must use data_type=url"):
+        VideoContext(column_name="video_base64", video_format=VideoFormat.MP4).get_contexts(
+            {"video_base64": "screen_recording.mp4"}
+        )
 
-    with pytest.raises(ValueError, match="video base64 context values must be base64 video data"):
+    with pytest.raises(ValueError, match="video context values that look like local paths must use data_type=url"):
         VideoContext(
             column_name="video_base64", data_type=ModalityDataType.BASE64, video_format=VideoFormat.MP4
         ).get_contexts({"video_base64": "screen_recording.mp4"})
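
The `match=` strings in these tests had to change together with the error text because `pytest.raises` checks its pattern with `re.search` against the string form of the exception. A small illustration using `re` directly, with the message and both patterns copied from this commit (the variable names are only for this sketch):

```python
import re

# New error text raised by the audio _resolve_base64_parts in this commit.
NEW_MESSAGE = (
    "audio context values that look like local paths must use data_type=url; "
    "otherwise provide base64 audio data"
)

# Pattern used by the test before and after the change.
OLD_PATTERN = "audio base64 context values must be base64 audio data"
NEW_PATTERN = "audio context values that look like local paths must use data_type=url"

# re.search only needs the pattern to occur somewhere in the message.
new_matches = re.search(NEW_PATTERN, NEW_MESSAGE) is not None
old_matches = re.search(OLD_PATTERN, NEW_MESSAGE) is not None
```

Here `new_matches` is true and `old_matches` is false, so a test left on the old pattern would fail even though a `ValueError` is still raised.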
