Improve trust_remote_code #13448

Open
hlky wants to merge 5 commits into huggingface:main from hlky:trust-remote-code

Conversation

@hlky
Contributor

@hlky hlky commented Apr 12, 2026

What does this PR do?

As per #13446, trust_remote_code fails under several circumstances:

  • pretrained_model_name_or_path as Hub repo A and custom_pipeline as Hub repo B: trust_remote_code is bypassed and remote code runs from repo B
from diffusers import DiffusionPipeline

DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="XManFromXlab/diffuser-custom-pipeline", trust_remote_code=False
)
  • pretrained_model_name_or_path as a local directory and custom_pipeline as Hub repo B: trust_remote_code is never checked and remote code runs from repo B
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

snapshot_path = snapshot_download(repo_id="google/ddpm-cifar10-32")
DiffusionPipeline.from_pretrained(
    snapshot_path, custom_pipeline="XManFromXlab/diffuser-custom-pipeline", trust_remote_code=False
)
  • pretrained_model_name_or_path as a local directory with custom components: trust_remote_code is never checked
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

snapshot_path = snapshot_download(repo_id="hf-internal-testing/tiny-sdxl-custom-components")
pipeline = DiffusionPipeline.from_pretrained(
    snapshot_path, trust_remote_code=False
)
assert pipeline.config.unet == ("diffusers_modules.local.my_unet_model", "MyUNetModel")
assert pipeline.config.scheduler == ("diffusers_modules.local.my_scheduler", "MyScheduler")

This PR moves the trust_remote_code checks into get_cached_module_file, where the actual custom module loading takes place. trust_remote_code is passed through from several code paths for complete coverage.

I've added 3 separate ValueErrors in get_cached_module_file to account for the different sources of remote code: local, git, and hub. The git path could be considered trusted, as these files come from https://huggingface.co/datasets/diffusers/community-pipelines-mirror; the hub path covers "custom pipelines", and the local path is reached for "custom components" (these are added to the allowed files in download and so become local files).
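
The three gates described above can be sketched roughly like this (a minimal, hypothetical sketch: the function name, parameters, and message wording are illustrative, not the actual diffusers implementation):

```python
# Hypothetical sketch of the three trust_remote_code gates added in
# get_cached_module_file. Names and messages are illustrative only.

def check_trust_remote_code(source, location, file_name, trust_remote_code):
    """Raise unless the user has explicitly consented to running remote code.

    source: where the module file comes from -- "local" (custom components
    in a local directory), "git" (the community-pipelines mirror), or
    "hub" (a third-party Hub repo with a custom pipeline).
    """
    if trust_remote_code:
        return
    if source == "local":
        raise ValueError(
            f"The directory {location} contains custom code in {file_name} which must be "
            "executed to correctly load the model. Please pass the argument "
            "`trust_remote_code=True` to allow custom code to be run."
        )
    if source == "git":
        raise ValueError(
            f"The community pipeline {file_name} contains custom code which must be "
            "executed to correctly load the model. Please pass the argument "
            "`trust_remote_code=True` to allow custom code to be run."
        )
    if source == "hub":
        raise ValueError(
            f"The repository for {location} contains custom code in {file_name} which must "
            "be executed to correctly load the model. Please pass the argument "
            "`trust_remote_code=True` to allow custom code to be run."
        )

# Opt-in succeeds; every source without opt-in raises.
check_trust_remote_code("hub", "user/repo", "pipeline.py", trust_remote_code=True)
for src in ("local", "git", "hub"):
    try:
        check_trust_remote_code(src, "somewhere", "module.py", trust_remote_code=False)
    except ValueError as e:
        assert "trust_remote_code=True" in str(e)
    else:
        raise AssertionError("expected ValueError")
```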

With this PR, the above 3 cases are resolved:

The repository for XManFromXlab/diffuser-custom-pipeline contains custom code in pipeline.py which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/XManFromXlab/diffuser-custom-pipeline/blob/main/pipeline.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
The repository for XManFromXlab/diffuser-custom-pipeline contains custom code in pipeline.py which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/XManFromXlab/diffuser-custom-pipeline/blob/main/pipeline.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
ValueError: The directory C:\Users\user\.cache\huggingface\hub\models--hf-internal-testing--tiny-sdxl-custom-components\snapshots\ce2b9d3f819e7791f53053646ebe37d7e87d73d3\unet contains custom code in my_unet_model.py which must be executed to correctly load the model. You can inspect the file content at C:\Users\user\.cache\huggingface\hub\models--hf-internal-testing--tiny-sdxl-custom-components\snapshots\ce2b9d3f819e7791f53053646ebe37d7e87d73d3\unet\my_unet_model.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

Fixes #13446

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@@ -1674,21 +1678,6 @@ def download(cls, pretrained_model_name, **kwargs) -> str | os.PathLike:
custom_class_name = config_dict["_class_name"][1]

load_pipe_from_hub = custom_pipeline is not None and f"{custom_pipeline}.py" in filenames
Contributor Author

Just to note, this remains in place to control repo_id:

repo_id=pretrained_model_name if load_pipe_from_hub else None,

def _get_custom_pipeline_class(
    custom_pipeline,
    repo_id=None,
    hub_revision=None,
    class_name=None,
    cache_dir=None,
    revision=None,
):
    if custom_pipeline.endswith(".py"):
        path = Path(custom_pipeline)
        # decompose into folder & file
        file_name = path.name
        custom_pipeline = path.parent.absolute()
    elif repo_id is not None:
        file_name = f"{custom_pipeline}.py"
        custom_pipeline = repo_id
    else:
        file_name = CUSTOM_PIPELINE_FILE_NAME

It helps distinguish between:
a) custom_pipeline is e.g. my_pipeline and that filename exists in pretrained_model_name's files
b) custom_pipeline is a Hub repo (and pipeline.py is used)

Maybe it could be renamed from load_pipe_from_hub to hub_contains_custom_pipeline?


class CustomPipelineTests(unittest.TestCase):
def test_load_custom_pipeline(self):
with self.assertRaises(ValueError):
Member

Should we investigate the ValueError messaging as well (it should say something about the use of trust_remote_code, not something else)?

Member

And on main, this should not have yielded a ValueError, right? That is how we know, for one instance, that it's broken.

Contributor Author

On main:

| pretrained | custom_pipeline | trust_remote_code required? |
| --- | --- | --- |
| hub/repoA | my_pipeline | yes |
| hub/repoA | one_step_unet[1] | no |
| hub/repoA | hub/repoB | no |
| any | local directory | no |

PR:

| pretrained | custom_pipeline | trust_remote_code required? |
| --- | --- | --- |
| hub/repoA | my_pipeline | yes |
| hub/repoA | one_step_unet[1] | yes |
| hub/repoA | hub/repoB | yes |
| any | local directory | yes |

[1] or any community pipeline name

This case is more about implicit vs. explicit consent, but on main there is potential for misuse by combining the "trusted" nature of community pipeline names with third-party Hub repos.

A user may copy an example like:

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet")

or

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="pipeline_stable_diffusion_xl_controlnet_adapter_inpaint")

or

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="pipeline_stable_diffusion_x/_controlnet_adapter_inpaint")

The first two are harmless community pipelines from diffusers/community-pipelines-mirror; the third is malicious, with a user registered as pipeline_stable_diffusion_x and a repo named _controlnet_adapter_inpaint. There are many community pipelines, so there are many potential username/repo-name combinations that could easily be missed.

Considering that, I think community pipeline names should remain trusted, WDYT? We can just remove this check to do so:

if not trust_remote_code:
    raise ValueError(
        f"The community pipeline for {pretrained_model_name_or_path} contains custom code which must be executed to correctly "
        f"load the model. You can inspect the repository content at https://hf.co/datasets/{COMMUNITY_PIPELINES_MIRROR_ID}/blob/main/{revision}/{pretrained_model_name_or_path}.py.\n"
        f"Please pass the argument `trust_remote_code=True` to allow custom code to be run."
    )

Collaborator

@hlky Let's keep community pipelines trusted. I think that path is safe. The check here should catch cases like custom_pipeline="pipeline_stable_diffusion_x/_controlnet_adapter_inpaint"
https://github.com/hlky/diffusers/blob/5eb5aaf842bde5f07bd459ef4ef19e07d94b0f93/src/diffusers/utils/dynamic_modules_utils.py#L321
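
For illustration, the distinction the linked check draws can be sketched as follows (the helper name is hypothetical; the real logic lives in dynamic_modules_utils):

```python
# Hypothetical sketch of how a custom_pipeline value is classified.
# A bare name resolves against the trusted diffusers/community-pipelines-mirror
# dataset; anything containing "/" is a third-party Hub repo id.

def classify_custom_pipeline(custom_pipeline):
    if custom_pipeline.endswith(".py"):
        return "local_file"      # explicit path to a local module
    if custom_pipeline.count("/") == 0:
        return "community"       # fetched from the trusted mirror
    return "hub_repo"            # e.g. "user/repo" -> untrusted remote code

assert classify_custom_pipeline("one_step_unet") == "community"
assert classify_custom_pipeline("./my_pipeline.py") == "local_file"
# The typo-squat from the thread is correctly treated as a Hub repo:
assert classify_custom_pipeline(
    "pipeline_stable_diffusion_x/_controlnet_adapter_inpaint"
) == "hub_repo"
```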

revision: str | None = None,
local_files_only: bool = False,
local_dir: str | None = None,
trust_remote_code: bool = False,
Member

Would it make sense to add the ValueError to the caller sites of get_cached_module_file instead? Because the function itself isn't specifically tied to custom pipelines, I think.

custom_class_name = config_dict["_class_name"][1]

load_pipe_from_hub = custom_pipeline is not None and f"{custom_pipeline}.py" in filenames
load_components_from_hub = len(custom_components) > 0
Member

Why is this going?

Contributor Author

See case 3: this code is in download, which is only reached when we use a Hub path with from_pretrained. It is replaced by the check in get_cached_module_file, specifically the local code path.

Consider this scenario:

hf download rotcasuoicilam/SuperCoolNewModel --local-dir rotcasuoicilam/SuperCoolNewModel
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("rotcasuoicilam/SuperCoolNewModel")

rotcasuoicilam/SuperCoolNewModel contains malicious custom components; the user downloads the Hub repo assuming it is safe, and Diffusers loads the custom components without the user's consent.

Member

I didn't get it.

hf download rotcasuoicilam/SuperCoolNewModel --local-dir rotcasuoicilam/SuperCoolNewModel

is agnostic to DiffusionPipeline.from_pretrained(...).

Contributor Author

  1. A malicious actor uploads a model with malicious custom components
  2. Either:
    2a. The user follows instructions that say to download the model first then run from the local path
    2b. The user chooses to download the model first out of personal preference
  3. DiffusionPipeline.from_pretrained(the_local_path)
  4. pwned

Member

Let's add a test case for this scenario then.

Contributor Author

c0e0731

On main:

FAILED tests/pipelines/test_pipelines.py::CustomPipelineTests::test_custom_components_from_local_dir - AssertionError: ValueError not raised

PR:

1 passed

Member

A more elaborate explanation for other reviewers (feel free to correct).

The critical branching point is:

if not os.path.isdir(pretrained_model_name_or_path):
    # ... calls cls.download() which had the trust_remote_code check
else:
    cached_folder = pretrained_model_name_or_path

When you call from_pretrained("rotcasuoicilam/SuperCoolNewModel"):

  1. os.path.isdir("rotcasuoicilam/SuperCoolNewModel") is checked.
  2. If the user previously ran hf download ... --local-dir rotcasuoicilam/SuperCoolNewModel, that directory exists locally.
  3. So os.path.isdir() returns True, and the code takes the else branch at line 871 — it just sets cached_folder = pretrained_model_name_or_path directly.
  4. The download() method is never called.

The old trust_remote_code check for custom components lived inside download(). Since download() is skipped entirely when the path is a
local directory, the check never runs. The custom components in that local folder get loaded without any consent gate.

That's why the fix moves the trust_remote_code check into get_cached_module_file — that's where the actual import of custom .py files
happens, and it runs regardless of whether the code came through download() or the local else branch.
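
The bypass can be demonstrated with a stripped-down stand-in for that branch (the function name and return values here are illustrative, not the real diffusers code):

```python
import os
import tempfile

def resolve_cached_folder(pretrained_model_name_or_path):
    """Stand-in for the branching point in DiffusionPipeline.from_pretrained."""
    if not os.path.isdir(pretrained_model_name_or_path):
        # Hub path: cls.download() runs, which historically held the
        # trust_remote_code check for custom components.
        return ("download", pretrained_model_name_or_path)
    # Local path: download() -- and with it the old check -- is skipped.
    return ("local", pretrained_model_name_or_path)

# A repo id that has been pre-downloaded to a same-named directory
# silently takes the local branch:
with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, "rotcasuoicilam", "SuperCoolNewModel")
    os.makedirs(local)
    assert resolve_cached_folder(local)[0] == "local"

# Without the directory, the Hub path (and its check) is taken:
assert resolve_cached_folder("rotcasuoicilam/SuperCoolNewModel")[0] == "download"
```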

Contributor Author

Well, just to be clear: using the same local-dir as the Hub repo id is a possible trick to hide the attack, since anyone who didn't pre-download wouldn't be affected. But any local directory is affected, and the local directory could come from other sources like snapshot_download or git clone.

@sayakpaul sayakpaul requested a review from DN6 April 13, 2026 03:06
@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky
Contributor Author

hlky commented Apr 13, 2026

I was curious how common custom components are on the Hub. The results are limited, but I managed to scrape 9602 Hub repo paths from the model pages. 190 were gated and 4685 actually had model_index.json - the rest must be LoRAs or mis-tagged. Of those, 58 had custom components, with a total of 97 custom components. It is unlikely any of these are malicious, but it is still interesting that, when loading from a local path, any of these Hub repos would currently load custom code without requiring trust_remote_code=True.
has_module.json
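
For reference, the filtering step of such a survey reduces to checking each model_index.json for components whose loading module is not the diffusers library itself. A hedged sketch (the helper is hypothetical, but the config shape matches the assertions earlier in this PR description):

```python
# Hypothetical filter over a parsed model_index.json. In that file each
# component maps to a (library, class_name) pair; custom components use a
# "diffusers_modules...." module path instead of plain "diffusers".

def custom_components(model_index):
    return {
        name: spec
        for name, spec in model_index.items()
        if not name.startswith("_")            # skip _class_name etc.
        and isinstance(spec, (list, tuple))
        and spec[0] != "diffusers"
    }

index = {
    "_class_name": "StableDiffusionXLPipeline",
    "unet": ["diffusers_modules.local.my_unet_model", "MyUNetModel"],
    "scheduler": ["diffusers_modules.local.my_scheduler", "MyScheduler"],
    "vae": ["diffusers", "AutoencoderKL"],
}
assert set(custom_components(index)) == {"unet", "scheduler"}
```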

@sayakpaul sayakpaul requested a review from yiyixuxu April 17, 2026 15:48
@sayakpaul
Member

@claude could you do a review?

@github-actions
Contributor

github-actions bot commented Apr 17, 2026

Claude finished @sayakpaul's task in 2m 31s


Review of trust_remote_code improvements

  • Read repository review guidelines (AGENTS.md, etc.)
  • Read PR diff and changed files
  • Analyze security fix correctness
  • Provide detailed review feedback

Overall this is a good security fix. Moving trust_remote_code enforcement into get_cached_module_file is the right architectural choice — it guards the actual code-loading choke point rather than relying on callers to remember to check. The three bypass scenarios described in the PR body are real and well-documented.

I found one bug and a few issues to address:


Bug: AutoModel.from_pretrained missing trust_remote_code passthrough

src/diffusers/models/auto_model.py:318-324 — In the from_pretrained method, get_class_from_dynamic_module is called without trust_remote_code:

if has_remote_code and trust_remote_code:  # trust_remote_code is True here
    ...
    model_cls = get_class_from_dynamic_module(
        pretrained_model_or_path,
        subfolder=subfolder,
        module_file=module_file,
        class_name=class_name,
        **hub_kwargs,  # does NOT contain trust_remote_code
    )

hub_kwargs is constructed at lines 262-270 from only cache_dir, force_download, local_files_only, proxies, revision, token — it does not include trust_remote_code.

With this PR, get_cached_module_file now defaults trust_remote_code=False and raises ValueError. So AutoModel.from_pretrained(..., trust_remote_code=True) with remote code will always fail with a confusing error telling the user to pass trust_remote_code=True even though they already did.

Fix: add trust_remote_code=trust_remote_code to the call at line 318, same as was done for from_config at line 123.
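
The missing passthrough can be reproduced in miniature: if the flag isn't forwarded alongside the kwargs dict, the callee falls back to its False default even though the caller already validated it (all names below are stand-ins, not the actual diffusers signatures):

```python
# Minimal illustration of the passthrough bug described above
# (stand-in functions, not the real diffusers API).

def get_class_from_dynamic_module_stub(repo, trust_remote_code=False, **hub_kwargs):
    # With the PR, this is where the ValueError is raised.
    if not trust_remote_code:
        raise ValueError("Please pass the argument `trust_remote_code=True` ...")
    return "ModelClass"

def from_pretrained_buggy(repo, trust_remote_code=False, **kwargs):
    hub_kwargs = {k: kwargs[k] for k in ("cache_dir", "revision") if k in kwargs}
    if trust_remote_code:
        # BUG: trust_remote_code is checked here but never forwarded.
        return get_class_from_dynamic_module_stub(repo, **hub_kwargs)

def from_pretrained_fixed(repo, trust_remote_code=False, **kwargs):
    hub_kwargs = {k: kwargs[k] for k in ("cache_dir", "revision") if k in kwargs}
    if trust_remote_code:
        return get_class_from_dynamic_module_stub(
            repo, trust_remote_code=trust_remote_code, **hub_kwargs
        )

# The buggy path fails even though the user opted in:
try:
    from_pretrained_buggy("some/repo", trust_remote_code=True)
    raised = False
except ValueError:
    raised = True
assert raised
assert from_pretrained_fixed("some/repo", trust_remote_code=True) == "ModelClass"
```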


Behavioral change: community pipelines now require trust_remote_code=True

src/diffusers/utils/dynamic_modules_utils.py:340-345 — The new check in the pretrained_model_name_or_path.count("/") == 0 branch (community pipeline path) now requires trust_remote_code=True for community pipelines like one_step_unet. This is a breaking change for existing users of community pipelines.

This was discussed in the review thread where @hlky raised the question of whether community pipeline names should remain trusted. The PR currently requires trust for them. While this is the more secure default, it's a notable breaking change — every existing custom_pipeline="some_community_pipeline" call will need trust_remote_code=True.

This should be an explicit decision: if community pipelines should remain trusted (since they're from the controlled diffusers/community-pipelines-mirror dataset), the check at lines 340-345 can be removed. If they should require trust (for the typo-squatting concern @hlky raised), this is a valid trade-off but should be called out in the PR description as a breaking change.


Test assertions should verify error message content

tests/pipelines/test_pipelines.py:1014, 1061, 1116 — As @sayakpaul noted, the tests should verify the ValueError message mentions trust_remote_code. For example:

with self.assertRaises(ValueError) as cm:
    pipeline = DiffusionPipeline.from_pretrained(
        "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline"
    )
self.assertIn("trust_remote_code", str(cm.exception))

This ensures the test actually catches the new security gate, not some other ValueError.


Minor: pre-existing typo in docstring

src/diffusers/models/auto_model.py:235 has trust_remote_cocde instead of trust_remote_code. Not introduced by this PR but worth a drive-by fix since the file is already touched.


Summary

The core approach is sound — centralizing trust_remote_code enforcement in get_cached_module_file is correct and closes the three bypass vectors. The AutoModel.from_pretrained missing passthrough is the critical fix needed before merge. The community pipeline trust decision and test assertion improvements are secondary but should be addressed.

@yiyixuxu
Collaborator

yiyixuxu commented Apr 19, 2026

thanks for looking into this @hlky

looping @DN6 into the discussion

Should we start looking into deprecating trust_remote_code in DiffusionPipeline.from_pretrained()? Particularly deprecating the loading of remote components, while still supporting custom-defined pipelines. Currently, we support two separate methods of hosting and loading remote components in diffusers, and it is not very nice.
e.g. custom-defined models like this one (https://huggingface.co/hf-internal-testing/tiny-sdxl-custom-components/tree/main/unet) cannot be loaded with

AutoModel.from_pretrained("hf-internal-testing/tiny-sdxl-custom-components", subfolder="unet", trust_remote_code=True)

but it can be loaded as part of the pipeline using

pipeline = DiffusionPipeline.from_pretrained("hf-internal-testing/tiny-sdxl-custom-components", trust_remote_code=True)

I think AutoModel should be the only way to load remote models moving forward, no?
We should still support the custom_pipeline= argument, but should deprecate loading custom model components, so that if a custom pipeline contains custom models, they are loaded first using AutoModel and passed to the pipeline.

I haven't thought it through yet, and it'd be really tricky to deprecate, but here are some of my initial thoughts on a few scenarios I think we should and should not support:

(1) DiffusionPipeline.from_pretrained(repo1, trust_remote_code=True): this we should deprecate, asking users to load the model using AutoModel and load custom pipeline code using the custom_pipeline argument, e.g.

    unet = AutoModel.from_pretrained(repo1, subfolder=..., trust_remote_code=True)
    pipeline = DiffusionPipeline.from_pretrained(unet=unet, custom_pipeline=...)

(2) DiffusionPipeline.from_pretrained(repo1, custom_pipeline=repo2): currently, this just works, even though trust_remote_code defaults to False; but maybe we shouldn't allow it unless the user explicitly turns on the trust_remote_code flag?

let me know what you think

Collaborator

@DN6 DN6 left a comment

Thanks for addressing @hlky 👍🏽 Some minor comments.


assert pipeline.__class__.__name__ == "UnetSchedulerOneForwardPipeline"

def test_run_custom_pipeline(self):
with self.assertRaises(ValueError):
Collaborator

Could we add a check for the error message content?

e.g.

with self.assertRaises(ValueError) as cm:
    DiffusionPipeline.from_pretrained(
        "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline"
    )
self.assertIn(
    "Please pass the argument `trust_remote_code=True` to allow custom code to be run.",
    str(cm.exception),
)

Development

Successfully merging this pull request may close these issues.

Insufficient trust check for custom_pipeline parameter of DiffusionPipeline.from_pretrained method

5 participants