smolagents logo


ML Intern

An ML intern that autonomously researches, writes, and ships good-quality ML code using the Hugging Face ecosystem, with deep access to docs, papers, datasets, and cloud compute.

Quick Start

Installation

git clone git@github.com:huggingface/ml-intern.git
cd ml-intern
uv sync
uv tool install -e .

That's it. Now ml-intern works from any directory:

ml-intern

Create a .env file in the project root (or export these in your shell):

ANTHROPIC_API_KEY=<your-anthropic-api-key> # if using anthropic models
OPENAI_API_KEY=<your-openai-api-key> # if using openai models
LOCAL_LLM_BASE_URL=http://localhost:8000 # shared fallback for local model prefixes
LOCAL_LLM_API_KEY=<optional-local-api-key> # optional shared local API key
HF_TOKEN=<your-hugging-face-token>
GITHUB_TOKEN=<github-personal-access-token> 

If no HF_TOKEN is set, the CLI prompts you to paste one on first launch, unless you start with a local model. To get a GITHUB_TOKEN, create a GitHub personal access token in your GitHub developer settings.

Usage

Interactive mode (start a chat session):

ml-intern

Headless mode (single prompt, auto-approve):

ml-intern "fine-tune llama on my dataset"

Options:

ml-intern --model anthropic/claude-opus-4-7 "your prompt"   # requires ANTHROPIC_API_KEY
ml-intern --model openai/gpt-5.5 "your prompt"              # requires OPENAI_API_KEY
ml-intern --model ollama/llama3.1:8b "your prompt"
ml-intern --model vllm/meta-llama/Llama-3.1-8B-Instruct "your prompt"
ml-intern --sandbox-tools "your prompt"                         # use HF Space sandbox tools
ml-intern --max-iterations 100 "your prompt"
ml-intern --no-stream "your prompt"

Run ml-intern, then use /model to see the full list of suggested model IDs (Claude, GPT, HF-router models such as MiniMax, Kimi, GLM, and DeepSeek, plus local model prefixes).

Local models:

Local model support uses OpenAI-compatible HTTP endpoints through LiteLLM. The agent does not load model weights directly from disk; start your inference server first, then select it with a provider-specific model prefix:

ml-intern --model ollama/llama3.1:8b "your prompt"
ml-intern --model vllm/meta-llama/Llama-3.1-8B-Instruct "your prompt"

Inside interactive mode, switch with /model:

/model ollama/llama3.1:8b
/model lm_studio/google/gemma-3-4b
/model llamacpp/llama-3.1-8b-instruct

Supported local prefixes are ollama/, vllm/, lm_studio/, and llamacpp/. Set LOCAL_LLM_BASE_URL and optional LOCAL_LLM_API_KEY to use one shared local endpoint, or override a specific provider with its matching *_BASE_URL / *_API_KEY variable, such as OLLAMA_BASE_URL or VLLM_API_KEY. Provider-specific variables take precedence over the shared local variables. Base URLs may include or omit /v1.
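
In code, that resolution order amounts to a couple of environment lookups. A minimal sketch of the idea (the function name and error handling are illustrative, not the actual implementation):

import os

def resolve_local_endpoint(prefix: str) -> tuple[str, str | None]:
    """Resolve the base URL and API key for a local model prefix
    such as "ollama" or "vllm". Provider-specific variables win
    over the shared LOCAL_LLM_* fallbacks."""
    env = prefix.upper()  # "ollama" -> "OLLAMA", "lm_studio" -> "LM_STUDIO"
    base_url = os.environ.get(f"{env}_BASE_URL") or os.environ.get("LOCAL_LLM_BASE_URL")
    api_key = os.environ.get(f"{env}_API_KEY") or os.environ.get("LOCAL_LLM_API_KEY")
    if not base_url:
        raise RuntimeError(f"No base URL configured for {prefix}/ models")
    return base_url, api_key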

CLI tool runtime:

By default, the CLI runs bash, read, write, and edit on your local filesystem. To use HF Space sandbox tools instead, including sandbox_create, opt in with --sandbox-tools:

ml-intern --sandbox-tools "test this training script in a GPU sandbox"
ml-intern --model llamacpp/ggml-org/gemma-3-1b-it-GGUF --sandbox-tools

Sandbox tool runtime requires HF_TOKEN, even when the selected model is local, because it creates private HF Spaces. You can also make sandbox tools your CLI default in ~/.config/ml-intern/cli_agent_config.json:

{ "tool_runtime": "sandbox" }

Use the default local runtime when you want tools to inspect or edit files in your checkout. Use sandbox runtime when you want the agent to create or replace an HF Space sandbox, test code remotely, or request GPU sandbox hardware before launching larger HF Jobs.

Sharing Traces

Every session is auto-uploaded to your own private Hugging Face dataset in Claude Code JSONL format, which the HF Agent Trace Viewer auto-detects so you can browse turns, tool calls, and model responses directly on the Hub.

By default the dataset is named {your-hf-username}/ml-intern-sessions and is created as private. You can flip it to public from inside the CLI:

/share-traces            # show current visibility + dataset URL
/share-traces public     # publish (anyone can view)
/share-traces private    # lock it back down

You can also flip visibility from the dataset page on huggingface.co; the agent honours whatever you set there for subsequent uploads.
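
If you prefer to inspect traces programmatically rather than in the viewer, the trace dataset is a normal Hub repo; a minimal sketch using huggingface_hub (the file layout inside the dataset is an assumption left unspecified here):

from huggingface_hub import HfApi, whoami

api = HfApi()  # picks up HF_TOKEN from the environment
user = whoami()["name"]
# Default trace dataset, following the template described above
files = api.list_repo_files(f"{user}/ml-intern-sessions", repo_type="dataset")
print("\n".join(files))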

To opt out entirely, set the following in your CLI config (e.g. configs/cli_agent_config.json or ~/.config/ml-intern/cli_agent_config.json):

{ "share_traces": false }

To override the destination repo, set:

{ "personal_trace_repo_template": "{hf_user}/my-custom-traces" }

The shared smolagents/ml-intern-sessions dataset is unrelated and only receives anonymized telemetry rows used by the backend KPI scheduler.

Supported Gateways

ML Intern currently supports one-way notification gateways from CLI sessions. These gateways send out-of-band status updates; they do not accept inbound chat messages.

Slack

Slack notifications use the Slack Web API to post messages when the agent needs approval, hits an error, or completes a turn. Create a Slack app with a bot token that has chat:write, invite the bot to the target channel, then set:

SLACK_BOT_TOKEN=xoxb-...
SLACK_CHANNEL_ID=C...

The CLI automatically creates a slack.default destination when both variables are present. Optional environment variables for the env-only default:

ML_INTERN_SLACK_NOTIFICATIONS=false
ML_INTERN_SLACK_DESTINATION=slack.ops
ML_INTERN_SLACK_AUTO_EVENTS=approval_required,error,turn_complete
ML_INTERN_SLACK_ALLOW_AGENT_TOOL=true
ML_INTERN_SLACK_ALLOW_AUTO_EVENTS=true

For a persistent user-level config, put overrides in ~/.config/ml-intern/cli_agent_config.json or point ML_INTERN_CLI_CONFIG at a JSON file:

{
  "messaging": {
    "enabled": true,
    "auto_event_types": ["approval_required", "error", "turn_complete"],
    "destinations": {
      "slack.ops": {
        "provider": "slack",
        "token": "${SLACK_BOT_TOKEN}",
        "channel": "${SLACK_CHANNEL_ID}",
        "allow_agent_tool": true,
        "allow_auto_events": true
      }
    }
  }
}
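
Under the hood, each notification boils down to a single chat.postMessage call against the Slack Web API. A minimal sketch with requests (the message text is illustrative):

import os
import requests

resp = requests.post(
    "https://slack.com/api/chat.postMessage",
    headers={"Authorization": f"Bearer {os.environ['SLACK_BOT_TOKEN']}"},
    json={
        "channel": os.environ["SLACK_CHANNEL_ID"],
        "text": "ml-intern: approval required",
    },
    timeout=10,
)
resp.raise_for_status()
# Slack reports API errors in the response body, not the HTTP status
if not resp.json().get("ok"):
    raise RuntimeError(resp.json().get("error"))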

Architecture

Component Overview

┌─────────────────────────────────────────────────────────────┐
│                         User/CLI                            │
└────────────┬──────────────────────────────────────────┬─────┘
             │ Operations                               │ Events
             ↓ (user_input, exec_approval,              ↑
      submission_queue  interrupt, compact, ...)  event_queue
             │                                          │
             ↓                                          │
┌────────────────────────────────────────────────────┐  │
│            submission_loop (agent_loop.py)         │  │
│  ┌──────────────────────────────────────────────┐  │  │
│  │  1. Receive Operation from queue             │  │  │
│  │  2. Route to handler (run_agent/compact/...) │  │  │
│  └──────────────────────────────────────────────┘  │  │
│                      ↓                             │  │
│  ┌──────────────────────────────────────────────┐  │  │
│  │         Handlers.run_agent()                 │  ├───
│  │                                              │  │  │
│  │  ┌────────────────────────────────────────┐  │  │  │
│  │  │  Agentic Loop (max 300 iterations)     │  │  │  │
│  │  │                                        │  │  │  │
│  │  │  ┌──────────────────────────────────┐  │  │  │  │
│  │  │  │ Session                          │  │  │  │  │
│  │  │  │  ┌────────────────────────────┐  │  │  │  │  │
│  │  │  │  │ ContextManager             │  │  │  │  │  │
│  │  │  │  │ • Message history          │  │  │  │  │  │
│  │  │  │  │   (litellm.Message[])      │  │  │  │  │  │
│  │  │  │  │ • Auto-compaction (170k)   │  │  │  │  │  │
│  │  │  │  │ • Session upload to HF     │  │  │  │  │  │
│  │  │  │  └────────────────────────────┘  │  │  │  │  │
│  │  │  │                                  │  │  │  │  │
│  │  │  │  ┌────────────────────────────┐  │  │  │  │  │
│  │  │  │  │ ToolRouter                 │  │  │  │  │  │
│  │  │  │  │  ├─ HF docs & research     │  │  │  │  │  │
│  │  │  │  │  ├─ HF repos, datasets,    │  │  │  │  │  │
│  │  │  │  │  │  jobs, papers           │  │  │  │  │  │
│  │  │  │  │  ├─ GitHub code search     │  │  │  │  │  │
│  │  │  │  │  ├─ Sandbox & local tools  │  │  │  │  │  │
│  │  │  │  │  ├─ Planning               │  │  │  │  │  │
│  │  │  │  │  └─ MCP server tools       │  │  │  │  │  │
│  │  │  │  └────────────────────────────┘  │  │  │  │  │
│  │  │  └──────────────────────────────────┘  │  │  │  │
│  │  │                                        │  │  │  │
│  │  │  ┌──────────────────────────────────┐  │  │  │  │
│  │  │  │ Doom Loop Detector               │  │  │  │  │
│  │  │  │ • Detects repeated tool patterns │  │  │  │  │
│  │  │  │ • Injects corrective prompts     │  │  │  │  │
│  │  │  └──────────────────────────────────┘  │  │  │  │
│  │  │                                        │  │  │  │
│  │  │  Loop:                                 │  │  │  │
│  │  │    1. LLM call (litellm.acompletion)   │  │  │  │
│  │  │       ↓                                │  │  │  │
│  │  │    2. Parse tool_calls[]               │  │  │  │
│  │  │       ↓                                │  │  │  │
│  │  │    3. Approval check                   │  │  │  │
│  │  │       (jobs, sandbox, destructive ops) │  │  │  │
│  │  │       ↓                                │  │  │  │
│  │  │    4. Execute via ToolRouter           │  │  │  │
│  │  │       ↓                                │  │  │  │
│  │  │    5. Add results to ContextManager    │  │  │  │
│  │  │       ↓                                │  │  │  │
│  │  │    6. Repeat if tool_calls exist       │  │  │  │
│  │  └────────────────────────────────────────┘  │  │  │
│  └──────────────────────────────────────────────┘  │  │
└───────────────────────────────────────────────────────┴──┘
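
The doom loop detector in the diagram is essentially pattern matching over recent tool calls. A minimal sketch of the idea (the window size, threshold, and names are assumptions, not the actual implementation):

from collections import deque

class DoomLoopDetector:
    """Flag when the agent keeps issuing the same tool call with the
    same arguments, so a corrective prompt can be injected."""

    def __init__(self, window: int = 6, threshold: int = 3):
        self.recent: deque[tuple[str, str]] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, tool_name: str, args_json: str) -> bool:
        call = (tool_name, args_json)
        self.recent.append(call)
        return self.recent.count(call) >= self.threshold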

Agentic Loop Flow

User Message
     ↓
[Add to ContextManager]
     ↓
     ╔═══════════════════════════════════════════╗
     ║      Iteration Loop (max 300)             ║
     ║                                           ║
     ║  Get messages + tool specs                ║
     ║         ↓                                 ║
     ║  litellm.acompletion()                    ║
     ║         ↓                                 ║
     ║  Has tool_calls? ──No──> Done             ║
     ║         │                                 ║
     ║        Yes                                ║
     ║         ↓                                 ║
     ║  Add assistant msg (with tool_calls)      ║
     ║         ↓                                 ║
     ║  Doom loop check                          ║
     ║         ↓                                 ║
     ║  For each tool_call:                      ║
     ║    • Needs approval? ──Yes──> Wait for    ║
     ║    │                         user confirm ║
     ║    No                                     ║
     ║    ↓                                      ║
     ║    • ToolRouter.execute_tool()            ║
     ║    • Add result to ContextManager         ║
     ║         ↓                                 ║
     ║  Continue loop ─────────────────┐         ║
     ║         ↑                       │         ║
     ║         └───────────────────────┘         ║
     ╚═══════════════════════════════════════════╝
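
Stripped of streaming, approvals, and doom-loop checks, the core of this loop is a standard tool-calling loop over litellm. A rough sketch (execute_tool stands in for ToolRouter.execute_tool(), and the plain messages list stands in for ContextManager):

import json

import litellm

async def run_agent(messages: list[dict], tools: list[dict], max_iterations: int = 300) -> str:
    for _ in range(max_iterations):
        response = await litellm.acompletion(
            model="anthropic/claude-sonnet-4-5-20250929",
            messages=messages,
            tools=tools,
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool calls: the turn is done
        messages.append(msg.model_dump())  # keep the assistant turn with its tool_calls
        for call in msg.tool_calls:
            # execute_tool is a stand-in for the real ToolRouter dispatch
            result = await execute_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    raise RuntimeError("max iterations reached")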

Events

The agent emits the following events via event_queue:

  • processing - Starting to process user input
  • ready - Agent is ready for input
  • assistant_chunk - Streaming token chunk
  • assistant_message - Complete LLM response text
  • assistant_stream_end - Token stream finished
  • tool_call - Tool being called with arguments
  • tool_output - Tool execution result
  • tool_log - Informational tool log message
  • tool_state_change - Tool execution state transition
  • approval_required - Requesting user approval for sensitive operations
  • turn_complete - Agent finished processing
  • error - Error occurred during processing
  • interrupted - Agent was interrupted
  • compacted - Context was compacted
  • undo_complete - Undo operation completed
  • shutdown - Agent shutting down
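
A consumer (the TUI, or the web frontend) drains these from event_queue and dispatches on type. A minimal sketch, assuming events arrive as dicts with a "type" field:

import asyncio

async def consume_events(event_queue: asyncio.Queue) -> None:
    while True:
        event = await event_queue.get()
        if event["type"] == "assistant_chunk":
            print(event.get("text", ""), end="", flush=True)  # stream tokens as they arrive
        elif event["type"] == "approval_required":
            ...  # surface a confirmation prompt to the user
        elif event["type"] == "shutdown":
            break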

Development

Pre-commit Checks

Run Ruff before every commit:

uv run ruff check .
uv run ruff format --check .

If the format check fails, run uv run ruff format . and re-run the checks before committing.

Adding Built-in Tools

Edit agent/core/tools.py:

def create_builtin_tools() -> list[ToolSpec]:
    return [
        ToolSpec(
            name="your_tool",
            description="What your tool does",
            parameters={
                "type": "object",
                "properties": {
                    "param": {"type": "string", "description": "Parameter description"}
                },
                "required": ["param"]
            },
            handler=your_async_handler
        ),
        # ... existing tools
    ]
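
The handler is an async callable that receives the parsed arguments and returns the tool output fed back to the model. A minimal sketch (the exact handler signature is an assumption based on the spec above):

async def your_async_handler(param: str) -> str:
    # Do the real work here; the return value becomes the tool result
    # the model sees on the next iteration.
    return f"processed {param!r}"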

Adding MCP Servers

Edit configs/cli_agent_config.json for CLI defaults, or configs/frontend_agent_config.json for web-session defaults:

{
  "model_name": "anthropic/claude-sonnet-4-5-20250929",
  "mcpServers": {
    "your-server-name": {
      "transport": "http",
      "url": "https://example.com/mcp",
      "headers": {
        "Authorization": "Bearer ${YOUR_TOKEN}"
      }
    }
  }
}

Note: Environment variables like ${YOUR_TOKEN} are auto-substituted from .env.
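
The substitution is plain ${VAR} expansion. A sketch of the idea (not the actual implementation):

import os
import re

def substitute_env(raw: str) -> str:
    """Replace ${VAR} placeholders with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)

# substitute_env("Bearer ${YOUR_TOKEN}") -> "Bearer <value of YOUR_TOKEN>"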

Cite ml-intern

If you use ml-intern in your work, please cite it using the following BibTeX entry (or similar):

@Misc{ml-intern,
  title =        {ml-intern: an agent that autonomously researches, writes, and ships good quality ML related code using the Hugging Face ecosystem},
  author =       {Aksel Joonas Reedi and Henri Bonamy and Yoan Di Cosmo and Leandro von Werra and Lewis Tunstall},
  howpublished = {\url{https://github.com/huggingface/ml-intern}},
  year =         {2026}
}
