
Add simplified APIs for model obtaining maxtext models #3450

Merged
copybara-service[bot] merged 1 commit into main from anisha-from-pretrained3
Apr 20, 2026

@A9isha A9isha commented Mar 18, 2026

Description

This PR adds a simplified from_pretrained() API, plus helper APIs that create all the configs and meshes needed for RL. Minimal usage is as follows:

import maxtext as mt

config = mt.pyconfig(model_name="llama3.1-8b-Instruct")
model, mesh = mt.from_pretrained(config)

# OR, passing your own mesh
config = mt.pyconfig(model_name="llama3.1-8b-Instruct")
model = mt.from_pretrained(config, mesh)


# OR for train_rl.py
trainer_config, sampler_config, trainer_devices, sampler_devices = mt.setup_configs_and_devices(
    model_name="llama3.1-8b-Instruct"
)

reference_model, reference_mesh, actor_model, actor_mesh, rollout_mesh = mt.create_models_and_meshes(
    trainer_config, sampler_config, trainer_devices, sampler_devices
)

# OR regular invocation of train_rl.py

# Set run_name first so that $run_name is defined when the command line is expanded.
run_name=maz-8b-$RANDOM
python3 -m src.maxtext.trainers.post_train.rl.train_rl \
    model_name=llama3.1-8b-Instruct run_name=$run_name \
    steps=4 rollout_tensor_parallelism=-1 \
    rollout_data_parallelism=1 rollout_expert_parallelism=1 \
    test_batch_start_index=10 num_test_batches=15 \
    base_output_directory=/path/to/say/gcs/bucket
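
The command-line invocation above passes every setting as a key=value override after the module path. As a rough sketch, the same argument list could be assembled from Python before handing it to a process launcher. The helper name build_train_rl_command and the run name maz-8b-demo are made up for illustration and are not part of MaxText; only the module path and the override keys come from the command above.

```python
import shlex


def build_train_rl_command(overrides: dict) -> list[str]:
    """Return an argv list: the module invocation followed by key=value overrides.

    Hypothetical helper for illustration only; it is not a MaxText API.
    """
    argv = ["python3", "-m", "src.maxtext.trainers.post_train.rl.train_rl"]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


command = build_train_rl_command(
    {
        "model_name": "llama3.1-8b-Instruct",
        "run_name": "maz-8b-demo",  # placeholder run name
        "steps": 4,
        "rollout_tensor_parallelism": -1,
        "base_output_directory": "/path/to/say/gcs/bucket",
    }
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

Building the list in one place keeps every override on the same command, which avoids the kind of slip where a dropped line continuation leaves the last override off the invocation.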

# OR a standalone script where you want to invoke from_pretrained()

    MAXTEXT_CONFIGS_DIR = "src/maxtext/configs"
    import maxtext as mt

    config = mt.pyconfig(
        model_name=MAXTEXT_MODEL_VERSION,
        hf_access_token=args.hf_token,
        base_output_directory="gs://path/to/save/artifacts",
        base_config=f"{MAXTEXT_CONFIGS_DIR}/post_train/rl.yml",
    )
    qwen2_actor, _ = mt.from_pretrained(config)  # ref_mesh and train_mesh are the same for us
  else:
    qwen2_actor = params_lib.create_model_from_safe_tensors(
        MODEL_PATH, config, trainer_mesh, dtype=MODEL_DTYPE
    )
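
The examples above suggest that from_pretrained() returns a (model, mesh) pair when no mesh is passed and a single model when the caller supplies one. The sketch below mirrors only that calling convention with stand-in classes. FakeMesh, FakeModel, and from_pretrained_sketch are hypothetical names and say nothing about how MaxText actually builds models or meshes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMesh:
    """Stand-in for a device mesh (hypothetical, not a MaxText or JAX type)."""
    name: str


@dataclass
class FakeModel:
    """Stand-in for a loaded model (hypothetical)."""
    model_name: str
    mesh: FakeMesh


def from_pretrained_sketch(model_name: str, mesh: Optional[FakeMesh] = None):
    """Mimic the return shapes shown in the PR description.

    Without a mesh: create one and return (model, mesh).
    With a mesh: reuse it and return only the model.
    """
    if mesh is None:
        mesh = FakeMesh(name="default")
        return FakeModel(model_name, mesh), mesh
    return FakeModel(model_name, mesh)


# No mesh supplied: unpack two values.
model, mesh = from_pretrained_sketch("llama3.1-8b-Instruct")

# Own mesh supplied: a single value comes back.
own_mesh = FakeMesh(name="custom")
model_on_own_mesh = from_pretrained_sketch("llama3.1-8b-Instruct", own_mesh)
```

With this convention the caller has to unpack differently depending on whether a mesh was passed, which is why the standalone script above writes `qwen2_actor, _ = ...` when it relies on the default mesh.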

FIXES: b/492376313

Tests

Ran locally

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov Bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 47.41379% with 61 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/maxtext/utils/model_creation_utils.py | 48.11% | 46 Missing and 9 partials ⚠️
src/maxtext/trainers/post_train/rl/train_rl.py | 33.33% | 6 Missing ⚠️
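
The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that patch coverage is the number of covered lines divided by all lines in the patch, and that partial lines count as missed (46 + 9 + 6 = 61). The function name implied_total_lines is made up for illustration.

```python
def implied_total_lines(missed: int, coverage_pct: float) -> int:
    """Estimate the number of patch lines from the missed count and coverage %.

    Assumes coverage = (total - missed) / total, so total = missed / (1 - coverage).
    """
    return round(missed / (1 - coverage_pct / 100))


# Whole patch: 61 lines missing at 47.41379% coverage.
patch_total = implied_total_lines(61, 47.41379)

# Per-file rows from the report (partials counted as missed).
model_creation_total = implied_total_lines(46 + 9, 48.11)
train_rl_total = implied_total_lines(6, 33.33)

print(patch_total, model_creation_total, train_rl_total)  # 116 106 9
```

Under these assumptions the patch touches about 116 lines, of which 55 are covered.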

@A9isha A9isha changed the title from "Anisha from pretrained3" to "Add simplified APIs for model obtaining maxtext models" Mar 18, 2026
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch from 1e13fdc to ca7b4be Compare March 23, 2026 22:29
@A9isha A9isha requested a review from abhinavclemson as a code owner April 9, 2026 23:33
@A9isha A9isha closed this Apr 10, 2026
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch from 00de2a3 to 41a4e9d Compare April 10, 2026 06:16
@A9isha A9isha reopened this Apr 10, 2026
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch 4 times, most recently from d47d478 to 5f6f563 Compare April 10, 2026 07:08

@wang2yn84 wang2yn84 left a comment

Thank you very much!

Comment thread src/maxtext/configs/pyconfig.py
Comment thread src/maxtext/utils/model_creation_utils.py
Comment thread src/maxtext/integration/vllm/maxtext_vllm_adapter/adapter.py Outdated
Comment thread src/maxtext/utils/model_creation_utils.py
Comment thread src/maxtext/utils/model_creation_utils.py
Comment thread src/maxtext/utils/model_creation_utils.py Outdated
Comment thread src/maxtext/utils/model_creation_utils.py Outdated
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch 2 times, most recently from c4097a8 to ff01066 Compare April 15, 2026 06:49
Comment thread src/maxtext/configs/pyconfig.py Outdated
Comment thread src/maxtext/configs/pyconfig.py Outdated
Comment thread src/maxtext/utils/model_creation_utils.py
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch from ff01066 to 8195563 Compare April 15, 2026 20:20
@A9isha A9isha force-pushed the anisha-from-pretrained3 branch from 8195563 to ef03866 Compare April 15, 2026 20:46

@richjames0 richjames0 left a comment

lgtm

@copybara-service copybara-service Bot merged commit 5182e3b into main Apr 20, 2026
49 of 50 checks passed
@copybara-service copybara-service Bot deleted the anisha-from-pretrained3 branch April 20, 2026 22:14