docs: Update model architecture in README #550

Merged
anhuong merged 7 commits into foundation-model-stack:main from aluu317:fix_README
May 20, 2025

Conversation

@aluu317
Collaborator

@aluu317 aluu317 commented May 9, 2025

Description of the change

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
@github-actions

github-actions Bot commented May 9, 2025

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions Bot added the docs label May 9, 2025
Comment thread README.md Outdated
Comment on lines +340 to +341
Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ |
Collaborator

Please note that these are the Granite code models, something like

Suggested change
- Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
- Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ |
+ Granite 3B Code | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+ Granite 8B Code | LlamaForCausalLM | ✅ | ✅ | ✅ |

Collaborator

@dushyantbehl dushyantbehl left a comment

Can we please add HF links to the README, where possible, for each model we have tested and support? This avoids confusion about which models we support.

aluu317 and others added 2 commits May 13, 2025 15:06
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Comment thread README.md Outdated
GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
Granite 3B | LlamawithCausalLM | ✅ | ✔️ | ✔️ |
Granite 8B | LlamawithCausalLM | ✅ | ✅ | ✅ |
[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
Collaborator

Have we already verified support for this model?
cc @kmehant @ashokponkumar

Contributor

Yes, Will has verified that tuning works. We can verify inference with vLLM once the required changes are merged into vLLM.

Collaborator

@dushyantbehl Yes, see here

Comment thread README.md Outdated
Comment thread README.md
aluu317 and others added 3 commits May 14, 2025 09:54
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
willmj previously approved these changes May 14, 2025
Collaborator

@willmj willmj left a comment

Nice! Thanks Angel

Comment thread README.md
(***) - Supported from platform up to 8k context length - same architecture as llama3-8b
(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.

(****) - Experimentally supported. Dependent on stable transformers version with PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.
Collaborator

@willmj did you create issues to update these two dependencies in order to support granite 4?
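
The `accelerate >= 1.3.0` requirement from the (****) footnote can be checked locally before trying the experimental Granite 4.0 path. The following is a minimal sketch, not part of this PR: the helper names `parse_release`, `meets_minimum`, and `accelerate_is_new_enough` are hypothetical, and the comparison is deliberately simplified (it looks only at the leading numeric release segments and ignores pre-release ordering). Where the `packaging` library is installed, `packaging.version.Version` is a more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string, width=3):
    """Return the leading numeric release segments, padded to `width` integers."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first segment with no leading digits, e.g. "dev0".
            break
        parts.append(int(digits))
    # Pad with zeros so "1.3" and "1.3.0" compare as equal.
    return tuple((parts + [0] * width)[:width])


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def accelerate_is_new_enough(minimum="1.3.0"):
    """Check the installed accelerate version; False if it is not installed."""
    try:
        return meets_minimum(version("accelerate"), minimum)
    except PackageNotFoundError:
        return False


print("accelerate >= 1.3.0:", accelerate_is_new_enough())
```

This covers only the accelerate half of the footnote. Whether an installed transformers build includes the referenced PR cannot be inferred from a version comparison alone without knowing which release first shipped it.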

Comment thread README.md Outdated
Comment on lines +345 to +348
[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |  
[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LLaMA 3 | ✅ | ✅ | ✔️ |  
Collaborator

Why are we listing the llama model architectures as LLaMA 3/3.1 when this is already shown in the model name? Shouldn't these be model arch LlamaForCausalLM as it shows in the model's config.json?

Collaborator Author

Good question! I didn't have the right token to view the Llama models on HF, so I couldn't see the config.json. I can update this.
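
The architecture name discussed in this thread is the value stored under the `architectures` key of a model's config.json. As a minimal sketch (assuming a local copy of the file; the helper `read_architectures` and the trimmed-down sample config are hypothetical and only for illustration), it can be read with the standard library:

```python
import json
import tempfile
from pathlib import Path


def read_architectures(config_path):
    """Return the list stored under "architectures" in a config.json file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Fall back to an empty list if the key is absent.
    return config.get("architectures", [])


# Hypothetical sample config written to a temporary file for illustration;
# a real config.json contains many more fields.
sample_config = {"architectures": ["LlamaForCausalLM"], "model_type": "llama"}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(sample_config), encoding="utf-8")
    architectures = read_architectures(path)

print(architectures)
```

For gated repositories such as the Llama models, downloading config.json first requires an access token with permission for that repository, which is the obstacle mentioned above.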

Comment thread README.md Outdated
aLLaM-13b | LlamaForCausalLM |  ✅ | ✅ | ✅ |
Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
Mistral-7b | Mistral | ✅ | ✅ | ✅ |  
[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ |
Collaborator

Similarly, the correct model architecture for Mixtral is MixtralForCausalLM.

Comment thread README.md Outdated
Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
Mistral-7b | Mistral | ✅ | ✅ | ✅ |  
[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ |
[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Mistral | ✅ | ✅ | ✅ |  
Collaborator

And for Mistral, it is MistralForCausalLM.

Comment thread README.md Outdated
Comment on lines +340 to +341
[Granite 3B Code Base](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 8B Code Base](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
Collaborator

I don't think we should be so specific as to distinguish base versus instruct models since we support both model types.

Collaborator Author

Sure, I put it there because the link points to a Code Base model. But I agree that we shouldn't restrict the text to just one type. Updating.

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Collaborator

@anhuong anhuong left a comment

Many thanks for the excellent updates Angel!

@anhuong anhuong dismissed dushyantbehl’s stale review May 20, 2025 16:08

Two others have reviewed this, and it should not wait on Dushyant's review alone. He has been pinged, and if more changes are needed we can add them afterward.

@anhuong anhuong merged commit 00781fc into foundation-model-stack:main May 20, 2025
9 checks passed
dushyantbehl pushed a commit to dushyantbehl/fms-hf-tuning that referenced this pull request Jun 23, 2025
* Update model architecture in README

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update HF links for models

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update comment for granite 4.0 support

Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>

* Update formatting for table

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update model archs

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

---------

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>

5 participants