docs: Update model architecture in README by aluu317 · Pull Request #550 · foundation-model-stack/fms-hf-tuning

aluu317 · 2025-05-09T22:34:09Z

Description of the change

Related issue number

How to verify the PR

Was the PR tested

I have added >=1 unit test(s) for every new method I have added.
I have ensured all unit tests pass

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

github-actions · 2025-05-09T22:34:22Z

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

anhuong · 2025-05-10T01:13:01Z

+Granite 3B           | LlamaForCausalLM      | ✅ | ✔️  | ✔️ | 
+Granite 8B           | LlamaForCausalLM      | ✅ | ✅ | ✅ |


Please note that these are the Granite Code models, so the names should say so, something like:

Suggested change

-Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
-Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ |
+Granite 3B Code | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+Granite 8B Code | LlamaForCausalLM | ✅ | ✅ | ✅ |

dushyantbehl

Can we please add HF links to the README for each model we have tested and support? This would avoid confusion about which models we support.
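One way to keep the linked rows consistent is to generate them. The following is a minimal sketch, not code from this repository: the helper name `readme_row` and its parameters are hypothetical, and it assumes the model page lives at `https://huggingface.co/<model_id>`, as in the links used in this PR.

```python
def readme_row(display_name, model_id, architecture, statuses):
    """Build one markdown table row whose first cell links to the HF model page.

    `statuses` is the list of support markers for the remaining columns,
    in the same order as the README table's header.
    """
    link = f"[{display_name}](https://huggingface.co/{model_id})"
    return " | ".join([link, architecture, *statuses]) + " |"


row = readme_row(
    "Mistral-7b",
    "mistralai/Mistral-7B-v0.1",
    "MistralForCausalLM",
    ["✅", "✅", "✅"],
)
print(row)
```

The row text and markers still have to be verified by hand; the helper only removes formatting drift between rows.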

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

dushyantbehl · 2025-05-14T11:46:22Z

-GraniteMoE 3B        | GraniteMoeForCausalLM  | ✅ | ✅** | ? |
-Granite 3B           | LlamawithCausalLM      | ✅ | ✔️  | ✔️ | 
-Granite 8B           | LlamawithCausalLM      | ✅ | ✅ | ✅ |
+[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |


Have we already verified support for this model?
cc @kmehant @ashokponkumar

Yes, Will has verified that tuning works. We can verify inference with vLLM once the required changes are merged into vLLM.

@dushyantbehl Yes, see here

Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

willmj

Nice! Thanks Angel

anhuong · 2025-05-14T16:21:02Z

-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b
+(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+
+(****) - Experimentally supported. Dependent on stable transformers version with PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.


@willmj did you create issues to update these two dependencies in order to support granite 4?
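For anyone checking a local environment against the `accelerate >= 1.3.0` requirement in the footnote, a rough sketch using only the standard library could look like this. The function names are hypothetical, and the comparison deliberately ignores pre-release suffixes, so `1.3.0rc1` is treated like `1.3.0`; a full version parser such as the `packaging` library would be more robust if it is available.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric release components, e.g. '1.3.0.dev0' -> (1, 3, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare release numbers only, padding with zeros so '1.3' == '1.3.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_meets_minimum(package, minimum):
    """True if `package` is installed and its version is at least `minimum`."""
    try:
        return meets_minimum(metadata.version(package), minimum)
    except metadata.PackageNotFoundError:
        return False


print(installed_meets_minimum("accelerate", "1.3.0"))
```

This does not cover the transformers side of the footnote, which depends on a specific PR being included in a stable release rather than on a version number stated here.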

anhuong · 2025-05-14T16:23:21Z

+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)          | LLaMA 3.1              | ✅*** | ✔️ | ✔️ |  
+[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️  | ✔️ | 
+[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B)                            | LLaMA 3.1 | 🚫 | 🚫 | ✅ | 
+[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)                               | LLaMA 3   | ✅ | ✅ | ✔️ |  


Why are we listing the Llama model architectures as LLaMA 3/3.1 when this is already shown in the model name? Shouldn't the model architecture be LlamaForCausalLM, as shown in the model's config.json?

Good question! I didn't have the right token to view the Llama models on HF, so I couldn't see the config.json. I'll update.
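For reference, the architecture name discussed here is the `architectures` field of a model's config.json. A minimal sketch of reading it from a downloaded file follows; the function names are hypothetical and the sample JSON is an illustrative fragment, not a real model config.

```python
import json
from pathlib import Path


def architectures_from_config(config_text):
    """Return the `architectures` list from the text of a config.json."""
    config = json.loads(config_text)
    return list(config.get("architectures") or [])


def architectures_from_file(config_path):
    """Same as above, reading the JSON from a local file path."""
    text = Path(config_path).read_text(encoding="utf-8")
    return architectures_from_config(text)


# Illustrative fragment only; a real config.json has many more fields.
sample = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(architectures_from_config(sample))  # prints ['LlamaForCausalLM']

# With `transformers` installed and access to the (possibly gated) repo,
# AutoConfig.from_pretrained(model_id).architectures exposes the same field.
```

Gated models such as the Llama family still need an access token to download the file, which is the limitation mentioned in the reply above.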

anhuong · 2025-05-14T16:23:50Z

 aLLaM-13b                                 | LlamaForCausalLM |  ✅ | ✅ | ✅ |
-Mixtral 8x7B                              | Mixtral   | ✅ | ✅ | ✅ |
-Mistral-7b                                | Mistral   | ✅ | ✅ | ✅ |  
+[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)                              | Mixtral   | ✅ | ✅ | ✅ |


Similarly, the correct model architecture for Mixtral is MixtralForCausalLM.

anhuong · 2025-05-14T16:24:13Z

-Mixtral 8x7B                              | Mixtral   | ✅ | ✅ | ✅ |
-Mistral-7b                                | Mistral   | ✅ | ✅ | ✅ |  
+[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)                              | Mixtral   | ✅ | ✅ | ✅ |
+[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1)                                  | Mistral   | ✅ | ✅ | ✅ |  


And for Mistral, it is MistralForCausalLM.

anhuong · 2025-05-14T16:25:00Z

+[Granite 3B Code Base](https://huggingface.co/ibm-granite/granite-3b-code-base-2k)           | LlamaForCausalLM      | ✅ | ✔️  | ✔️ | 
+[Granite 8B Code Base](https://huggingface.co/ibm-granite/granite-8b-code-base-4k)           | LlamaForCausalLM      | ✅ | ✅ | ✅ |


I don't think we should be so specific as to distinguish base versus instruct models since we support both model types.

Sure, I put it there because the link points to a Code Base model. But I agree that we shouldn't restrict the text to just one type. Updating.

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

anhuong

Many thanks for the excellent updates Angel!

Two others have reviewed this, so it should not wait on Dushyant's review alone. He has been pinged, and if more changes are needed we can add them afterwards.

* Update model architecture in README

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update HF links for models

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update comment for granite 4.0 support

Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>

* Update formatting for table

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

* Update model archs

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

---------

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>

Update model architecture in README

dacdc70

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

aluu317 requested review from anhuong, dushyantbehl, fabianlim and kmehant as code owners May 9, 2025 22:34

github-actions Bot added the docs label May 9, 2025

anhuong requested changes May 10, 2025

dushyantbehl previously requested changes May 10, 2025

aluu317 and others added 2 commits May 13, 2025 15:06

Update HF links for models

914226d

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

Merge branch 'main' into fix_README

2115eb4

dushyantbehl reviewed May 14, 2025

willmj reviewed May 14, 2025

aluu317 and others added 3 commits May 14, 2025 09:54

Update comment for granite 4.0 support

0a88fbb

Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>

Merge branch 'main' into fix_README

b21a1ae

Update formatting for table

514a9d3

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

willmj previously approved these changes May 14, 2025

anhuong reviewed May 14, 2025

Update model archs

37f05d5

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

aluu317 dismissed willmj’s stale review via 37f05d5 May 14, 2025 17:18

anhuong approved these changes May 15, 2025

anhuong merged commit 00781fc into foundation-model-stack:main May 20, 2025
9 checks passed


5 participants