docs: Update model architecture in README #550
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
| Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ | |
| Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ | |
Please note that these are the Granite code models, so something like:
| Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ | | |
| Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ | | |
| Granite 3B Code | LlamaForCausalLM | ✅ | ✔️ | ✔️ | | |
| Granite 8B Code | LlamaForCausalLM | ✅ | ✅ | ✅ | |
dushyantbehl left a comment:
Can we please add HF links to the README for each model we have tested and support? This avoids confusion about which models we support.
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
| GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? | |
| Granite 3B | LlamaForCausalLM | ✅ | ✔️ | ✔️ | |
| Granite 8B | LlamaForCausalLM | ✅ | ✅ | ✅ | |
| [Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? | |
Have we already verified support for this model?
cc @kmehant @ashokponkumar
Yes, Will has verified that tuning works. We can verify inference with vLLM once the required changes are merged into vLLM.
Co-authored-by: Will Johnson <mwjohnson728@gmail.com> Signed-off-by: Angel Luu <an317gel@gmail.com>
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
| (***) - Supported from platform up to 8k context length - same architecture as llama3-8b |
| (***) - Supported from platform up to 8k context length - same architecture as llama3-8b. |
| (****) - Experimentally supported. Dependent on stable transformers version with PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0. |
@willmj did you create issues to update these two dependencies in order to support granite 4?
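The accelerate requirement from the footnote above can be checked at runtime. The following is a minimal sketch, not the project's actual dependency handling: the helper names `parse_release`, `meets_minimum`, and `accelerate_is_new_enough` are hypothetical, and the comparison is deliberately naive (the `packaging` library is the more robust choice for real version parsing). The transformers side of the footnote depends on a specific PR being released, so a plain version check cannot cover it and it is left out here.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric pieces of a version, e.g. '1.3.0' -> (1, 3, 0).

    Parsing stops at the first non-numeric piece, so '1.3.0rc1' -> (1, 3).
    """
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Naive check that `installed` is at least `minimum`.

    Missing or non-numeric trailing pieces count as 0.
    """
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def accelerate_is_new_enough(minimum: str = "1.3.0") -> bool:
    """Hypothetical helper: True if accelerate is installed and >= `minimum`."""
    try:
        return meets_minimum(version("accelerate"), minimum)
    except PackageNotFoundError:
        return False
```

Calling `accelerate_is_new_enough()` returns `False` when accelerate is not installed at all, which keeps the check safe to run in any environment.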
| [Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LLaMA 3.1 | ✅*** | ✔️ | ✔️ | |
| [Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ | |
| [Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LLaMA 3.1 | 🚫 | 🚫 | ✅ | |
| [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LLaMA 3 | ✅ | ✅ | ✔️ | |
Why are we listing the llama model architectures as LLaMA 3/3.1 when this is already shown in the model name? Shouldn't these be model arch LlamaForCausalLM as it shows in the model's config.json?
Good question! I didn't have the right token to view Llama on HF, so I couldn't see the config.json. I can update this.
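The architecture name the reviewer asks for is the first entry of the `architectures` list in a model's `config.json`. Below is a minimal sketch of reading it, assuming the file contents are already available locally. The helper name `architecture_from_config` is hypothetical, and the sample JSON is a trimmed, illustrative stand-in rather than a full model config.

```python
import json


def architecture_from_config(config_text: str) -> str:
    """Return the first entry of the `architectures` list in a config.json."""
    config = json.loads(config_text)
    architectures = config.get("architectures") or []
    if not architectures:
        raise ValueError("config.json has no 'architectures' entry")
    return architectures[0]


# Trimmed, illustrative config.json content (assumption: real files hold many more keys).
sample_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'
print(architecture_from_config(sample_config))
```

Using this value in the table keeps the "architecture" column aligned with what the model config declares, instead of repeating the model family name.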
| aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ | |
| Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ | |
| Mistral-7b | Mistral | ✅ | ✅ | ✅ | |
| [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ | |
Similarly, the correct model architecture for Mixtral is MixtralForCausalLM.
| Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ | |
| Mistral-7b | Mistral | ✅ | ✅ | ✅ | |
| [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ | |
| [Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Mistral | ✅ | ✅ | ✅ | |
And for Mistral it is MistralForCausalLM.
| [Granite 3B Code Base](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ | |
| [Granite 8B Code Base](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ | |
I don't think we should be so specific as to distinguish base versus instruct models since we support both model types.
Sure. I put it there because the link points to a Code Base model, but I agree that we shouldn't restrict the text to just one type. Updating.
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
anhuong left a comment:
Many thanks for the excellent updates Angel!
Two others have reviewed this, and it should not wait on Dushyant's review alone. He has been pinged, and if more changes are needed we can add them afterwards.
* Update model architecture in README
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
* Update HF links for models
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
* Update comment for granite 4.0 support
  Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
  Signed-off-by: Angel Luu <an317gel@gmail.com>
* Update formatting for table
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
* Update model archs
  Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
---------
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Angel Luu <an317gel@gmail.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>