docs: Update model architecture in README #550
Changes from 3 commits
dacdc70
914226d
2115eb4
0a88fbb
b21a1ae
514a9d3
37f05d5
@@ -328,28 +328,28 @@ Please refer to [this document](docs/offline-data-preprocessing.md) for details
| Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
| -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
| Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
| Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
| GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
| GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
| Granite 3B | LlamawithCausalLM | ✅ | ✔️ | ✔️ |
| Granite 8B | LlamawithCausalLM | ✅ | ✅ | ✅ |
| [Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
aluu317 marked this conversation as resolved.
| [Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | ✅* | ✅* | ✅* |
| [Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| [Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| [Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| [Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
| [Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
| [GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
| [GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
| [Granite 3B Code Base](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
| [Granite 8B Code Base](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
Collaborator
I don't think we should be so specific as to distinguish base versus instruct models since we support both model types.
Collaborator
Author
Sure, I put it there because the link points to a Code Base model. But I agree we shouldn't restrict the text to just one type. Updating.
| Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
| Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
| Granite 34B | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
| Llama3.1-8B | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |
| Llama3.1-70B(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
| Llama3.1-405B | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
| Llama3-8B | LLaMA 3 | ✅ | ✅ | ✔️ |
| Llama3-70B | LLaMA 3 | 🚫 | ✅ | ✅ |
| [Granite 34B Code Instruct](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
| [Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |
| [Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
| [Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
| [Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LLaMA 3 | ✅ | ✅ | ✔️ |
Collaborator
Why are we listing the llama model architectures as LLaMA 3/3.1 when this is already shown in the model name? Shouldn't these be model arch
Collaborator
Author
Good question! I didn't have the right token to view Llama in HF so couldn't see the
| [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LLaMA 3 | 🚫 | ✅ | ✅ |
| aLLaM-13b | LlamaForCausalLM | ✅ | ✅ | ✅ |
| Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
| Mistral-7b | Mistral | ✅ | ✅ | ✅ |
| [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ |
Collaborator
Similarly, the correct model architecture for Mixtral is
| [Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Mistral | ✅ | ✅ | ✅ |
Collaborator
And for Mistral it is
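Several of these threads turn on the class name stored in each checkpoint's `config.json` (its `architectures` field, e.g. `LlamaForCausalLM`), which gated repositories such as `meta-llama` only expose to authenticated users. A minimal sketch for looking it up, assuming the `huggingface_hub` package is installed and a token is supplied for gated repos; `architectures_from_config` and `fetch_config` are hypothetical helper names, not part of `fms-hf-tuning`:

```python
import json
from pathlib import Path
from typing import List, Optional


def architectures_from_config(config_path: Path) -> List[str]:
    """Return the `architectures` list recorded in a Hugging Face config.json."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # transformers checkpoints usually list model class names here,
    # e.g. "LlamaForCausalLM"; return an empty list if the key is absent.
    return list(config.get("architectures", []))


def fetch_config(repo_id: str, token: Optional[str] = None) -> Path:
    """Download only config.json for a Hub repo (no model weights).

    Assumption: gated repos such as meta-llama/* need a token with access.
    """
    # Imported lazily so the parsing helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename="config.json", token=token))
```

For example, `architectures_from_config(fetch_config("meta-llama/Llama-3.1-8B", token=os.environ["HF_TOKEN"]))` returns whatever class names that repo records, which is the value the "Model Architecture" column would need. Only `config.json` is fetched, so the check stays cheap even for the 70B and 405B checkpoints.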
| Mistral large | Mistral | 🚫 | 🚫 | 🚫 |
(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
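Rows marked (*) depend on the installed `fms-hf-tuning` release. A minimal sketch of a version check, assuming the distribution is installed under the name `fms-hf-tuning`; the helper names are hypothetical. It compares only the numeric release segment, so a pre-release such as `2.4.0rc1` is treated as `2.4.0`; `packaging.version.Version` gives strict PEP 440 ordering if that distinction matters:

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MINIMUM_VERSION = "2.4.0"  # taken from the (*) footnote


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '2.4.0rc1' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_VERSION) -> bool:
    """True if `installed` >= `minimum`, comparing release numbers only.

    Simplification: pre-release/dev suffixes are ignored, unlike PEP 440.
    """
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.4' compares equal to '2.4.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_fms_hf_tuning() -> Optional[str]:
    """Return the installed fms-hf-tuning version, or None if not installed."""
    try:
        return metadata.version("fms-hf-tuning")
    except metadata.PackageNotFoundError:
        return None
```

Calling `meets_minimum(installed_fms_hf_tuning())` after confirming the result is not `None` tells you whether the (*) rows apply to your environment.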
aluu317 marked this conversation as resolved.
Have we already verified support for this model?
cc @kmehant @ashokponkumar
Yes, Will has verified that tuning works. We can verify inference with vLLM once the required changes are merged into vLLM.
@dushyantbehl Yes, see here