Merged
44 changes: 23 additions & 21 deletions README.md
@@ -328,35 +328,37 @@ Please refer to [this document](docs/offline-data-preprocessing.md) for details

 Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
 -------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
-Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
-Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
-Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
-GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
-Granite 3B | LlamawithCausalLM | ✅ | ✔️ | ✔️ |
-Granite 8B | LlamawithCausalLM | ✅ | ✅ | ✅ |
+[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅**** | ✅**** | ? |
+[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | * | * | * |
+[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
+[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
+[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
+[Granite 3B Code](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
+[Granite 8B Code](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
 Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
 Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
-Granite 34B | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
-Llama3.1-8B | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |
-Llama3.1-70B(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
-Llama3.1-405B | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
-Llama3-8B | LLaMA 3 | ✅ | ✅ | ✔️ |
-Llama3-70B | LLaMA 3 | 🚫 | ✅ | ✅ |
+[Granite 34B Code](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
+[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LlamaForCausalLM | ✅*** | ✔️ | ✔️ |
+[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LlamaForCausalLM | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
+[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LlamaForCausalLM | 🚫 | 🚫 | ✅ |
+[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LlamaForCausalLM | ✅ | ✅ | ✔️ |
+[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LlamaForCausalLM | 🚫 | ✅ | ✅ |
 aLLaM-13b | LlamaForCausalLM |  ✅ | ✅ | ✅ |
-Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
-Mistral-7b | Mistral | ✅ | ✅ | ✅ |
-Mistral large | Mistral | 🚫 | 🚫 | 🚫 |
+[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | MixtralForCausalLM | ✅ | ✅ | ✅ |
+[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | MistralForCausalLM | ✅ | ✅ | ✅ |
+Mistral large | MistralForCausalLM | 🚫 | 🚫 | 🚫 |

(*) - Supported with `fms-hf-tuning` v2.4.0 or later.

(**) - Supported for q,k,v,o layers . `all-linear` target modules does not infer on vLLM yet.

-(***) - Supported from platform up to 8k context length - same architecture as llama3-8b
+(***) - Supported from platform up to 8k context length - same architecture as llama3-8b.
+
+(****) - Experimentally supported. Dependent on stable transformers version with PR [#37658](https://github.com/huggingface/transformers/pull/37658) and accelerate >= 1.3.0.
@willmj did you create issues to update these two dependencies in order to support granite 4?
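
For illustration only, the `accelerate >= 1.3.0` bound from the (****) footnote could be checked at runtime roughly as follows. This is a minimal sketch and not part of `fms-hf-tuning`: the helper names `parse_version`, `meets_minimum`, and `check_accelerate` are hypothetical, and the comparison looks only at numeric components, so pre-release tags are ignored. The transformers requirement is tied to a PR rather than a version number, so it is not checked here.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Hypothetical helper: anything after the first non-numeric component
    (for example ".dev0") is dropped.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when `installed` is at least `minimum`, comparing numeric parts only."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so "1.3" compares equal to "1.3.0"
    b += (0,) * (width - len(b))
    return a >= b


def check_accelerate(minimum="1.3.0"):
    """Return (ok, installed_version); installed_version is None if not installed."""
    try:
        installed = metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        return False, None
    return meets_minimum(installed, minimum), installed


if __name__ == "__main__":
    ok, installed = check_accelerate()
    print(f"accelerate installed: {installed}, satisfies >= 1.3.0: {ok}")
```

A stricter check would use a full version parser that understands pre-release ordering; the tuple comparison above is kept deliberately small.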


## Training
