Skip to content
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 19 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -328,28 +328,28 @@ Please refer to [this document](docs/offline-data-preprocessing.md) for details

Model Name & Size | Model Architecture | Full Finetuning | Low Rank Adaptation (i.e. LoRA) | qLoRA(quantized LoRA) |
-------------------- | ---------------- | --------------- | ------------------------------- | --------------------- |
Granite PowerLM 3B | GraniteForCausalLM | ✅* | ✅* | ✅* |
Granite 3.1 1B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 3B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.1 8B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.0 2B | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
Granite 3.0 8B | GraniteForCausalLM | ✅* | ✅* | ✔️ |
GraniteMoE 1B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
GraniteMoE 3B | GraniteMoeForCausalLM | ✅ | ✅** | ? |
Granite 3B | LlamawithCausalLM | ✅ | ✔️ | ✔️ |
Granite 8B | LlamawithCausalLM | ✅ | ✅ | ✅ |
[Granite 4.0 Tiny Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) | GraniteMoeHybridForCausalLM | ✅ | ✅ | ? |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we already verified support for this model?
cc @kmehant @ashokponkumar

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Will has verified tuning works. We can verify inference with vLLM when the required changes for it are merged into vLLM

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dushyantbehl Yes, see here

Comment thread
aluu317 marked this conversation as resolved.
Outdated
[Granite PowerLM 3B](https://huggingface.co/ibm-research/PowerLM-3b) | GraniteForCausalLM | * | * | * |
[Granite 3.1 1B](https://huggingface.co/ibm-granite/granite-3.1-1b-a400m-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 2B](https://huggingface.co/ibm-granite/granite-3.1-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.1 8B](https://huggingface.co/ibm-granite/granite-3.1-8b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 2B](https://huggingface.co/ibm-granite/granite-3.0-2b-base) | GraniteForCausalLM | ✔️* | ✔️* | ✔️* |
[Granite 3.0 8B](https://huggingface.co/ibm-granite/granite-3.0-8b-base) | GraniteForCausalLM | ✅* | ✅* | ✔️ |
[GraniteMoE 1B](https://huggingface.co/ibm-granite/granite-3.0-1b-a400m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[GraniteMoE 3B](https://huggingface.co/ibm-granite/granite-3.0-3b-a800m-base) | GraniteMoeForCausalLM | ✅ | ✅** | ? |
[Granite 3B Code Base](https://huggingface.co/ibm-granite/granite-3b-code-base-2k) | LlamaForCausalLM | ✅ | ✔️ | ✔️ |
[Granite 8B Code Base](https://huggingface.co/ibm-granite/granite-8b-code-base-4k) | LlamaForCausalLM | ✅ | ✅ | ✅ |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be so specific as to distinguish base versus instruct models since we support both model types.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I put it there because the link is linking to a Code Base model. But I agree that maybe we won't restrict the text to just one type. Updating.

Granite 13B | GPTBigCodeForCausalLM | ✅ | ✅ | ✔️ |
Granite 20B | GPTBigCodeForCausalLM | ✅ | ✔️ | ✔️ |
Granite 34B | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
Llama3.1-8B | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |  
Llama3.1-70B(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
Llama3.1-405B | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
Llama3-8B | LLaMA 3 | ✅ | ✅ | ✔️ |  
Llama3-70B | LLaMA 3 | 🚫 | ✅ | ✅ |
[Granite 34B Code Instruct](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | GPTBigCodeForCausalLM | 🚫 | ✅ | ✅ |
[Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | LLaMA 3.1 | ✅*** | ✔️ | ✔️ |  
[Llama3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)(same architecture as llama3) | LLaMA 3.1 | 🚫 - same as Llama3-70B | ✔️ | ✔️ |
[Llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B) | LLaMA 3.1 | 🚫 | 🚫 | ✅ |
[Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | LLaMA 3 | ✅ | ✅ | ✔️ |  
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we listing the llama model architectures as LLaMA 3/3.1 when this is already shown in the model name? Shouldn't these be model arch LlamaForCausalLM as it shows in the model's config.json?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! I didn't have the right token to view Llama in HF so couldn't see the config.json. I can update

[Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | LLaMA 3 | 🚫 | ✅ | ✅ |
aLLaM-13b | LlamaForCausalLM |  ✅ | ✅ | ✅ |
Mixtral 8x7B | Mixtral | ✅ | ✅ | ✅ |
Mistral-7b | Mistral | ✅ | ✅ | ✅ |  
[Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | Mixtral | ✅ | ✅ | ✅ |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly the correct model architecture for Mixtral is MixtralForCausalLM

[Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Mistral | ✅ | ✅ | ✅ |  
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for Mistral it is MistralForCausalLM

Mistral large | Mistral | 🚫 | 🚫 | 🚫 |

(*) - Supported with `fms-hf-tuning` v2.4.0 or later.
Comment thread
aluu317 marked this conversation as resolved.
Expand Down