## How to use Foundation Model Stack (FMS) on AIU hardware
The [scripts](https://github.com/foundation-model-stack/aiu-fms-testing-utils/tree/main/scripts) directory provides scripts for using FMS on AIU hardware across many use cases. These scripts support a wide range of command line options for running encoder and decoder models, among other workflows. Refer to the documentation on [using different scripts](https://github.com/foundation-model-stack/aiu-fms-testing-utils/blob/main/scripts/README.md) for more details.
Tensor parallel execution is only supported on the AIU through the [Foundation Model Stack](https://github.com/foundation-model-stack/foundation-model-stack).
The `--nproc-per-node` command line option controls the number of AIUs to use (the number of parallel processes).
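To make the mechanics concrete, here is a minimal Python sketch of a tensor parallel setup. It is not taken from any of the repository's scripts: the variant name is a placeholder, and it assumes the `RANK`/`WORLD_SIZE` environment variables that a `torchrun`-style launcher sets for each process, plus the `distributed_strategy="tp"` option that FMS's `get_model` accepts.

```python
import os

import torch.distributed as dist
from fms.models import get_model

# A torchrun-style launcher spawns one process per AIU (--nproc-per-node)
# and sets RANK/WORLD_SIZE for each; gloo coordinates the CPU side.
dist.init_process_group(backend="gloo")
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
print(f"rank {rank} of {world_size} initialized")

# Shard the model across all participating processes with tensor parallelism.
model = get_model(
    architecture="hf_pretrained",
    variant="ibm-granite/granite-3.0-8b-base",  # placeholder model id
    device_type="cpu",
    distributed_strategy="tp",
    group=dist.group.WORLD,
)
```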
### Small Toy
`small-toy.py` is a slimmed-down version of the Big Toy model. The purpose of this model is to demonstrate how to run a tensor parallel model with FMS on AIU hardware.

### RoBERTa

`roberta.py` is a simple version of the RoBERTa model. The purpose of this model is to demonstrate how to run a tensor parallel model with FMS on AIU hardware.
**Note**: We need to disable the Tensor Parallel `Embedding` conversion to avoid using a `torch.distributed` interface that `gloo` does not support, namely `torch.ops._c10d_functional.all_gather_into_tensor`. The `roberta.py` script sets an environment variable to avoid the problematic conversion. This workaround will be removed in a future PyTorch release.
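The exact variable is not reproduced in this document, so the sketch below uses a deliberately hypothetical name (`EXAMPLE_DISABLE_TP_EMBEDDING`) purely to show where such a workaround is applied: it must be set before the model is built.

```python
import os

# Hypothetical variable name for illustration only; see roberta.py for the
# environment variable it actually sets. The point is that the workaround
# must be in place before the tensor parallel model is constructed.
os.environ.setdefault("EXAMPLE_DISABLE_TP_EMBEDDING", "1")
```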
For example, you can run a Llama 194m model, taking example inputs from one folder and validation text from another, and validate token equivalency; validation only covers up to `max(max_new_tokens, tokens_in_validation_file)` tokens.

To run a logits-based validation, pass `--validation_level=1` to the validation script. This checks that the logits output matches at every step of the model, measured by cross-entropy loss. You can control the acceptable threshold with `--logits_loss_threshold`.
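Conceptually, the per-step check resembles the sketch below. The function name and shapes are illustrative assumptions, not the validation script's actual code: the reference logits are converted to a probability distribution and compared against the AIU logits with cross-entropy.

```python
import torch
import torch.nn.functional as F

def logits_step_ok(reference_logits: torch.Tensor,
                   aiu_logits: torch.Tensor,
                   loss_threshold: float) -> bool:
    """Illustrative check for one generation step.

    Both tensors are (batch, vocab_size). The reference distribution is the
    target; cross-entropy against the AIU logits must stay under the threshold.
    """
    reference_probs = F.softmax(reference_logits, dim=-1)
    loss = F.cross_entropy(aiu_logits, reference_probs)
    return loss.item() <= loss_threshold
```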
The [examples](https://github.com/foundation-model-stack/aiu-fms-testing-utils/tree/main/examples) directory provides small examples aimed at helping understand the general workflow of running a model using FMS on AIU hardware.
## Small examples of using Foundation Model Stack (FMS) on AIU hardware
The [scripts](https://github.com/foundation-model-stack/aiu-fms-testing-utils/tree/main/scripts) directory provides full-featured scripts that accept a wide range of command-line options, and you should use them according to your use case. Because they cover so much, however, it can be difficult to follow their flow quickly. The examples provided here form a short workflow, helping users quickly understand how to run FMS on AIU hardware.
We will walk through an example of running the IBM Granite model with the AIU backend.
The first step is to make sure that the required libraries are available, including [aiu-fms-testing-utils](https://github.com/foundation-model-stack/aiu-fms-testing-utils), [fms](https://github.com/foundation-model-stack/foundation-model-stack), [HF Transformers](https://huggingface.co/docs/hub/en/transformers), [torch](https://pytorch.org/get-started/locally/), and torch_sendnn. Depending on your use case, you may need other libraries as well.
For our example code, we will use the following libraries.
```python
import math
import os

import torch

from aiu_fms_testing_utils.utils import warmup_model
from aiu_fms_testing_utils.utils.aiu_setup import dprint
from fms.models import get_model
from fms.utils.generation import generate, pad_input_ids
from torch_sendnn import torch_sendnn
from transformers import AutoTokenizer
```
Now, add the model setup and tokenizer details.
```python
# We provide the model as a variant, as below. If you have a model available
# locally, you can use the model_path argument instead of variant.
variant = "ibm-granite/granite-3.0-8b-base"  # or "ibm-ai-platform/micro-g3.3-8b-instruct-1b" etc.

model = get_model(
    architecture="hf_pretrained",
    variant=variant,
    device_type="cpu",
    data_type=torch.float16,
    fused_weights=False,
)

model.eval()
torch.set_grad_enabled(False)
model.compile(backend="sendnn")  # Compile with the AIU sendnn backend
```
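The imports above include `AutoTokenizer`, so the tokenizer half of the setup can be loaded from the same model id. A minimal sketch, assuming the tokenizer is published alongside the model on the Hugging Face Hub:

```python
# Load the tokenizer that matches the model variant.
tokenizer = AutoTokenizer.from_pretrained(variant)
```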
Next, set up the prompt from a simple instruction template:

```python
template = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{}\n\n### Response:"

prompt = template.format("Provide a list of instructions for preparing chicken soup.")
```
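The remaining imports (`pad_input_ids`, `warmup_model`, `generate`, and `dprint`) point at the rest of the flow: tokenize the prompt, pad it to a fixed length, warm up the compiled model, and generate. The sketch below is our reconstruction under those assumptions; the exact signatures may differ, so treat the [examples](https://github.com/foundation-model-stack/aiu-fms-testing-utils/tree/main/examples) directory as authoritative.

```python
# Tokenize the prompt into a tensor of input ids.
input_ids = torch.tensor(
    tokenizer.encode(prompt, add_special_tokens=True), dtype=torch.long
)

# Pad to a fixed length so the compiled sendnn graph can be reused.
input_ids, padding_kwargs = pad_input_ids([input_ids], min_pad_length=64)

# Warm up the model so compilation happens before real generation.
warmup_model(model, input_ids, max_new_tokens=128, **padding_kwargs)

# Greedily generate and decode the response.
result = generate(
    model,
    input_ids,
    max_new_tokens=128,
    use_cache=True,
    do_sample=False,
    extra_kwargs=padding_kwargs,
)
dprint(tokenizer.decode(result[0], skip_special_tokens=True))
```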