.github/copilot-instructions.md
Lines changed: 25 additions & 38 deletions
Original file line number
Diff line number
Diff line change
@@ -12,61 +12,48 @@ Iris is a Triton-based framework for Remote Memory Access (RMA) operations on AM
12
12
13
13
**FOLLOW THESE INSTRUCTIONS EXACTLY. Reference these instructions first before using search or bash commands.**
14
14
15
-
## Prerequisites
15
+
## GPU Environment
16
16
17
-
- **GPU**: AMD GPUs with ROCm compatibility (tested on MI300X, MI350X & MI355X)
18
-
> **Note**: See below for instructions on development without AMD GPU access
19
-
- **ROCm/HIP Toolkit**: Required for building C++/HIP components
20
-
- **Docker/Apptainer**: Recommended for containerized development
17
+
**You have AMD GPU access.** Your runner has 8x MI325X GPUs (gfx942) with ROCm installed. Do NOT skip GPU operations or say you lack GPU access. You can and should:
21
18
22
-
## Build
19
+
- Run benchmarks and post results (charts, tables) in PR comments
20
+
- Run tests with `torchrun --nproc_per_node=N` for multi-GPU tests
21
+
- Use `rocm-smi` to verify GPU status
22
+
- Run `python -c "import torch; print(torch.cuda.device_count())"` to confirm GPU count
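The GPU checks in the list above can be combined into one small script. This is a minimal sketch, not part of Iris: the helper names `visible_gpu_count` and `rocm_smi_available` are hypothetical, and the only assumptions are that PyTorch may or may not be importable and that `rocm-smi` may or may not be on `PATH`.

```python
import shutil


def visible_gpu_count() -> int:
    """Return the number of GPUs PyTorch can see, or 0 if torch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
    return torch.cuda.device_count()


def rocm_smi_available() -> bool:
    """Return True if the rocm-smi executable is found on PATH."""
    return shutil.which("rocm-smi") is not None


if __name__ == "__main__":
    print(f"GPUs visible to PyTorch: {visible_gpu_count()}")
    print(f"rocm-smi on PATH: {rocm_smi_available()}")
```

A count of 0 on a runner that is expected to have GPUs suggests an environment problem (for example, a CPU-only PyTorch build) rather than missing hardware.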
23
23
24
-
### Docker Development Environment (Recommended)
24
+
When asked to run a benchmark, **run it and post the output**. Do not say you cannot.
25
+
26
+
### Running multi-GPU tests and benchmarks
27
+
28
+
Multi-GPU tests require `torch.distributed` initialization before pytest:
25
29
```bash
26
-
# Build and start development container (takes 45-60 minutes - NEVER CANCEL)
Benchmarks use the declarative `iris.bench` framework. See existing `benchmark/ops/bench_*.py` files for examples. Output includes latency, throughput, and bandwidth tables. When posting benchmark results in PR comments, format as markdown tables.
43
43
44
-
# Install Iris
45
-
cd iris && pip install -e ".[dev]"
46
-
```
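For the instruction to post benchmark results as markdown tables, a small formatter like the one below may help. It is a sketch under stated assumptions: `to_markdown_table` is a hypothetical helper, not part of `iris.bench`, and the column names and placeholder cells are illustrative only since the framework's actual output format is not shown here.

```python
from typing import Sequence


def to_markdown_table(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render headers and rows as a GitHub-flavored markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length must match header length")
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder cells for illustration only; substitute real measured values.
    print(
        to_markdown_table(
            ["size (bytes)", "latency (us)", "bandwidth (GB/s)"],
            [[1024, "<measured>", "<measured>"]],
        )
    )
```

The length check rejects rows that do not match the header, so a malformed table fails loudly instead of rendering misaligned in the PR comment.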
44
+
## Prerequisites
47
45
48
-
### Apptainer Setup
49
-
```bash
50
-
# Build and run Apptainer image
51
-
./apptainer/build.sh
52
-
./apptainer/run.sh
46
+
- **GPU**: AMD GPUs with ROCm compatibility (tested on MI300X, MI325X, MI350X & MI355X)
47
+
- **ROCm/HIP Toolkit**: Required for building C++/HIP components
48
+
- **Docker/Apptainer**: Recommended for containerized development
53
49
54
-
# Install Iris
55
-
pip install -e ".[dev]"
56
-
```
50
+
## Build
57
51
58
-
### Local Development (Not Recommended)
52
+
Iris is already installed in your environment: the setup steps run `pip install -e .`, so you do not need to build or install anything. If you need to reinstall after modifying `setup.py` or C extensions:
59
53
```bash
60
-
# Requires ROCm/HIP toolkit installation
61
54
pip install -e ".[dev]"
62
55
```
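Before reinstalling, it can be worth confirming that the package is already importable. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, and the module name `iris` is assumed from the install instructions above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "iris" is the assumed import name for this project.
    print(f"iris importable: {is_installed('iris')}")
```

If this reports `False`, rerunning the editable install shown above is the likely fix.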
63
56
64
-
### Development Without AMD GPU
65
-
If you don't have access to AMD GPUs, you can still contribute to the project:
66
-
- **Code Editing**: Start editing code directly in your local environment
67
-
- **CI Testing**: The project has comprehensive CI pipelines that will test your changes automatically. If your changes fail, check the CI logs to understand what went wrong.
68
-
- **Local Validation**: Run linting and formatting locally: `ruff check . --fix && ruff format .`