Commit f537a20

Merge branch 'main' into astroC86/load-store-latency
2 parents 003b273 + 5bc9719 commit f537a20

52 files changed

Lines changed: 6106 additions & 195 deletions

.github/copilot-instructions.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+# Iris: Multi-GPU Programming Framework
+
+## Description
+
+Iris is a Triton-based framework for Remote Memory Access (RMA) operations on AMD GPUs. It provides SHMEM-like APIs within Triton for Multi-GPU programming with:
+
+- Clean abstractions with full symmetric heap implementation
+- Pythonic PyTorch-like host APIs for tensor operations
+- Triton-style device APIs for load, store, and atomic operations
+- Minimal dependencies (Triton, PyTorch, HIP runtime, mpi4py)
+- Comprehensive examples showing communication/computation overlap
+
+**FOLLOW THESE INSTRUCTIONS EXACTLY. Reference these instructions first before using search or bash commands.**
+
+## Prerequisites
+
+- **GPU**: AMD GPUs with ROCm compatibility (tested on MI300X, MI350X & MI355X)
+  > **Note**: See below for instructions on development without AMD GPU access
+- **ROCm/HIP Toolkit**: Required for building C++/HIP components
+- **MPI**: Required for multi-GPU operations
+- **Docker/Apptainer**: Recommended for containerized development
+
+## Build
+
+### Docker Development Environment (Recommended)
+```bash
+# Build and start development container (takes 45-60 minutes - NEVER CANCEL)
+docker compose up --build -d
+
+# Attach to running container
+docker attach iris-dev
+
+# Install Iris in development mode
+cd iris && pip install -e ".[dev]"
+```
+
+### Alternative Docker Setup
+```bash
+# Build Docker image manually
+./docker/build.sh <image-name> # Takes 45-60 minutes
+
+# Run container
+./docker/run.sh <image-name>
+
+# Install Iris
+cd iris && pip install -e ".[dev]"
+```
+
+### Apptainer Setup
+```bash
+# Build and run Apptainer image
+./apptainer/build.sh
+./apptainer/run.sh
+
+# Install Iris
+pip install -e ".[dev]"
+```
+
+### Local Development (Not Recommended)
+```bash
+# Requires ROCm/HIP toolkit installation
+pip install -e ".[dev]"
+```
+
+### Development Without AMD GPU
+If you don't have access to AMD GPUs, you can still contribute to the project:
+- **Code Editing**: Start editing code directly in your local environment
+- **CI Testing**: The project has comprehensive CI pipelines that will test your changes automatically. You can check the CI logs if your changes fail to understand what went wrong.
+- **Local Validation**: Run linting and formatting locally: `ruff check . --fix && ruff format .`
+
+## Run
+
+### Testing
+```bash
+# Run unit tests
+pytest tests/unittests/
+
+# Run example tests
+pytest tests/examples/
+
+# Run specific example (requires MPI and GPU)
+mpirun -np 8 python examples/00_load/load_bench.py
+```
+
+### Code Quality
+```bash
+# Linting and formatting
+ruff check .
+ruff format .
+
+# Pre-commit validation (required)
+ruff check . --fix
+ruff format .
+```
+
+## Contributing Guidelines
+
+### Development Workflow
+1. **Setup**: Install with dev dependencies: `pip install -e ".[dev]"`
+2. **Branch**: Create feature branch: `git checkout -b $USER/feature-name`
+3. **Develop**: Follow existing code style, add tests, update docs
+4. **Test**: Run `ruff check .`, `ruff format .`, and `pytest`
+5. **Commit**: Use descriptive commit messages
+6. **PR**: Create pull request with change details
+
+### Code Standards
+- Follow existing code style and patterns
+- Add tests for new functionality
+- Update documentation as needed
+- Ensure all tests pass before submitting PR
+- Run pre-commit validation: `ruff check . --fix && ruff format .`
+
+### Repository Structure
+```
+iris/
+├── iris/ # Main Python package
+├── csrc/ # C++/HIP source code
+├── examples/ # Algorithm implementations
+├── tests/ # Test suite
+├── docker/ # Docker configuration
+└── docs/ # Documentation
+```
+
+## License
+
+MIT License - see [LICENSE](LICENSE) file for details.
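
The pre-commit validation line in the instructions above chains two commands with `&&`, so the formatter only runs when the lint pass exits cleanly. A minimal Python sketch of that short-circuit behaviour follows, assuming `ruff` is available on `PATH`. The names `VALIDATION_STEPS`, `run_validation`, and the injectable `runner` parameter are hypothetical helpers for illustration and are not part of Iris.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The two commands from `ruff check . --fix && ruff format .`, in order.
VALIDATION_STEPS: List[List[str]] = [
    ["ruff", "check", ".", "--fix"],
    ["ruff", "format", "."],
]


def run_validation(
    steps: Sequence[Sequence[str]] = VALIDATION_STEPS,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Run each step in order and stop at the first non-zero exit code.

    `runner` takes a command and returns its exit code. It defaults to
    subprocess.run and exists only so the logic can be exercised without
    ruff installed (an assumption of this sketch).
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(list(cmd)).returncode
    for cmd in steps:
        if runner(cmd) != 0:
            return False  # mirrors `&&`: later steps are skipped
    return True
```

Calling `run_validation()` from the repository root would then behave like the shell one-liner, returning `False` as soon as one step fails.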

.github/workflows/auto-label.yml

Lines changed: 3 additions & 2 deletions
@@ -37,7 +37,7 @@ jobs:
 
   # PRs: label so the project rule moves them to In Progress
   label-prs:
-    if: github.event_name == 'pull_request'
+    if: github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork
     runs-on: ubuntu-latest
     steps:
       - name: Add iris + in-progress labels to PR
@@ -49,4 +49,5 @@ jobs:
               repo: context.repo.repo,
               issue_number: context.payload.pull_request.number,
               labels: ["iris", "in-progress"]
-            });
+            })
+
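
The new `if:` guard in the hunk above can be read as a plain predicate over the webhook payload. The sketch below is a hypothetical Python model of that expression, not code from the workflow; `should_label_pr` and the nested-dictionary layout are illustrative assumptions that mirror `github.event.pull_request.head.repo.fork`. A common reason for such a guard is that `pull_request` runs triggered from a fork typically receive a read-only `GITHUB_TOKEN`, so a labeling step would fail there.

```python
def should_label_pr(event_name: str, event: dict) -> bool:
    """Model of:
    github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork
    """
    if event_name != "pull_request":
        return False
    # Walk the payload defensively; a missing or null level is treated as "not a fork",
    # which is an assumption of this sketch.
    pull_request = event.get("pull_request") or {}
    head = pull_request.get("head") or {}
    repo = head.get("repo") or {}
    return not bool(repo.get("fork", False))


# A PR opened from a branch in the same repository is labeled ...
same_repo = {"pull_request": {"head": {"repo": {"fork": False}}}}
# ... while a PR opened from a fork is skipped.
from_fork = {"pull_request": {"head": {"repo": {"fork": True}}}}

print(should_label_pr("pull_request", same_repo))
print(should_label_pr("pull_request", from_fork))
```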
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+name: Iris Tests with Apptainer
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
+  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
+
+jobs:
+  build-apptainer-image:
+    runs-on: [self-hosted, mi3008x]
+    timeout-minutes: 90
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup Apptainer
+        run: |
+          apt-get update && apt-get install -y software-properties-common
+          add-apt-repository -y ppa:apptainer/ppa
+          apt-get update && apt-get install -y apptainer
+
+      - name: Build Iris Apptainer container
+        run: |
+          # Create persistent Apptainer directory
+          mkdir -p ~/apptainer
+
+          # Build Apptainer image from definition file (only if it doesn't exist)
+          if [ ! -f ~/apptainer/iris-dev.sif ]; then
+            echo "Building new Apptainer image..."
+            apptainer build ~/apptainer/iris-dev.sif apptainer/iris.def
+          else
+            echo "Using existing Apptainer image"
+          fi
+  run-tests:
+    name: ${{ matrix.ranks }}-rank Iris Test
+    needs: build-apptainer-image
+    runs-on: [self-hosted, mi3008x]
+    timeout-minutes: 20
+    strategy:
+      matrix:
+        ranks: [1, 2, 4, 8]
+      max-parallel: 1
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Run Iris Tests with ${{ matrix.ranks }} MPI ranks
+        run: |
+          apptainer exec ~/apptainer/iris-dev.sif bash -c "
+            set -e # Exit on any error
+
+            # Install iris first
+            pip install -e .
+
+            # Create function for mpirun with root permissions
+            mpirun-root() { mpirun --allow-run-as-root \"\$@\"; }
+
+            # Run examples tests one at a time
+            echo 'Running examples tests one at a time...'
+            for test_file in tests/examples/test_*.py; do
+              echo \"Testing: \$test_file with ${{ matrix.ranks }} MPI ranks\"
+              mpirun-root -np ${{ matrix.ranks }} python -m pytest \"\$test_file\" -v --tb=short
+            done
+
+            # Run unit tests one at a time
+            echo 'Running unit tests one at a time...'
+            for test_file in tests/unittests/test_*.py; do
+              echo \"Testing: \$test_file with ${{ matrix.ranks }} MPI ranks\"
+              mpirun-root -np ${{ matrix.ranks }} python -m pytest \"\$test_file\" -v --tb=short
+            done
+          "
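
Two pieces of the workflow above are easy to misread: the `concurrency` expressions and the per-file `mpirun` loop. The Python sketch below models both under stated assumptions. The function names (`concurrency_group`, `cancel_in_progress`, `plan_commands`) are hypothetical, and `sorted()` only approximates the order in which the shell expands `test_*.py`.

```python
from typing import Iterable, List


def concurrency_group(workflow: str, head_ref: str, ref: str) -> str:
    """Model of `${{ github.workflow }}-${{ github.head_ref || github.ref }}`.

    `head_ref` is empty for push events, so `||` falls back to `ref`.
    """
    return f"{workflow}-{head_ref or ref}"


def cancel_in_progress(ref: str) -> bool:
    """Model of `${{ github.ref != 'refs/heads/main' }}`: runs on main are never cancelled."""
    return ref != "refs/heads/main"


def plan_commands(test_files: Iterable[str], ranks: int) -> List[List[str]]:
    """One mpirun invocation per test file, as in the workflow's `for` loops."""
    return [
        ["mpirun", "--allow-run-as-root", "-np", str(ranks),
         "python", "-m", "pytest", path, "-v", "--tb=short"]
        for path in sorted(test_files)
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the output.
    files = ["tests/examples/test_b.py", "tests/examples/test_a.py"]
    for ranks in [1, 2, 4, 8]:  # the matrix values from the workflow
        for cmd in plan_commands(files, ranks):
            print(" ".join(cmd))
```

Because `max-parallel: 1` serializes the matrix, the four rank counts would run one after another on the runner rather than competing for the same GPUs.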

CODE_OF_CONDUCT.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+muhaawad at amd dot com.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.
