Commit 02737dc

Add singularity and slurm instructions
1 parent bd08746 commit 02737dc

21 files changed

Lines changed: 525 additions & 104 deletions

.dockerignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
# Applies to build only, ignore everything by default
2+
**
3+
4+
# Keep only files needed for build
5+
!environment.yml
6+
!pyproject.toml
7+
!docker/
8+
!docker-compose.yml

.env

Lines changed: 4 additions & 1 deletion
@@ -8,9 +8,12 @@ IMAGE_NAME=todo-image-name
88
IMAGE_TAG=latest
99

1010
CONTAINER_NAME=${IMAGE_NAME}
11-
CONTAINER_HOME_FOLDER=/root
11+
IMAGE_USER=todo-image-user
12+
HOME_FOLDER=/home/${IMAGE_USER}
1213
CODE_FOLDER=${IMAGE_NAME}
1314

1415
HOST_UID=$(id -u)
1516
HOST_GID=$(id -g)
1617
HOSTNAME=${HOSTNAME}
18+
19+
BUILDER=multi-platform

README.md

Lines changed: 53 additions & 10 deletions
@@ -1,13 +1,17 @@
1-
# New Docker Repository
1+
# New Cloud Computing Repository
22

3-
[![pre-commit](https://github.com/Tom-Notch/Docker-Repository-Template/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/Tom-Notch/Docker-Repository-Template/actions/workflows/pre-commit.yml)
3+
[![pre-commit](https://github.com/Tom-Notch/Cloud-Computing-Repository-Template/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/Tom-Notch/Cloud-Computing-Repository-Template/actions/workflows/pre-commit.yml)
44

55
## Dependencies
66

77
- [Docker](https://docs.docker.com/get-docker/)
8+
- [Singularity/Apptainer](https://apptainer.org/)
89

910
## Usage Guidelines
1011

12+
TLDR: Search for `todo` and update all occurrences to your desired name
13+
Docker and Singularity are only needed if you cannot install some dependencies directly in the HPC shell environment because of permission issues
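
One way to find the remaining placeholders is a recursive, case-insensitive grep. This is only a minimal sketch; the function name `list_todo_placeholders` is made up for illustration and is not part of the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the template): list every line that
# still contains a `todo` placeholder, skipping the .git directory.
list_todo_placeholders() {
    local root="${1:-.}"
    # -r: recurse, -n: show line numbers, -i: match todo/TODO/Todo alike
    grep -rni --exclude-dir=.git "todo" "$root"
}
```

Calling `list_todo_placeholders .` from the repository root prints one `file:line:text` entry per match. Note that `grep` exits with a non-zero status once no placeholders are left.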
14+
1115
### Base Repo
1216

1317
1. Change [LICENSE](LICENSE) if necessary
@@ -18,20 +22,59 @@
1822

1923
### Docker Config
2024

21-
1. Modify **DOCKER_USER**, **IMAGE_NAME** in [.env](.env)
25+
1. Modify `DOCKER_USER`, `IMAGE_NAME`, `IMAGE_USER` in [.env](.env)
26+
27+
- [.env](.env) is loaded when you use docker compose for build/run/push/...
28+
- `DOCKER_USER` refers to your docker hub account username
29+
- `IMAGE_USER` refers to the default user inside the image, which is used to determine home folder
30+
31+
1. Modify the service name from `default` to your service name in [docker-compose.yml](docker-compose.yml), add additional volume mounting options such as dataset directories
32+
33+
1. Update [Dockerfile](docker/latest/Dockerfile) and [.dockerignore](.dockerignore)
34+
35+
- Existing dockerfile has screen & tmux config, oh-my-zsh, cmake, and other basic goodies
36+
- Add any additional dependency installations at appropriate locations
37+
38+
1. [build_docker_image.sh](scripts/build_docker_image.sh) to build and test the image locally in your machine's architecture
39+
40+
- Do this on a machine where you have docker permission; HPC clusters usually restrict docker access for security reasons
41+
- The script uses buildx to build a multi-arch image; you can disable this by removing the architectures you do not need in [docker-compose.yml](docker-compose.yml)
42+
- The build stage does not have GPU access; if some of your dependencies need a GPU to build, build them inside a running container and commit the result to the final image
43+
44+
1. To run and test a built image, use [run_docker_container.sh](scripts/run_docker_container.sh) or `docker compose up -d`
45+
46+
- The service by default will mount the whole repository onto `CODE_FOLDER` inside the container so any modification inside also takes effect outside, which is useful when you use vscode remote extension to develop inside a running container with docker context
47+
48+
1. [push_docker_image.sh](scripts/push_docker_image.sh) to push the multi-arch image to docker hub
49+
50+
- You should have the docker hub repository set up before pushing
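
A multi-arch buildx build needs a builder instance. The sketch below shows one way to make sure such a builder exists before building; it is an assumption about the workflow, not a copy of what `build_docker_image.sh` does. The function name `ensure_builder` is hypothetical, and the default name mirrors the `BUILDER=multi-platform` value in `.env`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: make sure a buildx builder exists before a
# multi-arch build. The default mirrors BUILDER=multi-platform from .env.
ensure_builder() {
    local builder="${1:-multi-platform}"
    if docker buildx inspect "$builder" >/dev/null 2>&1; then
        # Builder already exists: just select it.
        docker buildx use "$builder"
    else
        # Create the builder and make it the current one.
        docker buildx create --name "$builder" --use
    fi
}
```

Running `ensure_builder "$BUILDER"` is safe to repeat, since it only creates the builder when `docker buildx inspect` cannot find it.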
51+
52+
### Singularity Config
53+
54+
1. [pull_singularity_image.sh](scripts/pull_singularity_image.sh) to build the singularity image locally
55+
56+
- Singularity image can be built upon existing docker image
57+
58+
1. [run_singularity_instance.sh](scripts/run_singularity_instance.sh) to test the image
59+
60+
- Add additional volume binding options to the script, such as dataset directories. Best practice is to define the path in [.env](.env), then export it in [variables.sh](scripts/variables.sh) with `resolve_host_path` to turn a relative path into an absolute real path
61+
- By default, Singularity instances have less environment isolation than Docker containers unless you pass additional options, as the script does
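
The diff does not show how `resolve_host_path` is implemented, so the following is only a guess at what such a helper could look like: it resolves a relative path against the current directory and removes symlinks from the directory part. The `DATASET_PATH` variable in the commented usage is likewise an assumed name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a path resolver; the real resolve_host_path in
# scripts/variables.sh may be implemented differently.
resolve_host_path() {
    local path="$1"
    if [ -d "$path" ]; then
        # Directory: enter it and print the physical (symlink-free) location.
        (cd "$path" && pwd -P)
    else
        # File or not-yet-existing leaf: resolve the parent, keep the basename.
        local parent
        parent=$(cd "$(dirname "$path")" && pwd -P) || return 1
        printf '%s/%s\n' "${parent%/}" "$(basename "$path")"
    fi
}

# Example usage (DATASET_PATH is an assumed variable name from .env):
# DATASET_PATH=$(resolve_host_path "${DATASET_PATH:-../datasets}")
# export DATASET_PATH
```

An absolute path matters here because the bind source is interpreted on the host, and a relative path would depend on the directory the script happens to be launched from.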
62+
63+
### Job Config
2264

23-
1. Modify the service name from **default** to your service name in [docker-compose.yml](docker-compose.yml)
65+
1. Modify job specifications under [jobs/](jobs/)
2466

25-
1. Update [Dockerfile](docker/latest/Dockerfile)
67+
- Each (HPC) Slurm environment has different partition definitions, which are often heterogeneous; you can query them with `sinfo` and its output format options
68+
- All job scripts use the `-l` (login) option in the shebang, so any command that works in your current shell environment should also work inside a job
2669

27-
1. [build.sh](scripts/build.sh) to build and test the image locally in your machine's architecture
70+
1. Submit job by `sbatch jobs/your-cluster/your-job.job` or `jobs/your-cluster/your-job.job`
2871

29-
1. [push.sh](scripts/push.sh) to push the multi-arch image to the registry
72+
1. [turm](https://github.com/kabouzeid/turm) is recommended for job monitoring; use `turm -u your-slurm-user` after installation
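
As a rough illustration of the `sinfo` query mentioned above, the sketch below wraps two standard Slurm commands. The function names are hypothetical and not part of the template, and the output columns are just one reasonable selection.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the template.

# Print one line per partition: name, generic resources (e.g. GPUs),
# node count, CPUs per node, memory per node (MB), and time limit.
list_partitions() {
    sinfo -o "%P %G %D %c %m %l"
}

# Ask Slurm to validate a job script and estimate its start time
# without actually submitting it.
check_job() {
    sbatch --test-only "$1"
}
```

For example, `list_partitions` helps pick a value for `#SBATCH -p`, and `check_job jobs/your-cluster/your-job.job` can catch an invalid partition or resource request before the job is queued.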
3073

3174
## Developer Quick Start
3275

33-
- Run [scripts/dev-setup.sh](scripts/dev-setup.sh) to setup the development environment
76+
- Run [dev_setup.sh](scripts/dev_setup.sh) to set up the development environment
3477

35-
## Note
78+
## Maintainer
3679

37-
- This template currently only supports docker image for amd64 and arm64, if you want to support other architectures, please modify the [build.sh](scripts/build.sh) script and [docker-compose.yml](docker-compose.yml) accordingly
80+
- Mukai (Tom Notch) Yu: [mukaiy@andrew.cmu.edu](mailto:mukaiy@andrew.cmu.edu)

docker/latest/Dockerfile

Lines changed: 130 additions & 40 deletions
@@ -1,19 +1,71 @@
11
# Do not add --platform=linux/blabla since this is intended for multiplatform builds
2-
FROM tomnotch/bipvrobotics-base-image:latest
3-
ENV HOME_FOLDER=/root
2+
FROM nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04
3+
ENV HOME=$HOME_FOLDER
44
WORKDIR $HOME_FOLDER/
55

66
# Fix apt install stuck problem
77
ENV DEBIAN_FRONTEND=noninteractive
88
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
99

10+
# Copy home folder config
11+
COPY --from=home-folder-config . $HOME_FOLDER/
12+
13+
# Remove cuda source list
14+
# E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/Packages.gz File has unexpected size (853674 != 833365). Mirror sync in progress? [IP: 23.58.157.10 443]
15+
RUN rm /etc/apt/sources.list.d/cuda.list
16+
1017
# update all obsolete packages to latest, install sudo, and cleanup
1118
RUN apt update -o Acquire::Check-Valid-Until=false -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true && \
1219
apt full-upgrade -y && \
1320
apt install -y sudo ca-certificates && \
1421
apt autoremove -y && \
1522
apt autoclean -y
1623

24+
# fix local time problem
25+
RUN apt-get install -y --no-install-recommends tzdata && \
26+
ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime && \
27+
echo "America/New_York" > /etc/timezone && \
28+
dpkg-reconfigure --frontend noninteractive tzdata
29+
30+
# update all obsolete packages to latest, install sudo, and cleanup
31+
RUN apt update -o Acquire::Check-Valid-Until=false -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true && \
32+
apt full-upgrade -y && \
33+
apt install -y sudo ca-certificates && \
34+
apt autoremove -y && \
35+
apt autoclean -y
36+
37+
# Install some goodies
38+
RUN apt-get install -y --no-install-recommends \
39+
build-essential \
40+
clang-format \
41+
curl \
42+
dirmngr \
43+
git \
44+
gnupg \
45+
htop \
46+
less \
47+
locate \
48+
lsb-release \
49+
nano \
50+
ncdu \
51+
net-tools \
52+
perl \
53+
screen \
54+
software-properties-common \
55+
tmux \
56+
tmuxp \
57+
tree \
58+
unzip \
59+
valgrind \
60+
vim \
61+
wget \
62+
zsh
63+
64+
# upgrade cmake to the release version from Kitware's official APT repository
65+
RUN wget https://apt.kitware.com/kitware-archive.sh -O- | sh -s && \
66+
apt-get upgrade -y cmake && \
67+
apt-get autoremove -y
68+
1769
# # Add a new group and user
1870
# RUN addgroup --gid 1000 $USER && \
1971
# adduser --uid 1000 --ingroup $USER --home $HOME_FOLDER --shell /bin/zsh --disabled-password --gecos "" $USER && \
@@ -25,44 +77,76 @@ RUN apt update -o Acquire::Check-Valid-Until=false -o Acquire::AllowInsecureRepo
2577
# chmod 4755 /usr/local/bin/fixuid && \
2678
# mkdir -p /etc/fixuid
2779

28-
# # Switch to the new user
29-
# USER $USER:$USER
30-
31-
#! Install OpenCV 4.2.0 with QUIRC support from source
32-
ENV OPENCV_VERSION=4.2.0
33-
RUN pip3 uninstall -y opencv && \
34-
apt install -y --no-install-recommends libavcodec-dev libavformat-dev libswscale-dev libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgtk-3-dev libpng-dev libjpeg-dev && \
35-
git clone --depth 1 --recursive https://github.com/opencv/opencv.git $HOME_FOLDER/opencv -b $OPENCV_VERSION && \
36-
# follow the error information,replace all the “ipcp-unit-growth” with “ipa-cp-unit-growth” in 3rdparty/carotene/CMakeLists.txt and 3rdparty/carotene/hal/CMakeLists.txt
37-
perl -pi -e 's/ipcp-unit-growth/ipa-cp-unit-growth/g' $HOME_FOLDER/opencv/3rdparty/carotene/CMakeLists.txt $HOME_FOLDER/opencv/3rdparty/carotene/hal/CMakeLists.txt && \
38-
git clone --depth 1 --recursive https://github.com/opencv/opencv_contrib.git $HOME_FOLDER/opencv_contrib -b $OPENCV_VERSION && \
39-
mkdir -p $HOME_FOLDER/opencv/build && \
40-
cd $HOME_FOLDER/opencv/build && \
41-
cmake \
42-
-D CMAKE_CXX_STANDARD=20 \
43-
-D EIGEN_INCLUDE_PATH=/usr/include/eigen3 \
44-
-D OPENCV_GENERATE_PKGCONFIG=ON \
45-
-D BUILD_opencv_python3=ON \
46-
-D OPENCV_PYTHON3_INSTALL_PATH=/usr/local/lib/python3.8/dist-packages \
47-
# -D OPENCV_ENABLE_NONFREE=ON \
48-
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
49-
# -D WITH_LAPACK=ON \
50-
-D WITH_GTK=ON \
51-
-D WITH_TBB=ON \
52-
# -D WITH_QUIRC=ON \
53-
-D WITH_GSTREAMER=ON \
54-
-D WITH_V4L=ON \
55-
# -D WITH_OPENGL=ON \
56-
-D BUILD_TESTS=OFF \
57-
-D BUILD_PERF_TESTS=OFF \
58-
-D BUILD_EXAMPLES=OFF \
59-
-D CMAKE_BUILD_TYPE=RELEASE \
60-
-D CMAKE_INSTALL_PREFIX=/usr/local .. && \
61-
make install -j$(($(nproc)-1)) && \
62-
echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.zshrc && \
63-
echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.bashrc && \
64-
rm -rf $HOME_FOLDER/opencv && \
65-
rm -rf $HOME_FOLDER/opencv_contrib
80+
# install zsh, Oh-My-Zsh, and plugins
81+
RUN sh -c "$(wget -O- https://github.com/deluan/zsh-in-docker/releases/latest/download/zsh-in-docker.sh)" -- \
82+
-t https://github.com/romkatv/powerlevel10k \
83+
-p git \
84+
-p https://github.com/zsh-users/zsh-autosuggestions \
85+
-p https://github.com/zsh-users/zsh-completions \
86+
-p https://github.com/zsh-users/zsh-syntax-highlighting \
87+
-a "[[ ! -f $HOME_FOLDER/.p10k.zsh ]] || source $HOME_FOLDER/.p10k.zsh" \
88+
-a "POWERLEVEL9K_DISABLE_GITSTATUS=true" \
89+
-a "bindkey -M emacs '^[[3;5~' kill-word" \
90+
-a "bindkey '^H' backward-kill-word" \
91+
-a "autoload -U compinit && compinit" \
92+
-a "export PATH=~/.local/bin:$PATH"
93+
94+
# change default shell for the $USER in the image building process for extra environment safety
95+
RUN chsh -s $(which zsh)
96+
97+
# Use bash -lc for conda commands in build layers
98+
SHELL ["/bin/bash", "-lc"]
99+
100+
# Install miniconda at /opt/conda
101+
RUN wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-$(uname -m).sh" -O ~/miniconda.sh && \
102+
sh ~/miniconda.sh -b -p /opt/conda && \
103+
rm ~/miniconda.sh
104+
105+
# Add conda to PATH
106+
ENV PATH=$PATH:/opt/conda/bin
107+
108+
# Init conda for zsh
109+
RUN conda init zsh
110+
111+
#! Examples of actual dependency install:
112+
# #! Install OpenCV 4.2.0 with QUIRC support from source
113+
# ENV OPENCV_VERSION=4.2.0
114+
# RUN pip3 uninstall -y opencv && \
115+
# apt install -y --no-install-recommends libavcodec-dev libavformat-dev libswscale-dev libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgtk-3-dev libpng-dev libjpeg-dev && \
116+
# git clone --depth 1 --recursive https://github.com/opencv/opencv.git $HOME_FOLDER/opencv -b $OPENCV_VERSION && \
117+
# # follow the error information,replace all the “ipcp-unit-growth” with “ipa-cp-unit-growth” in 3rdparty/carotene/CMakeLists.txt and 3rdparty/carotene/hal/CMakeLists.txt
118+
# perl -pi -e 's/ipcp-unit-growth/ipa-cp-unit-growth/g' $HOME_FOLDER/opencv/3rdparty/carotene/CMakeLists.txt $HOME_FOLDER/opencv/3rdparty/carotene/hal/CMakeLists.txt && \
119+
# git clone --depth 1 --recursive https://github.com/opencv/opencv_contrib.git $HOME_FOLDER/opencv_contrib -b $OPENCV_VERSION && \
120+
# mkdir -p $HOME_FOLDER/opencv/build && \
121+
# cd $HOME_FOLDER/opencv/build && \
122+
# cmake \
123+
# -D CMAKE_CXX_STANDARD=20 \
124+
# -D EIGEN_INCLUDE_PATH=/usr/include/eigen3 \
125+
# -D OPENCV_GENERATE_PKGCONFIG=ON \
126+
# -D BUILD_opencv_python3=ON \
127+
# -D OPENCV_PYTHON3_INSTALL_PATH=/usr/local/lib/python3.8/dist-packages \
128+
# # -D OPENCV_ENABLE_NONFREE=ON \
129+
# -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
130+
# # -D WITH_LAPACK=ON \
131+
# -D WITH_GTK=ON \
132+
# -D WITH_TBB=ON \
133+
# # -D WITH_QUIRC=ON \
134+
# -D WITH_GSTREAMER=ON \
135+
# -D WITH_V4L=ON \
136+
# # -D WITH_OPENGL=ON \
137+
# -D BUILD_TESTS=OFF \
138+
# -D BUILD_PERF_TESTS=OFF \
139+
# -D BUILD_EXAMPLES=OFF \
140+
# -D CMAKE_BUILD_TYPE=RELEASE \
141+
# -D CMAKE_INSTALL_PREFIX=/usr/local .. && \
142+
# make install -j$(($(nproc)-1)) && \
143+
# echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.zshrc && \
144+
# echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.bashrc && \
145+
# rm -rf $HOME_FOLDER/opencv && \
146+
# rm -rf $HOME_FOLDER/opencv_contrib
147+
148+
# Add git safe directory
149+
RUN git config --global --add safe.directory "*"
66150

67151
# end of sudo apt installs
68152
RUN apt full-upgrade -y && \
@@ -71,6 +155,12 @@ RUN apt full-upgrade -y && \
71155
apt clean -y && \
72156
rm -rf /var/lib/apt/lists/*
73157

158+
# change owner of home folder
159+
RUN chown -R $IMAGE_USER:$IMAGE_USER $HOME_FOLDER
160+
161+
# change user
162+
USER $IMAGE_USER
163+
74164
# Set the default shell to zsh
75165
SHELL [ "/bin/zsh", "-c" ]
76166

jobs/perceptron/gpu.job

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
#!/bin/bash -l
2+
#SBATCH -N 1 # Number of nodes
3+
#SBATCH -p a100-gpu-shared # partition
4+
#SBATCH --ntasks-per-node=2 # number of tasks per node
5+
#SBATCH --gpus-per-task=1 # number of GPUs per task
6+
#SBATCH --cpus-per-task=16 # number of CPU cores per task
7+
#SBATCH --mem=100G # CPU RAM
8+
#SBATCH -t 48:00:00 # time
9+
#SBATCH -o todo_your_job_name_%j.out
10+
#SBATCH --job-name todo_your_job_name
11+
#SBATCH --mail-type END
12+
#SBATCH --mail-user todo-your-email@your-domain
13+
14+
# echo commands to stdout
15+
set -x
16+
17+
cd todo-your-code-directory || exit # this directory refers to the absolute path outside any docker/singularity container/instance
18+
conda activate todo-your-conda-env-name
19+
srun todo-your-code-entrypoint

jobs/psc/gpu-singularity.job

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
#!/bin/bash -l
2+
#SBATCH -N 1 # Number of nodes
3+
#SBATCH -p GPU-shared # partition
4+
#SBATCH --gpus=h100-80:2 # gpu type and number, use --gres for GPU partition
5+
#SBATCH --ntasks-per-node=2 # number of tasks per node
6+
#SBATCH --cpus-per-task=12 # number of CPU cores per task
7+
#SBATCH --mem-per-gpu=100G # CPU RAM per GPU
8+
#SBATCH -t 48:00:00 # time
9+
#SBATCH -A todo-your-project-id
10+
#SBATCH -o todo_your_job_name_%j.out
11+
#SBATCH --job-name todo_your_job_name
12+
#SBATCH --mail-type END
13+
#SBATCH --mail-user todo-your-email@your-domain
14+
15+
# echo commands to stdout
16+
set -x
17+
18+
cd todo-your-code-directory || exit # this directory refers to the absolute path outside any docker/singularity container/instance
19+
scripts/run_singularity_instance.sh
20+
srun singularity exec instance://todo-your-container-name zsh -c "source ~/.zshrc && conda run -n todo-your-conda-env-name todo-your-code-entrypoint"

jobs/psc/gpu.job

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
#!/bin/bash -l
2+
#SBATCH -N 1 # Number of nodes
3+
#SBATCH -p GPU-shared # partition
4+
#SBATCH --gpus=h100-80:2 # gpu type and number, use --gres for GPU partition
5+
#SBATCH --ntasks-per-node=2 # number of tasks per node
6+
#SBATCH --cpus-per-task=12 # number of CPU cores per task
7+
#SBATCH --mem-per-gpu=100G # CPU RAM per GPU
8+
#SBATCH -t 48:00:00 # time
9+
#SBATCH -A todo-your-project-id
10+
#SBATCH -o todo_your_job_name_%j.out
11+
#SBATCH --job-name todo_your_job_name
12+
#SBATCH --mail-type END
13+
#SBATCH --mail-user todo-your-email@your-domain
14+
15+
# echo commands to stdout
16+
set -x
17+
18+
cd todo-your-code-directory || exit # this directory refers to the absolute path outside any docker/singularity container/instance
19+
conda activate todo-your-conda-env-name
20+
srun todo-your-code-entrypoint
