Commit 34f91ae

Improve workflow and documentation
1 parent bdee69b commit 34f91ae

15 files changed

Lines changed: 87 additions & 73 deletions

.env

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ CODE_FOLDER=${IMAGE_NAME}
1414

1515
HOST_UID=$(id -u)
1616
HOST_GID=$(id -g)
17-
HOSTNAME=${HOSTNAME}
17+
HOST=${HOST:-$(hostname)}
18+
HOSTNAME=${HOSTNAME:-$(hostname)}
1819

1920
BUILDER=multi-platform
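
The `HOST` and `HOSTNAME` lines in this hunk use the shell's `${VAR:-default}` expansion: an already exported value is kept, and the fallback is used only when the variable is unset or empty. A minimal sketch of that behavior, assuming the file is evaluated by a POSIX-style shell; the helper `resolve_host` and the host names `node01` and `login1` are hypothetical and not part of the commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the candidate value,
# or the fallback when the candidate is unset or empty.
resolve_host() {
    local candidate="${1:-}"
    local fallback="${2:-unknown-host}"
    printf '%s\n' "${candidate:-${fallback}}"
}

resolve_host "" "node01"       # empty candidate, so the fallback is printed
resolve_host "login1" "node01" # non-empty candidate is kept as-is
```

In the real file the fallback is the output of `hostname`, so the value should only differ from the machine's name when the caller exports `HOST` or `HOSTNAME` beforehand.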

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -354,3 +354,6 @@ AMENT_IGNORE
354354
.ionide
355355

356356
# End of https://www.toptal.com/developers/gitignore/api/visualstudiocode,c++,python,jupyternotebooks,ros,ros2,matlab,git
357+
358+
# Singularity images
359+
*.sif

README.md

Lines changed: 17 additions & 9 deletions
@@ -13,7 +13,7 @@ TLDR: Search for `todo` and update all occurrences to your desired name
1313

1414
Docker and singularity is not a must unless you cannot install some dependencies locally on HPC shell environment due to permission issue
1515

16-
### Base Repo
16+
### Base Repository
1717

1818
1. Change [LICENSE](LICENSE) if necessary
1919

@@ -23,13 +23,15 @@ Docker and singularity is not a must unless you cannot install some dependencies
2323

2424
### Docker Config
2525

26+
Continue on a machine where you have docker permission, HPC clusters usually restrict docker access for security reasons
27+
2628
1. Modify `todo-docker-user`, `todo-image-name`, `todo-image-user` in [.env](.env)
2729

2830
- [.env](env) will be loaded when you use docker compose for build/run/push/...
2931
- `todo-docker-user` refers to your docker hub account username
3032
- `todo-image-user` refers to the default user inside the image, which is used to determine home folder
3133

32-
1. Modify the service name from `default` to your service name in [docker-compose.yml](docker-compose.yml), add additional volume mounting options such as dataset directories
34+
1. Modify the service name from `todo-service-name` to your service name in [docker-compose.yml](docker-compose.yml), add additional volume mounting options such as dataset directories
3335

3436
1. Update [Dockerfile](docker/latest/Dockerfile) and [.dockerignore](.dockerignore)
3537

@@ -38,39 +40,45 @@ Docker and singularity is not a must unless you cannot install some dependencies
3840

3941
1. [build_docker_image.sh](scripts/build_docker_image.sh) to build and test the image locally in your machine's architecture
4042

41-
- Do this on a machine where you have docker permission, HPC clusters usually restrict docker access for security reasons
4243
- The scripts uses buildx to build multi-arch image, you can disable this by removing redundant archs in [docker-compose.yml](docker-compose.yml)
4344
- Building stage does not have GPU access, if some of your dependencies need GPU, build them inside a running container and commit to the final image
4445

45-
1. To run and test a built image, use [run_docker_container.sh](scripts/run_docker_container.sh) or `docker compose up -d`
46+
1. [run_docker_container.sh](scripts/run_docker_container.sh) or `docker compose up -d` to run and test a built image
4647

4748
- The service by default will mount the whole repository onto `CODE_FOLDER` inside the container so any modification inside also takes effect outside, which is useful when you use vscode remote extension to develop inside a running container with remote docker context
49+
- You should be able to run and see GUI applications inside the container if `$DISPLAY` is set correctly when you run the script
4850

4951
1. [push_docker_image.sh](scripts/push_docker_image.sh) to push the multi-arch image to docker hub
5052

5153
- You should have the docker hub repository set up before pushing
5254

5355
### Singularity Config
5456

57+
Continue on the actual HPC cluster environment
58+
5559
1. [pull_singularity_image.sh](scripts/pull_singularity_image.sh) to build the singularity image locally
5660

5761
- Singularity image can be built upon existing docker image
62+
- You should see the image `todo-image-name_latest.def` after successfully built
5863

5964
1. [run_singularity_instance.sh](scripts/run_singularity_instance.sh) to test the image
6065

6166
- Add additional volume binding options to the script such as dataset directories, best practice is to define in [.env](.env) then export in [variables.sh](scripts/variables.sh) with `resolve_host_path` to turn relative path into absolute real path
62-
- Singularity instances by default has less environment isolation than docker containers unless you specify the additional options like the script
67+
- Singularity instances by default have less environment isolation than docker containers unless you specify the additional options like the script
6368

6469
### Job Config
6570

66-
1. Modify job specifications under [jobs/](jobs/)
71+
1. Modify job specifications under `jobs/`
6772

6873
- Each (HPC) Slurm environment has different partition definitions, which are often heterogeneous, you can query this by `sinfo` with some options
69-
- All the jobs has `-l`(login) options in shebang so that any command working in your current shell environment should also run as a job
74+
- `--ntasks-per-node` specifies number of parallelization, and it's convenient to tie other resources to task, e.g., `--gpus-per-task`, `--cpus-per-task`, `--mem-per-gpu`, so that you only need to increase ntasks to scale up on a node
75+
- All the jobs have `-l`(login) options in shebang so that any command working in your current shell environment should also run as a job
76+
77+
1. `sbatch jobs/your-cluster/your-job.job` or `jobs/your-cluster/your-job.job` to submit jobs
7078

71-
1. Submit job by `sbatch jobs/your-cluster/your-job.job` or `jobs/your-cluster/your-job.job`
79+
- You should see a file `todo_your_job_name_slurm_job_id.out` in the base folder of this repository, which contains job logs
7280

73-
1. Recommend [turm](https://github.com/kabouzeid/turm) for job monitor, use `turm -u your-slurm-user` after installation
81+
1. Recommend [turm](https://github.com/kabouzeid/turm) for job monitor outside the job, use `turm -u your-slurm-user` after installation
7482

7583
## Developer Quick Start
7684

docker-compose.yml

Lines changed: 5 additions & 6 deletions
@@ -1,7 +1,5 @@
1-
version: "3"
2-
31
services:
4-
default:
2+
todo-service-name:
53
build:
64
dockerfile: ./docker/${IMAGE_TAG}/Dockerfile
75
x-bake:
@@ -17,7 +15,8 @@ services:
1715
pull_policy: missing
1816
container_name: ${CONTAINER_NAME}
1917
privileged: true
20-
hostname: ${HOSTNAME}
18+
user: root
19+
hostname: ${HOST}
2120
network_mode: host
2221
ipc: host
2322
pid: host
@@ -34,8 +33,8 @@ services:
3433
- /var/lib/systemd/coredump/:/cores
3534
- /tmp/.X11-unix/:/tmp/.X11-unix/:rw
3635
- ${XAUTH}:${XAUTH}
37-
- .:${CONTAINER_HOME_FOLDER}/${CODE_FOLDER}
38-
working_dir: ${CONTAINER_HOME_FOLDER}/${CODE_FOLDER}
36+
- .:${HOME_FOLDER}/${CODE_FOLDER}
37+
working_dir: ${HOME_FOLDER}/${CODE_FOLDER}
3938
stdin_open: true # for -it
4039
tty: true # for -it
4140
# command: /bin/zsh

docker/build-context/home-folder-config/.p10k.zsh

Lines changed: 1 addition & 1 deletion
@@ -473,7 +473,7 @@
473473
typeset -g POWERLEVEL9K_VCS_MAX_INDEX_SIZE_DIRTY=-1
474474

475475
# Don't show Git status in prompt for repositories whose workdir matches this pattern.
476-
# For example, if set to '~', the Git repository at $HOME/.git will be ignored.
476+
# For example, if set to '~', the Git repository at ${HOME}/.git will be ignored.
477477
# Multiple patterns can be combined with '|': '~(|/foo)|/bar/baz/*'.
478478
typeset -g POWERLEVEL9K_VCS_DISABLED_WORKDIR_PATTERN='~'
479479

docker/latest/Dockerfile

Lines changed: 17 additions & 17 deletions
@@ -1,14 +1,14 @@
11
# Do not add --platform=linux/blabla since this is intended for multiplatform builds
22
FROM nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04
3-
ENV HOME=$HOME_FOLDER
4-
WORKDIR $HOME_FOLDER/
3+
ENV HOME=${HOME_FOLDER}
4+
WORKDIR ${HOME_FOLDER}/
55

66
# Fix apt install stuck problem
77
ENV DEBIAN_FRONTEND=noninteractive
88
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
99

1010
# Copy home folder config
11-
COPY --from=home-folder-config . $HOME_FOLDER/
11+
COPY --from=home-folder-config . ${HOME_FOLDER}/
1212

1313
# Remove cuda source list
1414
# E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/Packages.gz File has unexpected size (853674 != 833365). Mirror sync in progress? [IP: 23.58.157.10 443]
@@ -68,7 +68,7 @@ RUN wget https://apt-get.kitware.com/kitware-archive.sh -O- | sh -s && \
6868

6969
# # Add a new group and user
7070
# RUN addgroup --gid 1000 $USER && \
71-
# adduser --uid 1000 --ingroup $USER --home $HOME_FOLDER --shell /bin/zsh --disabled-password --gecos "" $USER && \
71+
# adduser --uid 1000 --ingroup $USER --home ${HOME_FOLDER} --shell /bin/zsh --disabled-password --gecos "" $USER && \
7272
# echo "$USER ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
7373

7474
# # Fix UID/GID when mounting from host using this: https://github.com/boxboat/fixuid
@@ -84,12 +84,12 @@ RUN sh -c "$(wget -O- https://github.com/deluan/zsh-in-docker/releases/latest/do
8484
-p https://github.com/zsh-users/zsh-autosuggestions \
8585
-p https://github.com/zsh-users/zsh-completions \
8686
-p https://github.com/zsh-users/zsh-syntax-highlighting \
87-
-a "[[ ! -f $HOME_FOLDER/.p10k.zsh ]] || source $HOME_FOLDER/.p10k.zsh" \
87+
-a "[[ ! -f ${HOME_FOLDER}/.p10k.zsh ]] || source ${HOME_FOLDER}/.p10k.zsh" \
8888
-a "POWERLEVEL9K_DISABLE_GITSTATUS=true" \
8989
-a "bindkey -M emacs '^[[3;5~' kill-word" \
9090
-a "bindkey '^H' backward-kill-word" \
9191
-a "autoload -U compinit && compinit" \
92-
-a "export PATH=~/.local/bin:$PATH"
92+
-a "export PATH=~/.local/bin:${PATH}"
9393

9494
# change default shell for the $USER in the image building process for extra environment safety
9595
RUN chsh -s $(which zsh)
@@ -103,7 +103,7 @@ RUN wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-$(uname -m
103103
rm ~/miniconda.sh
104104

105105
# Add conda to PATH
106-
ENV PATH=$PATH:/opt/conda/bin
106+
ENV PATH=${PATH}:/opt/conda/bin
107107

108108
# Init conda for zsh
109109
RUN conda init zsh
@@ -113,12 +113,12 @@ RUN conda init zsh
113113
# ENV OPENCV_VERSION=4.2.0
114114
# RUN pip3 uninstall -y opencv && \
115115
# apt install -y --no-install-recommends libavcodec-dev libavformat-dev libswscale-dev libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgtk-3-dev libpng-dev libjpeg-dev && \
116-
# git clone --depth 1 --recursive https://github.com/opencv/opencv.git $HOME_FOLDER/opencv -b $OPENCV_VERSION && \
116+
# git clone --depth 1 --recursive https://github.com/opencv/opencv.git ${HOME_FOLDER}/opencv -b $OPENCV_VERSION && \
117117
# # follow the error information,replace all the “ipcp-unit-growth” with “ipa-cp-unit-growth” in 3rdparty/carotene/CMakeLists.txt and 3rdparty/carotene/hal/CMakeLists.txt
118-
# perl -pi -e 's/ipcp-unit-growth/ipa-cp-unit-growth/g' $HOME_FOLDER/opencv/3rdparty/carotene/CMakeLists.txt $HOME_FOLDER/opencv/3rdparty/carotene/hal/CMakeLists.txt && \
119-
# git clone --depth 1 --recursive https://github.com/opencv/opencv_contrib.git $HOME_FOLDER/opencv_contrib -b $OPENCV_VERSION && \
120-
# mkdir -p $HOME_FOLDER/opencv/build && \
121-
# cd $HOME_FOLDER/opencv/build && \
118+
# perl -pi -e 's/ipcp-unit-growth/ipa-cp-unit-growth/g' ${HOME_FOLDER}/opencv/3rdparty/carotene/CMakeLists.txt ${HOME_FOLDER}/opencv/3rdparty/carotene/hal/CMakeLists.txt && \
119+
# git clone --depth 1 --recursive https://github.com/opencv/opencv_contrib.git ${HOME_FOLDER}/opencv_contrib -b $OPENCV_VERSION && \
120+
# mkdir -p ${HOME_FOLDER}/opencv/build && \
121+
# cd ${HOME_FOLDER}/opencv/build && \
122122
# cmake \
123123
# -D CMAKE_CXX_STANDARD=20 \
124124
# -D EIGEN_INCLUDE_PATH=/usr/include/eigen3 \
@@ -142,8 +142,8 @@ RUN conda init zsh
142142
# make install -j$(($(nproc)-1)) && \
143143
# echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.zshrc && \
144144
# echo "export OpenCV_DIR=/usr/local/lib/cmake/opencv4/" >> ${HOME_FOLDER}/.bashrc && \
145-
# rm -rf $HOME_FOLDER/opencv && \
146-
# rm -rf $HOME_FOLDER/opencv_contrib
145+
# rm -rf ${HOME_FOLDER}/opencv && \
146+
# rm -rf ${HOME_FOLDER}/opencv_contrib
147147

148148
# Add git safe directory
149149
RUN git config --global --add safe.directory "*"
@@ -156,16 +156,16 @@ RUN apt full-upgrade -y && \
156156
rm -rf /var/lib/apt/lists/*
157157

158158
# change owner of home folder
159-
RUN chown -R $IMAGE_USER:$IMAGE_USER $HOME_FOLDER
159+
RUN chown -R ${IMAGE_USER}:${IMAGE_USER} ${HOME_FOLDER}
160160

161161
# change user
162-
USER $IMAGE_USER
162+
USER ${IMAGE_USER}
163163

164164
# Set the default shell to zsh
165165
SHELL [ "/bin/zsh", "-c" ]
166166

167167
# # move fixuid config
168-
# RUN mv $HOME_FOLDER/fixuid-config.yml /etc/fixuid/config.yml
168+
# RUN mv ${HOME_FOLDER}/fixuid-config.yml /etc/fixuid/config.yml
169169

170170
# Entrypoint command
171171
# ENTRYPOINT [ "/bin/sh" , "-c", "fixuid; /bin/zsh" ]

jobs/psc/gpu-singularity.job

Lines changed: 2 additions & 1 deletion
@@ -17,4 +17,5 @@ set -x
1717

1818
cd todo-your-code-directory || exit # this directory refers to the absolute path outside any docker/singularity container/instance
1919
scripts/run_singularity_instance.sh
20-
srun singularity exec instance://todo-your-container-name zsh -c "source ~/.zshrc && conda run -n todo-your-conda-env-name todo-your-code-entrypoint"
20+
source scripts/variables.sh
21+
srun singularity exec instance://"${CONTAINER_NAME}" zsh -c "source ~/.zshrc && conda run -n todo-your-conda-env-name todo-your-code-entrypoint"
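
The README change in this commit suggests tying GPU, CPU, and memory requests to tasks so that only `--ntasks-per-node` needs to change when scaling up on a node. A rough sketch of that idea follows; the helper name `sbatch_task_options` and the per-task values (1 GPU, 4 CPUs, 16G per GPU) are assumptions chosen for illustration, not settings taken from this job file:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): build sbatch options in which
# every resource is expressed per task, so scaling means changing one number.
sbatch_task_options() {
    local ntasks="${1:?number of tasks per node is required}"
    # The per-task values below are placeholders; adjust them for your partition.
    printf -- '--ntasks-per-node=%s --gpus-per-task=1 --cpus-per-task=4 --mem-per-gpu=16G\n' "${ntasks}"
}

# Print the options for two tasks on one node.
sbatch_task_options 2
```

On a cluster the printed options could be passed to `sbatch` on the command line or written as `#SBATCH` directives at the top of the job file; which partitions accept these values depends on the site configuration.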

scripts/dev_setup.sh

Lines changed: 12 additions & 12 deletions
@@ -15,14 +15,14 @@ echo "Please enter your sudo password:"
1515
read -rs SUDO_PASSWORD # Use -r to avoid mangling backslashes, -s for silent input
1616

1717
# Keep sudo alive during the script execution
18-
echo "$SUDO_PASSWORD" | sudo -v -S
18+
echo "${SUDO_PASSWORD}" | sudo -v -S
1919

2020
# Update package lists
21-
echo "$SUDO_PASSWORD" | sudo -S apt update
21+
echo "${SUDO_PASSWORD}" | sudo -S apt update
2222

2323
# Install dependencies
2424
echo "Installing clang-format clang-tidy python3 python3-pip"
25-
echo "$SUDO_PASSWORD" | sudo -S apt install -y clang-format clang-tidy libpython3-dev python3-pip
25+
echo "${SUDO_PASSWORD}" | sudo -S apt install -y clang-format clang-tidy libpython3-dev python3-pip
2626

2727
# Install/uprade pre-commit
2828
echo "Installing/upgrading pre-commit"
@@ -32,7 +32,7 @@ pip3 install --upgrade pre-commit
3232
DEFAULT_SHELL=$(basename "$SHELL")
3333

3434
# Determine the shell configuration file
35-
case "$DEFAULT_SHELL" in
35+
case "${DEFAULT_SHELL}" in
3636
bash)
3737
SHELL_RC="${HOME}/.bashrc"
3838
;;
@@ -46,24 +46,24 @@ ksh)
4646
SHELL_RC="${HOME}/.kshrc"
4747
;;
4848
*)
49-
echo "Unsupported shell: $DEFAULT_SHELL"
49+
echo "Unsupported shell: ${DEFAULT_SHELL}"
5050
exit 1
5151
;;
5252
esac
5353

5454
# Add pre-commit executable to PATH if not already present
55-
if ! grep -q 'export PATH=~/.local/bin:$PATH' "$SHELL_RC"; then
56-
echo "Adding pre-commit (actually python3-pip packages') executable to path in $SHELL_RC"
57-
echo 'export PATH=~/.local/bin:$PATH' >>"$SHELL_RC"
55+
if ! grep -q 'export PATH=~/.local/bin:${PATH}' "${SHELL_RC}"; then
56+
echo "Adding pre-commit (actually python3-pip packages') executable to path in ${SHELL_RC}"
57+
echo 'export PATH=~/.local/bin:${PATH}' >>"${SHELL_RC}"
5858
# Source the .zshrc file using zsh
59-
if [ "$DEFAULT_SHELL" = "zsh" ]; then
60-
zsh -c "source $SHELL_RC"
59+
if [ "${DEFAULT_SHELL}" = "zsh" ]; then
60+
zsh -c "source ${SHELL_RC}"
6161
else
6262
# shellcheck source=/dev/null
63-
. "$SHELL_RC"
63+
. "${SHELL_RC}"
6464
fi
6565
else
66-
echo "PATH already updated in $SHELL_RC"
66+
echo "PATH already updated in ${SHELL_RC}"
6767
fi
6868

6969
# Perform pre-installation of pre-commit hooks and dry run on all files
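
The PATH update in the hunk above is an idempotent append: the export line is written to the shell rc file only when `grep` does not already find it. A minimal sketch of the same pattern, using a fixed-string, whole-line match (`grep -qxF`) so that characters such as `.` and `$` are not interpreted as a regular expression; the function name `append_once` and the temporary file are hypothetical and not part of the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): append a line to a file
# only if an identical line is not already present.
append_once() {
    local line="$1"
    local file="$2"
    touch "${file}" # make sure grep has a file to read
    if ! grep -qxF -- "${line}" "${file}"; then
        printf '%s\n' "${line}" >>"${file}"
    fi
}

# Demonstrate on a temporary file instead of a real rc file.
rc_file="$(mktemp)"
append_once 'export PATH=~/.local/bin:${PATH}' "${rc_file}"
append_once 'export PATH=~/.local/bin:${PATH}' "${rc_file}" # second call is a no-op
cat "${rc_file}"
rm -f "${rc_file}"
```

Running the helper repeatedly leaves a single copy of the line, which is the property the `if ! grep -q ...` guard in `dev_setup.sh` is meant to provide.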

scripts/kill_docker_container.sh

Lines changed: 2 additions & 2 deletions
@@ -10,6 +10,6 @@
1010

1111
. "$(dirname "$0")"/variables.sh
1212

13-
docker exec --privileged -it "$CONTAINER_NAME" pkill -f zsh
13+
docker exec --privileged -it "${CONTAINER_NAME}" pkill -f zsh
1414

15-
docker rm -f "$CONTAINER_NAME"
15+
docker rm -f "${CONTAINER_NAME}"

scripts/kill_singularity_instance.sh

Lines changed: 1 addition & 1 deletion
@@ -11,4 +11,4 @@
1111
set -euo pipefail
1212
. "$(dirname "$0")"/variables.sh
1313

14-
singularity instance stop "$CONTAINER_NAME"
14+
singularity instance stop "${CONTAINER_NAME}"
