This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Commit e2ed553

[v1.x] Port CICD changes (#21123, #21126 and #21128) from v1.9.x (#21129)
* Refactor CD to support newer cuda versions (11.0-11.7) (#21123)
* WIP to add cuda build versions.
* WIP to add cuda build versions.
* Remove sudo install, moved to CD specific dockerfile.
* Allow passing of linker flags for Distribution build type.
* Update distribution cmake configs, add new configs for newer cuda versions.
* Update cuda versions to build in CD.
* Update base images for GPU, add new Cuda 11.6 container.
* Correctly set LD_LIBRARY_PATH.
* Provide cmake hints in dependency install scripts.
* Refactor Cuda dependency installation to simplify and support newer versions.
* Add new Dockerfile for CD builds.
* Use new CD-specific container for building MXNet static library.
* Add Cuda versions.
* Upgrade to Python 3.8 in CentOS 7 containers.
* Update base images.
* Install requirements only if file exists.
* Clean up dockerfile
* Do not pin Cython, relax scipy version.
* Install all build dependencies.
* Add documentation.
* Add documentation.
* Set LD_LIBRARY_PATH to include stubs.
* Build cmake from source for portability.
* Install hdf5 headers during python install, as it is required for the h5py module.
* Install any dependencies via yum for cmake build.
* Update libtiff to version that builds on aarch64.
* Build libtiff and protobuf from source so we can statically link mxnet on aarch64.
* Change centos7_aarch64_cpu container to install software using common scripts for consistency. Remove installing protobuf and other dependency libraries so we properly statically link to them.
* Install pre-built cmake packages.
* Use common method to install cmake.
* Update pipelines to use supported cuda versions for static build tests.
* Ensure required build tools are installed.
* Install required headers for building all R packages.
* Add/update make configs for newer Cuda versions.
* Install gfortran as build dependency in CD image.
* Use ldd to find actual path of dynamically linked libraries instead of guessing.
* Add additional Cuda versions for CI testing.
* Set minimum OSX version to support via C/CXXFLAGS to match what we build MXNet for.
* Don't specify minimum OS version when building MXNet for OSX.
* Turns out we can set target OSX version, but recently libtiff introduced zstd support, which doesn't link properly. Disabling support via --disable-zstd works.
* Disable zlib, as it was previously.
* Disable webp support in libtiff (present only in newer version)
* [v1.9.x] Refactor dockerfiles in CI, migrate some ubuntu docker containers to use docker-compose. Update CI to use Cuda 11.7 (#21126)
* Remove deprecated dockerfiles.
* Update documentation to use different image.
* Install Scala in centos7 CD container and build tools.
* Update static scala build to use CD container, change julia container.
* Removed deprecated Jenkins pipeline files, remove old disabled build steps.
* Add new base Dockerfile for docker-compose.
* Migrate ubuntu cuda containers to docker-compose.
* Build python from source on ubuntu for portability.
* Remove old dockerfiles, upgrade nightly gpu image to cuda 11.7.
* Remove Cuda versions from runtime function names to simplify.
* Update Jenkins pipelines to use newer Cuda containers.
* Install LLVM before TVM.
* Fix ubuntu TVM install script (was failing but returning true.)
* Move cmake install into unified script.
* Move cmake install for ubuntu into centralized script.
* Update cudnn version passed to builds.
* Consolidate installation of packages for efficiency.
* Remove unused containers from docker-compose config.
* Fix pylint.
* Set LD_LIBRARY_PATH on ubuntu_gpu images to find libcuda.so.
* Set CUB_IGNORE_DEPRECATED_CPP_DIALECT to prevent build failures with gcc-4.8 + Cuda 11.7.
* Install sqlite headers/library before building python on ubuntu.
* Revert "Remove unused containers from docker-compose config." This reverts commit 5de82df.
* Revert "Set CUB_IGNORE_DEPRECATED_CPP_DIALECT to prevent build failures with gcc-4.8 + Cuda 11.7." This reverts commit e649660.
* Allow building CUB with c++11 to prevent failures on newer cuda versions.
* Set variable only on gpu make builds.
* Use docker-compose to also build ubuntu_cpu image.
* We no longer need to enable python3.8 on aarch64 since we are building from source now.
* Add Cuda 11.1 and 11.3 centos7 images, which are used by the CD testing phase.
* Don't install python-opencv, we are installing the module via pip instead.
* Change Makefile to set CUB_IGNORE_DEPRECATED_CPP_DIALECT when using Cuda, not only for < 11.0.
* Don't pin down h5py (old versions do not work on aarch64.)
* Conditionally install different versions of h5py depending on architecture.
* Fix value for platform_machine.
* Don't install h5py on aarch64 at all.
* Set USE_LAPATH_PATH to correct path on ubuntu 18.04.
* Rearrange dockerfiles to build more efficiently when small changes occur. Split python install into 2 steps: building python and installing requirements.
* Since we are not using multi-stage builds, do not specify target to ensure docker cache works as expected.
* When building docker-compose based containers, pull the latest version for caching before building.
* When pulling docker-compose images, pass quiet option to squelch CI logs.
* When pulling docker-compose images, pass quiet option to squelch CI logs.
* Clean up docker cache build code.
* [v1.9.x] Restore Cuda 10.x CD builds (#21128)
* Create Dockerfile for ubuntu CD, add ccache, install cuda repos in base container instead of adding dynamically and requiring more sudo permissions.
* Prevent hanging for user input on package installation.
* Update build configs for cuda 10.0, 10.1 and 10.2 to work with centos7 CD.
* Update links to other versions to include all supported cuda releases.
* Update supported cuda version list.
* Add back support for cuda 10.x, change installation design to require cuda repos to be already set up and accessible in the base containers for simplicity.
* Use correct script name for installing ccache.
* No need to use non-exact matches for variants.
* Standardize name for ccache installation script.
* Update ccache version and clean up install scripts.
* Install libtool in ubuntu CD container.
* Restore Cuda 10.x builds for CD.
* Dynamically determine which dockerfiles are used by docker-compose (instead of having a hard-coded list) so docker cache refresh will finish successfully.
* Remove debug line.
* Define python executable path for tensorrt build.
* Remove old hacks for changing permissions to /usr/local/bin.
* Install libtool in ubuntu r container.
* Update permissions to allow CI tasks to run.
* Recursively set permissions on deps directory.
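Several bullets above concern installing h5py conditionally by architecture (unpinned off aarch64, skipped entirely on aarch64). A hedged sketch of that gate, assuming the final state described in the message; the helper name is hypothetical, not code from this commit:

```python
import platform
from typing import Optional

def h5py_requirement(machine: Optional[str] = None) -> Optional[str]:
    """Illustrative helper: decide whether (and how) to require h5py."""
    machine = machine or platform.machine()
    if machine == "aarch64":
        return None  # "Don't install h5py on aarch64 at all."
    return "h5py"    # unpinned elsewhere: "Don't pin down h5py"
```

In the actual requirements files this kind of condition would typically be expressed with a `platform_machine` environment marker, which the message also mentions fixing.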
1 parent 702e475 commit e2ed553

121 files changed

Lines changed: 2324 additions & 1887 deletions


CMakeLists.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -147,8 +147,8 @@ if(CMAKE_BUILD_TYPE STREQUAL "Distribution" AND UNIX AND NOT APPLE)
   set(CMAKE_BUILD_WITH_INSTALL_RPATH ON)
   set(CMAKE_INSTALL_RPATH $\{ORIGIN\})
   # Enforce DT_PATH instead of DT_RUNPATH
-  set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--disable-new-dtags")
-  set(CMAKE_EXE_LINKER_FLAGS "-Wl,--disable-new-dtags")
+  set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--disable-new-dtags")
+  set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--disable-new-dtags")
   set(Protobuf_USE_STATIC_LIBS ON)
 endif()
 
```
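The fix above appends `-Wl,--disable-new-dtags` to whatever linker flags are already set instead of overwriting the variable, so user- or toolchain-provided flags survive. A minimal sketch of that append semantics (the helper name is illustrative, not from the repo):

```python
def append_linker_flag(existing: str, flag: str) -> str:
    # Mirror the corrected CMake lines: keep any flags already present
    # and add ours, rather than replacing the whole variable.
    if not existing:
        return flag
    return f"{existing} {flag}"
```

The old lines behaved like `flags = "-Wl,--disable-new-dtags"`, silently dropping anything the user had passed in `CMAKE_SHARED_LINKER_FLAGS` or `CMAKE_EXE_LINKER_FLAGS`.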

Makefile

Lines changed: 2 additions & 1 deletion
```diff
@@ -579,8 +579,9 @@ ALL_DEP = $(OBJ) $(EXTRA_OBJ) $(PLUGIN_OBJ) $(LIB_DEP)
 
 ifeq ($(USE_CUDA), 1)
 	CUDA_VERSION_MAJOR := $(shell $(NVCC) --version | grep "release" | awk '{print $$6}' | cut -c2- | cut -d '.' -f1)
+	CFLAGS += -DCUB_IGNORE_DEPRECATED_CPP_DIALECT
 	ifeq ($(shell test $(CUDA_VERSION_MAJOR) -lt 11; echo $$?), 0)
-		CFLAGS += -I$(ROOTDIR)/3rdparty/nvidia_cub -DCUB_IGNORE_DEPRECATED_CPP_DIALECT
+		CFLAGS += -I$(ROOTDIR)/3rdparty/nvidia_cub
 	endif
 
 	ALL_DEP += $(CUOBJ) $(EXTRA_CUOBJ) $(PLUGIN_CUOBJ)
```
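The Makefile hunk above hoists the `CUB_IGNORE_DEPRECATED_CPP_DIALECT` define out of the version check, so it now applies for every CUDA version, while the vendored CUB include path stays gated to CUDA < 11 (CUDA 11+ bundles CUB). A sketch of the same gate in Python; the function name and sample nvcc output line are illustrative:

```python
import re
from typing import List

def cuda_cflags(nvcc_release_line: str, rootdir: str = "/mxnet") -> List[str]:
    """Derive the CUDA-related CFLAGS the Makefile adds, given nvcc's release line."""
    # e.g. "Cuda compilation tools, release 11.7, V11.7.64" -> major version 11
    major = int(re.search(r"release (\d+)\.", nvcc_release_line).group(1))
    flags = ["-DCUB_IGNORE_DEPRECATED_CPP_DIALECT"]  # now set for all CUDA versions
    if major < 11:
        # CUDA 11+ ships its own CUB, so the vendored copy is only needed before that
        flags.append(f"-I{rootdir}/3rdparty/nvidia_cub")
    return flags
```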

cd/Jenkinsfile_cd_pipeline

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,7 +36,7 @@ pipeline {
 
   parameters {
     // Release parameters
-    string(defaultValue: "cpu,native,cu100,cu101,cu102,cu110,cu112,aarch64_cpu", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
+    string(defaultValue: "cpu,native,cu100,cu102,cu110,cu111,cu112,cu113,cu114,cu115,cu116,cu117,aarch64_cpu", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
     booleanParam(defaultValue: false, description: 'Whether this is a release build or not', name: "RELEASE_BUILD")
   }
 
```

cd/Jenkinsfile_release_job

Lines changed: 1 addition & 1 deletion
```diff
@@ -43,7 +43,7 @@ pipeline {
     // any disruption caused by different COMMIT_ID values chaning the job parameter configuration on
     // Jenkins.
     string(defaultValue: "mxnet_lib", description: "Pipeline to build", name: "RELEASE_JOB_TYPE")
-    string(defaultValue: "cpu,native,cu100,cu101,cu102,cu110,cu112,aarch64_cpu", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
+    string(defaultValue: "cpu,native,cu100,cu102,cu110,cu111,cu112,cu113,cu114,cu115,cu116,cu117,aarch64_cpu", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
     booleanParam(defaultValue: false, description: 'Whether this is a release build or not', name: "RELEASE_BUILD")
     string(defaultValue: "nightly_v1.x", description: "String used for naming docker images", name: "VERSION")
   }
```

cd/mxnet_lib/Jenkins_pipeline.groovy

Lines changed: 1 addition & 3 deletions
```diff
@@ -55,9 +55,7 @@ def build(mxnet_variant) {
   node(NODE_LINUX_CPU) {
     ws("workspace/mxnet_${libtype}/${mxnet_variant}/${env.BUILD_NUMBER}") {
       ci_utils.init_git()
-      // Compiling in Ubuntu14.04 due to glibc issues.
-      // This should be updates once we have clarity on this issue.
-      ci_utils.docker_run('centos7_cpu', "build_static_libmxnet ${mxnet_variant}", false)
+      ci_utils.docker_run('centos7_cd', "build_static_libmxnet ${mxnet_variant}", false)
       ci_utils.pack_lib("mxnet_${mxnet_variant}", libmxnet_pipeline.get_stash(mxnet_variant))
     }
   }
```

cd/utils/mxnet_base_image.sh

Lines changed: 25 additions & 7 deletions
```diff
@@ -21,20 +21,38 @@
 mxnet_variant=${1:?"Please specify the mxnet variant as the first parameter"}
 
 case ${mxnet_variant} in
-    cu100*)
+    cu100)
         echo "nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04"
         ;;
-    cu101*)
+    cu101)
         echo "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
         ;;
-    cu102*)
+    cu102)
         echo "nvidia/cuda:10.2-cudnn8-runtime-ubuntu18.04"
         ;;
-    cu110*)
-        echo "nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04"
+    cu110)
+        echo "nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu18.04"
         ;;
-    cu112*)
-        echo "nvidia/cuda:11.2.1-cudnn8-runtime-ubuntu18.04"
+    cu111)
+        echo "nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu112)
+        echo "nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu113)
+        echo "nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu114)
+        echo "nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu115)
+        echo "nvidia/cuda:11.5.2-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu116)
+        echo "nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu18.04"
+        ;;
+    cu117)
+        echo "nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu18.04"
         ;;
     cpu)
         echo "ubuntu:18.04"
```
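The case statement above is a plain variant-to-image lookup (note that this commit drops the wildcard patterns like `cu100*` in favor of exact matches). A minimal Python mirror of it, with only a subset of variants shown and the dictionary name being illustrative:

```python
# Subset of the mapping from the shell script above.
BASE_IMAGES = {
    "cu100": "nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04",
    "cu102": "nvidia/cuda:10.2-cudnn8-runtime-ubuntu18.04",
    "cu110": "nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu18.04",
    "cu117": "nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu18.04",
    "cpu": "ubuntu:18.04",
}

def base_image(variant: str) -> str:
    # Exact matches only: with the wildcards removed, a variant like
    # "cu100mkl" no longer silently falls through to the cu100 image.
    return BASE_IMAGES[variant]
```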

ci/build.py

Lines changed: 17 additions & 15 deletions
```diff
@@ -43,11 +43,8 @@
 
 from util import *
 
-# Files for docker compose
-DOCKER_COMPOSE_FILES = set(['docker/build.centos7'])
-
 # keywords to identify arm-based dockerfiles
-AARCH_FILE_KEYWORDS = ['aarch64']
+AARCH_FILE_KEYWORDS = ['aarch64', 'armv']
 
 def get_dockerfiles_path():
     return "docker"
@@ -60,13 +57,20 @@ def get_docker_compose_platforms(path: str = get_dockerfiles_path()):
         platforms.add(platform)
     return platforms
 
+def get_docker_compose_dockerfiles(path: str = get_dockerfiles_path()):
+    dockerfiles = set()
+    with open(os.path.join(path, "docker-compose.yml"), "r") as f:
+        compose_config = yaml.load(f.read(), yaml.SafeLoader)
+        for platform in compose_config["services"]:
+            dockerfiles.add("docker/" + compose_config['services'][platform]['build']['dockerfile'])
+    return dockerfiles
 
 
 def get_platforms(path: str = get_dockerfiles_path(), arch=machine()) -> List[str]:
     """Get a list of platforms given our dockerfiles"""
     dockerfiles = glob.glob(os.path.join(path, "Dockerfile.*"))
     dockerfiles = set(filter(lambda x: x[-1] != '~', dockerfiles))
+    dockerfiles = dockerfiles - get_docker_compose_dockerfiles()
     files = set(map(lambda x: re.sub(r"Dockerfile.(.*)", r"\1", x), dockerfiles))
-    files = files - DOCKER_COMPOSE_FILES
     files.update(["build."+x for x in get_docker_compose_platforms()])
     arm_files = set(filter(lambda x: any(y in x for y in AARCH_FILE_KEYWORDS), files))
     if arch == 'x86_64':
@@ -187,11 +191,11 @@ def build_docker(platform: str, registry: str, num_retries: int, no_cache: bool,
         env["DOCKER_CACHE_REGISTRY"] = registry
 
     @retry(subprocess.CalledProcessError, tries=num_retries)
-    def run_cmd(env=None):
-        logging.info("Running command: '%s'", ' '.join(cmd))
-        check_call(cmd, env=env)
+    def run_cmd(c, e):
+        logging.info("Running command: '%s'", ' '.join(c))
+        check_call(c, env=e)
 
-    run_cmd(env=env)
+    run_cmd(cmd, env)
 
     # Get image id by reading the tag. It's guaranteed (except race condition) that the tag exists. Otherwise, the
     # check_call would have failed
@@ -308,23 +312,21 @@ def list_platforms(arch=machine()) -> str:
 def load_docker_cache(platform, tag, docker_registry) -> None:
     """Imports tagged container from the given docker registry"""
     if docker_registry:
+        env = os.environ.copy()
+        env["DOCKER_CACHE_REGISTRY"] = docker_registry
         if is_docker_compose(platform):
             docker_compose_platform = platform.split(".")[1] if any(x in platform for x in ['build.', 'publish.']) else platform
-            env = os.environ.copy()
-            env["DOCKER_CACHE_REGISTRY"] = docker_registry
             if "dkr.ecr" in docker_registry:
                 try:
                     import docker_cache
                     docker_cache._ecr_login(docker_registry)
                 except Exception:
                     logging.exception('Unable to login to ECR...')
-            cmd = ['docker-compose', '-f', 'docker/docker-compose.yml', 'pull', docker_compose_platform]
-            logging.info("Running command: 'DOCKER_CACHE_REGISTRY=%s %s'", docker_registry, ' '.join(cmd))
+            cmd = ['docker-compose', '-f', 'docker/docker-compose.yml', 'pull', '--quiet', docker_compose_platform]
+            logging.info("Running command: '%s'", ' '.join(cmd))
            check_call(cmd, env=env)
            return
 
-        env = os.environ.copy()
-        env["DOCKER_CACHE_REGISTRY"] = docker_registry
        # noinspection PyBroadException
        try:
            import docker_cache
```
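The `run_cmd` hunk above changes the retried inner function to take the command and environment as explicit parameters instead of shadowing the outer `env` with a keyword default, so every retry invocation receives the intended values. A minimal sketch of the pattern with a simplified stand-in for the repo's `retry` decorator (its real signature in `util.py` is assumed, not shown here):

```python
import functools

def retry(exc, tries=3):
    """Simplified stand-in for the util.retry decorator used by ci/build.py."""
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return f(*args, **kwargs)
                except exc:
                    if attempt == tries - 1:
                        raise
        return wrapper
    return deco

calls = []

@retry(RuntimeError, tries=3)
def run_cmd(c, e):
    # Arguments arrive per call, as in the fixed run_cmd(cmd, env).
    calls.append((tuple(c), dict(e)))
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_cmd(["docker-compose", "pull"], {"DOCKER_CACHE_REGISTRY": "example"})
```

Passing the arguments explicitly also keeps the decorated function reusable: it no longer depends on a particular enclosing scope having defined `cmd` and `env`.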

ci/dev_menu.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -134,7 +134,7 @@ def provision_virtualenv(venv_path=DEFAULT_PYENV):
     ('[Docker] Build the Java API docs - outputs to "docs/scala-package/build/docs/java"',
         "ci/build.py --platform ubuntu_cpu_scala /work/runtime_functions.sh build_java_docs"),
     ('[Docker] Build the Julia API docs - outputs to "julia/docs/site/"',
-        "ci/build.py --platform ubuntu_cpu_julia /work/runtime_functions.sh build_julia_docs"),
+        "ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh build_julia_docs"),
     ('[Docker] Build the R API docs - outputs to "R-package/build/mxnet-r-reference-manual.pdf"',
         "ci/build.py --platform ubuntu_cpu_r /work/runtime_functions.sh build_r_docs"),
     ('[Docker] Build the Scala API docs - outputs to "scala-package/docs/build/docs/scala"',
```

ci/docker/Dockerfile.build.centos7

Lines changed: 11 additions & 3 deletions
```diff
@@ -31,19 +31,27 @@
 # "--target" option or docker-compose.yml
 ####################################################################################################
 ARG BASE_IMAGE
-FROM $BASE_IMAGE AS base
+FROM $BASE_IMAGE
 
 WORKDIR /work/deps
 
 COPY install/centos7_core.sh /work/
 RUN /work/centos7_core.sh
+
+COPY install/centos7_cmake.sh /work/
+RUN /work/centos7_cmake.sh
+
 COPY install/centos7_ccache.sh /work/
 RUN /work/centos7_ccache.sh
-COPY install/centos7_python.sh /work/
-RUN /work/centos7_python.sh
+
 COPY install/centos7_scala.sh /work/
 RUN /work/centos7_scala.sh
 
+COPY install/centos7_python.sh /work/
+RUN /work/centos7_python.sh
+COPY install/requirements /work/
+RUN pip3 install -r /work/requirements
+
 ARG USER_ID=0
 COPY install/centos7_adduser.sh /work/
 RUN /work/centos7_adduser.sh
```

ci/docker/Dockerfile.build.centos7_aarch64_cpu

Lines changed: 13 additions & 38 deletions
```diff
@@ -19,7 +19,7 @@
 # Dockerfile for CentOS 7 AArch64 CPU build.
 # Via the CentOS 7 Dockerfiles, we ensure MXNet continues to run fine on older systems.
 
-FROM arm64v8/centos:7
+FROM centos:7
 
 WORKDIR /work/deps
 
@@ -39,47 +39,24 @@ RUN yum -y check-update || true && \
     automake \
     autoconf \
     libtool \
-    protobuf-compiler \
-    protobuf-devel \
     # CentOS Software Collections https://www.softwarecollections.org
     devtoolset-10 \
     devtoolset-10-gcc \
     devtoolset-10-gcc-c++ \
     devtoolset-10-gcc-gfortran \
-    rh-python38 \
-    rh-python38-python-numpy \
-    rh-python38-python-scipy \
     # Libraries
-    opencv-devel \
-    openssl-devel \
-    zeromq-devel \
-    # Build-dependencies for ccache 3.7.9
-    gperf \
-    libb2-devel \
-    libzstd-devel && \
+    hdf5-devel && \
     yum clean all
 
-# Make Red Hat Developer Toolset 10.0 and Python 3.8 Software Collections available by default
+# Make Red Hat Developer Toolset 10.0 Software Collection available by default
 # during the following build steps in this Dockerfile
-SHELL [ "/usr/bin/scl", "enable", "devtoolset-10", "rh-python38" ]
+SHELL [ "/usr/bin/scl", "enable", "devtoolset-10" ]
 
-# Install minimum required cmake version
-RUN cd /usr/local/src && \
-    wget -nv https://cmake.org/files/v3.20/cmake-3.20.5-linux-aarch64.sh && \
-    sh cmake-3.20.5-linux-aarch64.sh --prefix=/usr/local --skip-license && \
-    rm cmake-3.20.5-linux-aarch64.sh
+# Fix the en_DK.UTF-8 locale to test locale invariance
+RUN localedef -i en_DK -f UTF-8 en_DK.UTF-8
 
-# ccache 3.7.9 has fixes for caching nvcc outputs
-RUN cd /usr/local/src && \
-    git clone --recursive https://github.com/ccache/ccache.git && \
-    cd ccache && \
-    git checkout v3.7.9 && \
-    ./autogen.sh && \
-    ./configure --disable-man && \
-    make -j$(nproc) && \
-    make install && \
-    cd /usr/local/src && \
-    rm -rf ccache
+COPY install/centos7_cmake.sh /work/
+RUN /work/centos7_cmake.sh
 
 # Arm Performance Libraries 21.0
 RUN cd /usr/local/src && \
@@ -89,13 +66,11 @@ RUN cd /usr/local/src && \
     rm -rf arm-performance-libraries_21.0_RHEL-7_gcc-8.2.tar arm-performance-libraries_21.0_RHEL-7_gcc-8.2
 ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/arm/armpl_21.0_gcc-8.2/lib
 
-# Fix the en_DK.UTF-8 locale to test locale invariance
-RUN localedef -i en_DK -f UTF-8 en_DK.UTF-8
-
-# Python dependencies
-RUN python3 -m pip install --upgrade pip
-COPY install/requirements_aarch64 /work/
-RUN python3 -m pip install -r /work/requirements_aarch64
+# Install Python and dependency packages
+COPY install/centos7_python.sh /work/
+RUN /work/centos7_python.sh
+COPY install/requirements /work/
+RUN pip3 install -r /work/requirements
 
 ARG USER_ID=0
 COPY install/centos7_adduser.sh /work/
```
