Commit e66d328

jgabler-hpc and Joachim Gabler authored
CS-1187 add systemd and cgroups integration (#60)
* EH: CS-1188 control daemons with systemd
* avoid endless loop in case an invalid slice is given in the autoinstall template // BelongsTo: CS-1188
* EH: CS-1192 at startup of daemons, output the cgroups slice the service is running in
* fixed typo "deamon"
* EH: CS-1223 with systemd integration, move sge_shepherd processes out of the sge_execd service cgroup
* the sd_bus method StartTransientUnit only starts a job that creates the unit and returns before the action has actually finished; we need to wait for the job to finish // BelongsTo: CS-1123
* do not report systemd as init system on ulx-*, as we cannot build systemd support into sge_execd there (libsystemd.so is too old)
* fixed broken build on CentOS 8
* sd_bus error was not reported to caller
* error messages were truncated at 100 characters; introduced SFN4 macro for 400 character strings
* fixed non-unique message ids
* EH: CS-1291 move shepherd child to its own scope
* shepherd tried to use systemd on a host that has the systemd library but does not use systemd as init system (Antix Linux)
* EH: CS-1292 get job online usage information via systemd
* tried to connect to systemd on a host not having systemd
* errors in StartTransientUnit were not always propagated to caller
* EH: CS-1294 set job limits via systemd
* EH: CS-1315 set binding via systemd
* cleanup
* EH: CS-1295 set device isolation via systemd
* EH: CS-1241 add profiling information for systemd operations
* execd profiling could not be disabled again
* cleanup, moved code to own module // BelongsTo: CS-1241
* EH: CS-1318 allow running jobs under systemd control even if sge_execd itself is not started as a systemd service
* EH: CS-1319 make running jobs under systemd control configurable
* added ENABLE_SYSTEMD to sge_conf.5 man page // BelongsTo: CS-1319
* EH: CS-1322 the job-specific scopes need to contain the toplevel slice name to be unique
* EH: CS-1300 do not add and handle the additional group id for jobs running under systemd
* BF: CS-1325 possible race condition between calling StartTransientUnit and waiting for the corresponding job to finish
* EH: CS-1296 kill jobs via systemd
* EH: CS-1321 allow configuring a hybrid usage data collection (both via systemd and the PDC)
* fixed memory leaks
* BF: CS-1335 need special handling for interrupted system call
* EH: CS-1342 add systemd specific settings (toplevel slice name) to the installation guide
* cleanup and added systemd integration to the release notes
* cleanup
* addressed review comments
* fixed a race condition leading to multiple execd children trying to create the shepherds.scope
* added more details of the systemd integration to the release notes
* addressed review comments
* refactoring and documentation with Doxygen headers
* EH: CS-1408 USAGE_COLLECTION mode must be kept consistent for running jobs
* EH: CS-1419 disable systemd integration if sge_execd is started as a non-privileged user
* with HYBRID usage collection, non-systemd hosts didn't report cpu and rss
* reprioritization code was broken by systemd integration // SeeAlso: CS-1421
* improved diagnostics when ptf job / osjob cannot be found
* enforce cleanup in execd only when KEEP_ACTIVE is changed to FALSE
* BF: CS-1019 sge_execd logs errors when running tightly integrated parallel jobs
* BF: CS-1425 backup/restore does not handle $SGE_ROOT/$SGE_CELL/slice_name
* BF: CS-1429 sge_qmaster can segfault on qdel -f
* BF: CS-1019 sge_execd logs errors when running tightly integrated parallel jobs
* BF: CS-1430 running tightly integrated parallel jobs leaves systemd slices // + additional cleanup
* fix to the fix for CS-1019
* added missing files

---------

Co-authored-by: Joachim Gabler <joga.oge@gabler-net.de>
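The StartTransientUnit items above (CS-1223, CS-1325) describe an asynchronous pattern: the D-Bus call returns a job object path right away, and completion is only announced later by the systemd Manager's `JobRemoved` signal. A race arises if that signal arrives before the caller starts waiting for it. The following is a minimal C++ model of one way to avoid the race, assuming signals are recorded from the moment of subscription (before the method call is made). `JobWaiter` and the simulated signal queue are hypothetical and not taken from the commit, and no real sd-bus calls are made.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical model (not code from the commit) of waiting for a systemd job.
// StartTransientUnit returns a job object path immediately; completion is
// announced later by a JobRemoved signal carrying the same path.
class JobWaiter {
public:
   // Record a JobRemoved signal. Recording is active from the moment of
   // subscription, i.e. before StartTransientUnit is called, so signals that
   // arrive early are not lost.
   void on_job_removed(const std::string &job_path) {
      finished_.insert(job_path);
   }

   // Wait until job_path has been reported as removed. Signals still in flight
   // are taken from the simulated bus queue one by one; returns false if the
   // queue runs empty before the job is seen.
   bool wait_for(const std::string &job_path, std::deque<std::string> &pending_signals) {
      while (finished_.count(job_path) == 0) {
         if (pending_signals.empty()) {
            return false;
         }
         on_job_removed(pending_signals.front());
         pending_signals.pop_front();
      }
      finished_.erase(job_path);  // each job is waited for only once
      return true;
   }

private:
   std::set<std::string> finished_;  // JobRemoved paths seen but not yet consumed
};
```

In a real sd-bus client, the pending queue would be replaced by processing incoming bus messages, and the `JobRemoved` match would have to be installed before `StartTransientUnit` is called for this approach to close the race.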
1 parent 1c0f383 commit e66d328

100 files changed

Lines changed: 4593 additions & 883 deletions

CMakeLists.txt

Lines changed: 5 additions & 0 deletions
@@ -88,6 +88,7 @@ option(WITH_LCOV "Enable code coverage analysis with lcov" OFF)
 option(WITH_PYTHON "Enable Python external bindings" OFF)
 option(WITH_BOOST "Enable Boost framework" OFF)
 option(WITH_MUNGE "Enable Munge authentication" ON)
+option(WITH_SYSTEMD "Enable systemd support" OFF)
 
 # private extensions
 set(PROJECT_EXTENSIONS "None" CACHE STRING "directory of private extensions")
@@ -188,6 +189,10 @@ if (WITH_MUNGE)
 add_compile_definitions("OCS_WITH_MUNGE")
 endif()
 
+if (WITH_SYSTEMD)
+add_compile_definitions("OCS_WITH_SYSTEMD")
+endif()
+
 #if (SGE_ARCH MATCHES "darwin-arm64" OR SGE_ARCH MATCHES "fbsd-amd64")
 if (NOT WITH_SPOOL_BERKELEYDB AND NOT WITH_SPOOL_DYNAMIC)
 set(SPOOLING_LIBS spoolloader spoolc_static spool)
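With this change, `WITH_SYSTEMD` only adds the `OCS_WITH_SYSTEMD` compile definition. The C++ sources therefore have to branch on it at compile time and still check at runtime whether the host actually uses systemd (compare the Antix Linux and CS-1318 items in the commit message). The sketch below shows that two-level check. `systemd_support_compiled_in()` and `job_tracking_method()` are hypothetical names, not functions from the commit. The fallback to the additional group id is inferred from CS-1300.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true only if this translation unit was compiled with the
// OCS_WITH_SYSTEMD definition that CMake adds when WITH_SYSTEMD is ON.
constexpr bool systemd_support_compiled_in() {
#ifdef OCS_WITH_SYSTEMD
   return true;
#else
   return false;
#endif
}

// Hypothetical: choose how a job's processes are tracked. The systemd scope is
// used only if support is compiled in AND the host runs systemd; otherwise the
// classic additional group id is used.
std::string job_tracking_method(bool host_uses_systemd) {
   if (systemd_support_compiled_in() && host_uses_systemd) {
      return "systemd scope";
   }
   return "additional group id";
}
```

Built without `-DOCS_WITH_SYSTEMD`, both `job_tracking_method(true)` and `job_tracking_method(false)` fall back to the group id method, whatever the host's init system.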

cmake/ArchitectureSpecificSettings.cmake

Lines changed: 18 additions & 19 deletions
@@ -61,25 +61,7 @@ function(architecture_specific_settings)
 message("Build with extensions is enabled")
 endif()
 
-if (SGE_ARCH MATCHES "lx-riscv64")
-# Linux RiscV
-message(STATUS "We are on Linux: ${SGE_ARCH}")
-set(CMAKE_C_FLAGS "-Wall -Werror -pedantic" CACHE STRING "" FORCE)
-set(CMAKE_CXX_FLAGS "-Wall -Werror -pedantic" CACHE STRING "" FORCE)
-
-add_compile_definitions(LINUX _GNU_SOURCE GETHOSTBYNAME_R6 GETHOSTBYADDR_R8 HAS_IN_PORT_T SPOOLING_dynamic __SGE_COMPILE_WITH_GETTEXT__)
-add_compile_options(-fPIC)
-add_compile_options(-pthread)
-add_link_options(-pthread -rdynamic)
-
-set(TIRPC_INCLUDES /usr/include/tirpc PARENT_SCOPE)
-set(TIRPC_LIB tirpc PARENT_SCOPE)
-message(STATUS "using libtirpc")
-
-set(WITH_JEMALLOC OFF PARENT_SCOPE)
-set(WITH_MTMALLOC OFF PARENT_SCOPE)
-set(JNI_ARCH "linux" PARENT_SCOPE)
-elseif (SGE_ARCH MATCHES "lx-.*" OR SGE_ARCH MATCHES "ulx-.*" OR SGE_ARCH MATCHES "xlx-.*")
+if (SGE_ARCH MATCHES "lx-.*" OR SGE_ARCH MATCHES "ulx-.*" OR SGE_ARCH MATCHES "xlx-.*")
 # master is not supported on CentOS 6. Execd is deprecated and will be removed in the future.
 if (SGE_ARCH STREQUAL "xlx-.*")
 set(INSTALL_SGE_BIN_MASTER OFF CACHE BOOL "Install master daemon binaries" FORCE)
@@ -166,6 +148,23 @@ function(architecture_specific_settings)
 message(STATUS "no libtirpc or libntirpc found")
 endif ()
 
+# build with systemd?
+# @todo we might want to check the api version, we need at least
+# - 235: here FreezeUnit and ThawUnit were added (not required, we work around this not being available)
+# - 231: 240? here sd_bus_process() was added (not required, we work around this)
+# - 221: here StopUnit was added
+# Our build hosts are OK as it is (RHEL-8 compatible for lx-* has a recent enough version,
+# RHEL-7 compatible for ulx-* does not have it at all)
+if (EXISTS /usr/include/systemd/sd-bus.h)
+set(WITH_SYSTEMD ON PARENT_SCOPE CACHE STRING "" FORCE)
+message(STATUS "systemd development files found")
+endif()
+
+if (SGE_ARCH MATCHES "lx-riscv64")
+# Linux RiscV
+add_compile_options(-fPIC)
+set(WITH_JEMALLOC OFF PARENT_SCOPE)
+endif()
 if (SGE_ARCH STREQUAL "lx-x86" OR SGE_ARCH STREQUAL "ulx-x86" OR SGE_ARCH STREQUAL "xlx-x86")
 # we need patchelf for setting the run path in the db_* tools
 # but patchelf is not available on CentOS 7 x86
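The `@todo` comment lists which systemd versions introduced the D-Bus methods the integration relies on. The sketch below turns that comment into a runtime feature table. `SystemdFeatures` and `features_for_version()` are hypothetical, and the thresholds are copied from the comment. The `sd_bus_process()` entry is left out because the comment itself is unsure whether it arrived in 231 or 240.

```cpp
#include <cassert>

// Hypothetical feature table built from the @todo comment in
// ArchitectureSpecificSettings.cmake (not code from the commit).
struct SystemdFeatures {
   bool stop_unit;    // StopUnit, 221 per the comment; treated as the hard minimum
   bool freeze_thaw;  // FreezeUnit/ThawUnit, 235 per the comment; optional, worked around if missing
};

SystemdFeatures features_for_version(int systemd_version) {
   SystemdFeatures f{};
   f.stop_unit = systemd_version >= 221;
   f.freeze_thaw = systemd_version >= 235;
   return f;
}

// True if the integration can be used at all with this systemd version.
bool systemd_version_sufficient(int systemd_version) {
   return features_for_version(systemd_version).stop_unit;
}
```

RHEL 8 ships systemd 239, which passes both checks. This matches the comment's remark that the lx-* build hosts are recent enough.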

doc/markdown/man/man5/sge_conf.md

Lines changed: 20 additions & 2 deletions
@@ -1044,11 +1044,19 @@ completely.
 
 ***ENABLE_BINDING***
 
-If this parameter is set then xxQS_NAMExx enables the core binding module within the execution daemon to apply
-binding parameters that are specified during submission time of a job. This parameter is not set per default and
+If this parameter is set, then xxQS_NAMExx enables the core binding module within the execution daemon to apply
+binding parameters that are specified during submission time of a job. This parameter is not set per default, and
 therefore all binding related information will be ignored. Find more information for job to core binding in the
 section `-binding` of qsub(1).
 
+***ENABLE_SYSTEMD***
+
+If this parameter is set,
+and an execution hosts supports systemd, then jobs will be started in a systemd scope. This allows the execution daemon to
+manage the job's processes as a group, which is useful for resource management and job control.
+
+This parameter is set to true by default, meaning that on hosts that support systemd, jobs will be started in a systemd scope. If a host does not support systemd, then this parameter will be ignored.
+
 ***SCRIPT_TIMEOUT***
 
 This parameter allows to configure the allowed runtime of execution side scripts like prolog, epilog, and the PE
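The `ENABLE_SYSTEMD` text combines two conditions: the parameter itself (default true) and host support, with the parameter being ignored on hosts without systemd. The commit message adds a third: the integration is disabled when sge_execd runs as a non-privileged user (CS-1419). The following is a minimal C++ sketch of that decision. It assumes that `execd_params` is a comma-separated list of `KEY=VALUE` entries with case-insensitive `TRUE`/`FALSE` values. The function names are hypothetical, and this is not the sge_execd parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Strip leading/trailing blanks from an execd_params fragment.
static std::string trim(const std::string &s) {
   const auto b = s.find_first_not_of(" \t");
   if (b == std::string::npos) {
      return "";
   }
   const auto e = s.find_last_not_of(" \t");
   return s.substr(b, e - b + 1);
}

static std::string to_upper(std::string s) {
   std::transform(s.begin(), s.end(), s.begin(),
                  [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
   return s;
}

// Effective ENABLE_SYSTEMD value: true (the documented default) unless an
// entry explicitly sets it to FALSE. Unrecognized values are ignored here.
bool enable_systemd_param(const std::string &execd_params) {
   bool enabled = true;
   std::istringstream in(execd_params);
   std::string entry;
   while (std::getline(in, entry, ',')) {
      const auto eq = entry.find('=');
      if (eq == std::string::npos || to_upper(trim(entry.substr(0, eq))) != "ENABLE_SYSTEMD") {
         continue;
      }
      const std::string value = to_upper(trim(entry.substr(eq + 1)));
      if (value == "TRUE") {
         enabled = true;
      } else if (value == "FALSE") {
         enabled = false;
      }
   }
   return enabled;
}

// Jobs are started in a systemd scope only if the host supports systemd,
// sge_execd runs privileged (CS-1419), and ENABLE_SYSTEMD is not switched off.
bool start_job_in_scope(const std::string &execd_params, bool host_supports_systemd,
                        bool execd_privileged) {
   return host_supports_systemd && execd_privileged && enable_systemd_param(execd_params);
}
```

An empty `execd_params` therefore still yields systemd scopes on a systemd host, because the default applies. On a host without systemd, the parameter value never matters.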
@@ -1060,6 +1068,15 @@ in one load report interval. The default for *execd_params* is none.
 
 The global configuration entry for this value may be overwritten by the execution host local configuration.
 
+***USAGE_COLLECTION***
+
+This parameter controls how xxqs_name_sxx_execd collects the online usage information of jobs. The following values are recognized:
+
+- *FALSE* : No online usage information is collected. Use with care, this also disables limit enforcement for *s_cpu*, *h_cpu*, *s_rss*, *h_rss*, *s_vmem*, and *h_vmem*.
+- *PDC* : Online usage information is collected by the PDC (Portable Data Collector) mode, even if Systemd is available.
+- *HYBRID* : Hybrid mode, where online usage information is both gathered via Systemd (if available) and the PDC. Use this mode, when your jobs are controlled by systemd, but you also want to collect usage information for jobs that is not available via Systemd, e.g., vmem, maxvmem, io, and iow.
+- *TRUE* : This is the default mode. Online usage information is collected via Systemd if the host supports Systemd and *ENABLE_SYSTEMD* is set to *TRUE* (which is the default). It is collected by the PDC (Portable Data Collector) if the host does not support Systemd or if *ENABLE_SYSTEMD* is set to *FALSE*.
+
 ## gdi_request_limits
 
 This value is a global configuration parameter only, and is used to prevent denial-of-service attacks on the xxqs_name_sxx_qmaster(8) process.
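The four `USAGE_COLLECTION` values determine which collectors are active: systemd, the PDC, both, or neither. The sketch below implements that mapping as documented. In it, `systemd_usable` stands for "the host supports systemd and ENABLE_SYSTEMD is TRUE". The enum, struct, and function names are hypothetical. Treating unknown values as the default `TRUE` is an assumption of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

enum class UsageCollection { False, Pdc, Hybrid, True };

// Which sources gather online usage for a job.
struct Collectors {
   bool systemd;
   bool pdc;
};

// Map a USAGE_COLLECTION value (case-insensitive here) to the enum;
// unknown values fall back to the default TRUE in this sketch.
UsageCollection parse_usage_collection(std::string value) {
   std::transform(value.begin(), value.end(), value.begin(),
                  [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
   if (value == "FALSE") return UsageCollection::False;
   if (value == "PDC") return UsageCollection::Pdc;
   if (value == "HYBRID") return UsageCollection::Hybrid;
   return UsageCollection::True;
}

// systemd_usable: the host supports systemd AND ENABLE_SYSTEMD is TRUE.
Collectors active_collectors(UsageCollection mode, bool systemd_usable) {
   switch (mode) {
      case UsageCollection::False:
         return {false, false};           // nothing collected, no usage-based limits
      case UsageCollection::Pdc:
         return {false, true};            // PDC even if systemd is available
      case UsageCollection::Hybrid:
         return {systemd_usable, true};   // systemd where possible, PDC always
      case UsageCollection::True:
         break;
   }
   return {systemd_usable, !systemd_usable};  // default: exactly one source
}
```

Per CS-1408 in the commit message, the mode must stay consistent for running jobs. A real implementation would therefore presumably record the mode chosen at job start rather than re-evaluate it when the configuration changes.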
@@ -1375,3 +1392,4 @@ xxqs_name_sxx_shepherd*(8), cron(8),
 # COPYRIGHT
 
 See xxqs_name_sxx_intro(1) for a full statement of rights and permissions.
+
doc/markdown/manual/development-guide/00_overview.md

Lines changed: 6 additions & 1 deletion
@@ -34,7 +34,12 @@ Tags and branches before `V9` will also not be described here.
 | | V900p1\_TAG | patch to the 9.0.0 making it work on GCP (CS-663) |
 | | | |
 | V90\_BRANCH | | maintenance of 9.0 |
+| | V903\_TAG | third 9.0 patch |
+| | V904\_TAG | fourth 9.0 patch |
+| | V905\_TAG | fifth 9.0 patch |
+| | V906\_TAG | sixth 9.0 patch |
+| | V907\_TAG | seventh 9.0 patch |
 | | | |
 
-[//]: # (Each file has to end with two emty lines)
+[//]: # (Each file has to end with two empty lines)
 
doc/markdown/manual/development-guide/01_prepare_dev_env.md

Lines changed: 1 addition & 1 deletion
@@ -267,5 +267,5 @@ git clone https://github.com/hpc-gridware/gcs-extensions
 git clone https://github.com/hpc-gridware/gcs-testsuite
 ```
 
-[//]: # (Each file has to end with two emty lines)
+[//]: # (Each file has to end with two empty lines)
 
doc/markdown/manual/development-guide/02_build_configuration.md

Lines changed: 1 addition & 1 deletion
@@ -188,5 +188,5 @@ Here we use *CLion* as example because it provides full integration with CMake t
 
 Next step is to build and install xxQS_NAMExx.
 
-[//]: # (Eeach file has to end with two emty lines)
+[//]: # (Eeach file has to end with two empty lines)
 
doc/markdown/manual/development-guide/03_build_and_installation.md

Lines changed: 1 addition & 1 deletion
@@ -21,5 +21,5 @@ make install
 You can now either install the product (follow the instructions in the *Installation Guide*) or you can continue to
 setup the automated test environment as described in the next chapter.
 
-[//]: # (Eeach file has to end with two emty lines)
+[//]: # (Eeach file has to end with two empty lines)
 
doc/markdown/manual/development-guide/05_simulating_hosts_and_execution.md

Lines changed: 1 addition & 1 deletion
@@ -101,5 +101,5 @@ Instead, the job execution is just simulated.
 
 @todo add details
 
-[//]: # (Eeach file has to end with two emty lines)
+[//]: # (Eeach file has to end with two empty lines)
 
doc/markdown/manual/development-guide/25_scheduler_thread.md

Lines changed: 1 addition & 1 deletion
@@ -296,5 +296,5 @@ we can switch off a few potentially expensive features and just rely on scheduli
 * do not configure queue load_thresholds and suspend_thresholds
 * do not use load adjustments (in the scheduler config)
 
-[//]: # (Eeach file has to end with two emty lines)
+[//]: # (Eeach file has to end with two empty lines)
 
doc/markdown/manual/installation-guide/01_planning_the_installation.md

Lines changed: 3 additions & 2 deletions
@@ -280,7 +280,7 @@ Please refer to the next question for more information.
 
 ### Where is the spooling area for the master service located?
 
-For HA-setups, it must be a shared network location; otherwise, it can be the local filesystem of the host
+For HA setups, it must be a shared network location; otherwise, it can be the local filesystem of the host
 running the master service.
 
 Ensure that the spooling location meets the requirements of the spooling mechanism. Classic spooling can be done on
@@ -359,4 +359,5 @@ If this is your first time installing xxQS_NAMExx, we suggest a manual installat
 Automatic installation is recommended if you need to install or reinstall a cluster multiple times or if you plan
 to install multiple clusters with slightly different settings.
 
-[//]: # (Eeach file has to end with two emty lines)
+[//]: # (Eeach file has to end with two empty lines)
+