Commit 83963f4

jgabler-hpc (Joachim Gabler) authored and committed
CS-1187 add systemd and cgroups integration (#60)
* EH: CS-1188 control daemons with systemd
* avoid endless loop in case an invalid slice is given in the autoinstall template // BelongsTo: CS-1188
* EH: CS-1192 at startup of daemons output the cgroups slice the service is running in
* fixed typo "deamon"
* EH: CS-1223 with systemd integration, move sge_shepherd processes out of the sge_execd service cgroup
* sd_bus method StartTransientUnit only starts a job that creates the unit and returns before the action has actually finished; need to wait for the job to finish. // BelongsTo: CS-1123
* - do not report systemd as init system on ulx-*: we cannot build systemd support into sge_execd there because libsystemd.so is too old
  - fixed broken build on CentOS 8
* sd_bus error was not reported to the caller
* error messages were truncated at 100 characters; introduced SFN4 macro for 400-character strings
* fixed non-unique message ids
* EH: CS-1291 move shepherd child to its own scope
* shepherd tried to use systemd on a host that has the systemd library but does not run systemd as init system (Antix Linux)
* EH: CS-1292 get job online usage information via systemd
* tried to connect to systemd on a host without systemd
* errors in StartTransientUnit were not always propagated to the caller
* EH: CS-1294 set job limits via systemd
* EH: CS-1315 set binding via systemd
* cleanup
* EH: CS-1295 set device isolation via systemd
* EH: CS-1241 add profiling information for systemd operations
* - execd profiling could not be disabled again
  - cleanup, moved code to own module
  // BelongsTo: CS-1241
* EH: CS-1318 allow running jobs under systemd control even if sge_execd itself is not started as a systemd service
* EH: CS-1319 make running jobs under systemd control configurable
* added ENABLE_SYSTEMD to sge_conf.5 man page // BelongsTo: CS-1319
* EH: CS-1322 the job-specific scopes need to contain the toplevel slice name to be unique
* EH: CS-1300 do not add and handle the additional group id for jobs running under systemd
* BF: CS-1325 possible race condition between calling StartTransientUnit and waiting for the corresponding job to finish
* EH: CS-1296 kill jobs via systemd
* EH: CS-1321 allow configuring hybrid usage data collection (both via systemd and the pdc)
* fixed memory leaks
* BF: CS-1335 need special handling for interrupted system call
* EH: CS-1342 add systemd-specific settings (toplevel slice name) to the installation guide
* cleanup and added systemd integration to the release notes
* cleanup
* - addressed review comments
  - fixed a race condition leading to multiple execd children trying to create the shepherds.scope
* added more details of the systemd integration to the release notes
* addressed review comments
* refactoring and documentation with Doxygen headers
* EH: CS-1408 USAGE_COLLECTION mode must be kept consistent for running jobs
* EH: CS-1419 disable systemd integration if sge_execd is started as a non-privileged user
* with HYBRID usage collection, non-systemd hosts didn't report cpu and rss
* reprioritization code was broken by systemd integration // SeeAlso: CS-1421
* - improved diagnostics when ptf job / osjob cannot be found
  - enforce cleanup in execd only when KEEP_ACTIVE is changed to FALSE
* BF: CS-1019 sge_execd logs errors when running tightly integrated parallel jobs
* BF: CS-1425 backup/restore does not handle $SGE_ROOT/$SGE_CELL/slice_name
* BF: CS-1429 sge_qmaster can segfault on qdel -f
* BF: CS-1019 sge_execd logs errors when running tightly integrated parallel jobs
* BF: CS-1430 running tightly integrated parallel jobs leaves systemd slices // + additional cleanup
* fix to the fix for CS-1019
* added missing files

---------

Co-authored-by: Joachim Gabler <joga.oge@gabler-net.de>
1 parent ada9e08 commit 83963f4

1 file changed

Lines changed: 5 additions & 0 deletions

source/daemons/qmaster/sge_follow.cc

@@ -617,8 +617,13 @@ sge_follow_order(lListElem *ep, char *ruser, char *rhost, lList **topp, monitori
 }
 
 // @todo: can this be summarized with the mod event that will set the job in t-state?
+<<<<<<< HEAD
 sge_add_event(now, enrolled_task ? sgeE_JATASK_ADD : sgeE_JATASK_MOD, job_number, task_number,
 nullptr, nullptr, lGetString(jep, JB_session), jatp, gdi_session);
+=======
+sge_add_event(now, sgeE_JATASK_ADD, job_number, task_number,
+nullptr, nullptr, lGetString(jep, JB_session), jatp, gdi_session);
+>>>>>>> 828cffed1 (CS-1187 add systemd and cgroups integration (#60))
 
 if (sge_give_job(jep, jatp, master_qep, master_host, monitor, gdi_session)) {
 /* setting of queues in state unheard is done by sge_give_job() */
