Releases: hpc-gridware/clusterscheduler

OCS/GCS v9.0.12

20 Apr 09:04

Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.12 are available for download from the HPC-Gridware Download Page.

OCS/GCS v9.0.11

18 Feb 08:59

Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.11 are available for download from the HPC-Gridware Download Page.

OCS/GCS v9.0.10

14 Dec 18:31

Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.10 are available for download from the HPC-Gridware Download Page.

OCS/GCS v9.0.9

16 Nov 16:56

Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.9 are available for download from the HPC-Gridware Download Page.

OCS/GCS v9.0.8

28 Aug 12:51

Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.8 are available for download: https://www.hpc-gridware.com/download-main/

OCS/GCS v9.0.7

08 Jul 19:38

OCS/GCS v9.0.6

21 May 17:56

V906_TAG

OCS/GCS 9.0.6 release

OCS/GCS v9.0.5

15 Apr 19:15

Major Enhancements

v9.0.5

qtelemetry (Developer Preview in GCS)

This release introduces qtelemetry, a new metrics exporter for Gridware Cluster Scheduler (GCS). It allows administrators to easily collect and expose cluster metrics for monitoring and observability purposes.

Features:

  • Simple integration with Prometheus and Grafana
  • Export cluster metrics, including:
      • Host metrics (CPU load, GPU availability, memory usage, and many more)
      • Job metrics (queued, running, errored, waiting time, and many more)
      • qmaster statistics (CPU/memory usage of sge_qmaster, spooling filesystem information)
  • Optional per-job metric export for detailed insights (recommended only for very small workloads)
  • Built-in support for pre-configured Grafana dashboard:
      • Grafana dashboard example.

Quick Start:

By default, qtelemetry serves metrics on port 9464 at the /metrics endpoint:

./qtelemetry start

Enable additional metrics sources using command-line flags:

# Export exec host and qmaster metrics

./qtelemetry start --enableExecd --enableMaster

# Export individual job-level metrics (for smaller systems)

./qtelemetry start --singleJobs
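
To check that the exporter is reachable without setting up Prometheus first, the endpoint can be read directly. The following is a minimal sketch, not part of qtelemetry itself: port 9464 and the /metrics path come from the notes above, while the host name localhost, the helper names, and the metric names in the sample are assumptions made for illustration. The parser is simplified and only handles plain samples in the Prometheus text format.

```python
import urllib.request

# Port 9464 and the /metrics path are taken from the release notes;
# "localhost" is an assumption about where qtelemetry is running.
DEFAULT_URL = "http://localhost:9464/metrics"


def fetch_metrics(url=DEFAULT_URL, timeout=5.0):
    """Download the raw Prometheus text exposition from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into a dict of {"name{labels}": float}.

    Simplified sketch: skips comment lines and ignores optional timestamps.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Keep the label set as part of the key, split off the value.
            end = line.rfind("}")
            key, rest = line[: end + 1], line[end + 1:].split()
        else:
            key, *rest = line.split()
        if rest:
            samples[key] = float(rest[0])
    return samples


# Hypothetical sample; the metric names exported by qtelemetry may differ.
SAMPLE = """\
# HELP example_jobs_running Number of running jobs (hypothetical name)
# TYPE example_jobs_running gauge
example_jobs_running 12
example_host_load{host="node01"} 0.75
"""

print(parse_metrics(SAMPLE))
```

On a host where qtelemetry is running, `parse_metrics(fetch_metrics())` would return the live samples instead of the hypothetical ones.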

(Available in Gridware Cluster Scheduler only)

Out of the Box Support of various MPI Distributions

The $SGE_ROOT/mpi directory contains templates of the PE configuration for the following MPI distributions:

  • Intel MPI
  • mpich
  • mvapich
  • openmpi

A template can be added by calling qconf -Ap <path to template>, which creates the PE configuration for running jobs with the given MPI in tight integration.
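
When several templates need to be registered at once, the qconf calls can be scripted. The sketch below only wraps the `qconf -Ap <file>` call named above; the function name, the dry-run switch, and the template file paths are hypothetical, since the actual file names under $SGE_ROOT/mpi are not listed here.

```python
import subprocess


def add_pe_templates(template_paths, dry_run=True):
    """Build (and optionally run) one `qconf -Ap <file>` call per template."""
    commands = [["qconf", "-Ap", str(path)] for path in template_paths]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if qconf reports a failure.
            subprocess.run(command, check=True)
    return commands


# Hypothetical template locations; look in $SGE_ROOT/mpi for the real file names.
templates = [
    "/opt/gcs/mpi/openmpi/openmpi.pe",
    "/opt/gcs/mpi/mpich/mpich.pe",
]

# Dry run: print the commands instead of executing them.
for command in add_pe_templates(templates):
    print(" ".join(command))
```

With `dry_run=False` the commands are executed, which requires a working cluster environment and manager privileges.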

In addition, build scripts for mpich, mvapich, and openmpi show how each MPI distribution can be built and installed. The build scripts are located in $SGE_ROOT/mpi/<mpi name>/build.sh.

$SGE_ROOT/mpi/examples contains an MPI example written in the C language.
It can be run as a tightly integrated parallel job with any of the MPI distributions mentioned above
and supports checkpointing and restart.

It comes with documentation, a build script, a job script, and a template of a checkpointing environment.

(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)

Easier Creation of Configuration Templates

Configuration objects can now contain the additional special variables $sge_root and $sge_cell for
paths to scripts, e.g. for

  • prolog and epilog in the global config and queue configurations
  • starter_method, suspend_method, resume_method, and terminate_method in the queue configuration
  • start_proc_args and stop_proc_args in the parallel environment configuration
  • ckpt_command, migr_command, restart_command, and clean_command in the checkpointing environment

This makes it possible to use the same configuration templates in different environments without
modifying the paths before applying the configuration.

A list of all special variables is given in the sge_conf.5 man page in the prolog section.
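
The effect of the two variables can be illustrated with a small sketch. It mimics the substitution outside the scheduler and is not the scheduler's own implementation; the function name, the script paths, and the values /opt/gcs and default are assumptions chosen for the example.

```python
def expand_special_variables(value, sge_root, sge_cell):
    """Replace $sge_root and $sge_cell in a configuration value."""
    return value.replace("$sge_root", sge_root).replace("$sge_cell", sge_cell)


# Hypothetical template entries; the script locations are made up for illustration.
template = {
    "prolog": "$sge_root/$sge_cell/scripts/prolog.sh",
    "epilog": "$sge_root/$sge_cell/scripts/epilog.sh",
}

# The same template resolves to different paths in different installations.
for name, value in template.items():
    print(name, expand_special_variables(value, "/opt/gcs", "default"))
```

The template itself never contains an installation-specific path, so the same file can be applied to clusters installed in different locations.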

(Available in Open Cluster Scheduler and Gridware Cluster Scheduler)

Full List of Fixes

Release notes - Cluster Scheduler

v9.0.5

Improvement

CS-342 provide an openmpi integration

CS-343 provide an example and test program using MPI

CS-791 sge_root should be available as special variable in the configuration of prolog, epilog, queue, pe, ckpt

CS-914 Make ARCH script more robust

CS-1090 qstat -r shall report resource requests by scope

CS-1094 Update sge_pe.md to better explain PE_HOSTFILE

CS-1114 Add GPU monitoring examples to qtelemetry Grafana dashboard

CS-1115 Build qtelemetry in containers for lx-amd64 and lx-arm64

CS-1126 in the environment of tasks of tightly integrated parallel jobs set the pe_task_id

CS-1128 Add enroot to worker GPU VM image for GCP

CS-1143 provide a MPICH integration

CS-1144 provide a MVAPICH integration

CS-1145 provide an Intel MPI integration

CS-1146 cleanup and document the ssh wrapper MPI template and scripts

CS-1152 add a checktree_mpi to testsuite with configuration and tests making use of the various MPI integrations

CS-1158 Add qtelemetry Grafana dashboard to public Grafana Cloud Dashboards

New Feature

CS-1091 Clearly document the slots syntax in man5 sge_queue_conf.md

Sub-task

CS-697 Jenkins: enable issue_3013

CS-698 Jenkins: enable issue_3179

Task

CS-662 verify delayed job reporting of sge_execd after reconnecting to sge_qmaster

CS-1117 Add qtelemetry as developer preview to GCS distribution

CS-1118 Create a packer file which builds a GPU enabled VM with and without GCS for fast deployment on GCP

CS-1125 Provide a basic examples of how enroot can be used with the GPU integration

CS-1134 message cutoff after 8 characters

CS-1136 add checktree_qtelemetry to all build environments + Jenkins setup

Bug

CS-430 booking of resources into advance reservations needs to distinguish between host and queue resources

CS-722 env_list in qstat should show NONE if not set

CS-1028 qtelemetry should support NVIDIA loadsensor values for hosts

CS-1085 BDB build error on lx-riscv64 after OS update.

CS-1096 USE_QSUB_GID functionality fails on FreeBSD 14

CS-1111 minimum and maximum thread counts in the bootstrap.5 man page are incorrect

CS-1131 wallclock time reported for tasks of a tightly integrated parallel job is incorrect

CS-1139 job deletion via JAPI/DRMAA fails if job ID exceeds INT_MAX

CS-1140 termination of event client via JAPI fails if event client ID exceeds INT_MAX

CS-1141 MacOS build broken due to unavailability of getgrouplist()

CS-1163 when a queue is signalled then additional invalid entries are created in the berkeleydb spooling database

OCS/GCS v9.0.4

05 Mar 18:56

v9.0.4

IT IS STRONGLY RECOMMENDED TO UPGRADE TO PATCH v9.0.4

OCS/GCS v9.0.3

12 Feb 22:11

Patch release. Prebuilt packages are available here: https://www.hpc-gridware.com/download-main/