96 changes: 96 additions & 0 deletions dirac.cfg
@@ -1158,6 +1158,102 @@ Operations
  pilotVORepoBranch = master # Branch to use
  workDir = /tmp/pilot3Files # Local work directory on the masterCS for synchronisation
}

# RSS section
# See https://dirac.readthedocs.io/en/latest/AdministratorGuide/Systems/ResourceStatus/configuration.html
ResourceStatus
{
  Config
  {
    Cache = 720 # Lifetime (seconds) of the RSSCache (default: 300)
    FromAddress = rss@dirac # Email address used as sender for RSS notifications
  }
  Policies
  {
    # Command arguments for the built-in Downtime policy type.
    # hours = 0 means only ongoing downtimes are considered (default).
    # Set hours > 0 to also catch downtimes starting within that window.
    # Note: this section has no policyType key and is therefore NOT treated
    # as a policy definition; it only sets command argument defaults.
    Downtime
    {
      hours = 0 # look-ahead window in hours (0 = ongoing only, default)
    }
    # Command arguments for the built-in FreeDiskSpace policy type.
    # Unit and thresholds apply to all SEs monitored by this policy.
    # Note: same as above (no policyType key, so not a policy definition).
    FreeDiskSpace
    {
      Unit = TB # Space unit: TB (default), GB or MB
      Banned_threshold = 0.1 # Free space below which the SE is Banned (in the chosen unit)
      Degraded_threshold = 5 # Free space below which the SE is Degraded (in the chosen unit)
    }
    # Example: apply Downtime policy to all Sites
    SiteDowntime
    {
      policyType = Downtime
      matchParams
      {
        element = Site
      }
    }
    # Example: apply Downtime policy to all Resources
    ResourceDowntime
    {
      policyType = Downtime
      matchParams
      {
        element = Resource
      }
    }
    # Example: apply FreeDiskSpace policy to all SE WriteAccess status types
    SEWriteAccessFreeDiskSpace
    {
      policyType = FreeDiskSpace
      matchParams
      {
        element = Resource
        elementType = StorageElement
        statusType = WriteAccess
      }
    }
    # Example: apply FreeDiskSpace to SE1 with specific args (Unit and Banned_threshold);
    # Degraded_threshold falls back to the default defined in the FreeDiskSpace section above.
    SpecificFreeDiskSpace
    {
      policyType = FreeDiskSpace
      Unit = GB
      Banned_threshold = 15
      matchParams
      {
        name = SE1
      }
    }
  }
  PolicyActions
  {
    # Example: send an email when any Resource reaches Banned status
    BannedResourceEmail
    {
      actionType = EmailAction
      notificationGroups = RSSAdmins
      matchParams
      {
        element = Resource
        status = Banned
      }
    }
  }
  Notification
  {
    RSSAdmins
    {
      users = admin@dirac # email addresses used for the notifications
    }
  }
}

# Services section
Services
{
  # See http://dirac.readthedocs.io/en/latest/AdministratorGuide/Resources/Catalog/index.html
@@ -66,6 +66,82 @@ we cannot define the following matchParams:

Code templates and examples for creating custom policies: :doc:`../../../DeveloperGuide/Systems/ResourceStatus/index`

Built-in Downtime Policy
~~~~~~~~~~~~~~~~~~~~~~~~

The ``Downtime`` policy type evaluates GOCDB downtime data for a Site or Resource.
Severity is mapped to RSS status as follows:

* **OUTAGE** → **Banned**
* **WARNING** → **Degraded**
* No downtime → **Active**
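
A minimal sketch of the mapping above in plain Python. The names ``SEVERITY_TO_STATUS`` and ``status_from_severity`` are illustrative and not part of the DIRAC API; raising an error for an unrecognised severity is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional

# Mapping taken from the list above; the names are hypothetical.
SEVERITY_TO_STATUS = {"OUTAGE": "Banned", "WARNING": "Degraded"}


def status_from_severity(severity: Optional[str]) -> str:
    """Return the RSS status for a downtime severity (None means no downtime)."""
    if severity is None:
        return "Active"
    try:
        return SEVERITY_TO_STATUS[severity.upper()]
    except KeyError:
        # Assumption: anything other than OUTAGE/WARNING is treated as an error here.
        raise ValueError(f"unexpected downtime severity: {severity!r}") from None
```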

The look-ahead window is configurable from the Operations CS:

::

  /Operations/Defaults/ResourceStatus
    /Policies
      /Downtime
        hours = 0 # hours to look ahead (0 = ongoing only, default)

.. note::

   Setting ``hours = 0`` (the default) means only downtimes that are currently ongoing
   are considered. Setting a positive value (e.g. ``12``) also catches downtimes scheduled
   to start within that window, which is useful for proactive status changes.

   This section has no ``policyType`` key and is therefore treated purely as
   command-argument defaults, not as a policy definition.

Example: flag elements with downtimes starting within the next 24 hours::

  /Operations/Defaults/ResourceStatus/Policies/Downtime
  {
    hours = 24
  }
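
The effect of ``hours`` can be illustrated with a small, self-contained check. This is only a sketch of the semantics described in the note above; the function ``downtime_is_relevant`` is hypothetical, and the real command may select downtimes from GOCDB differently.

```python
from datetime import datetime, timedelta


def downtime_is_relevant(start: datetime, end: datetime, now: datetime, hours: int = 0) -> bool:
    """True if the downtime is ongoing, or starts within `hours` of `now`."""
    if end <= now:
        return False  # the downtime has already finished
    # hours = 0 reduces this to "has already started", i.e. ongoing only.
    return start <= now + timedelta(hours=hours)
```

With ``hours = 0`` a downtime starting in six hours is ignored; with ``hours = 24`` the same downtime is taken into account.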

Built-in FreeDiskSpace Policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``FreeDiskSpace`` policy type monitors Storage Element occupancy.
It compares the free space reported by the SE against two configurable thresholds:

* If free space is below ``Banned_threshold``, the SE is set to **Banned**.
* If free space is below ``Degraded_threshold`` (but above ``Banned_threshold``), the SE is set to **Degraded**.
* Otherwise the SE is set to **Active**.
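
The three rules above amount to the following comparison. This is a sketch only: ``status_from_free_space`` is a hypothetical name, and treating a value exactly equal to a threshold as not "below" it is an assumption.

```python
def status_from_free_space(
    free: float, banned_threshold: float = 0.1, degraded_threshold: float = 5
) -> str:
    """Map free space to an RSS status; all values share the same unit."""
    if free < banned_threshold:
        return "Banned"
    if free < degraded_threshold:
        return "Degraded"
    return "Active"
```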

All three parameters (unit, banned threshold, and degraded threshold) are configurable
from the Operations CS and fall back to the defaults shown below when unset:

::

  /Operations/Defaults/ResourceStatus
    /Policies
      /FreeDiskSpace
        Unit = TB # unit for the SE occupancy query (TB, GB or MB)
        Banned_threshold = 0.1 # in the chosen unit (default)
        Degraded_threshold = 5 # in the chosen unit (default)

.. note::

   These keys live under ``/Operations/Defaults/ResourceStatus/Policies/FreeDiskSpace``,
   not under the ``/matchParams`` sub-section. They tune the **command arguments**, not the
   element-matching logic. This section has no ``policyType`` key and is therefore not treated
   as a policy definition by the policy engine.

   The fallback values ``0.1`` and ``5`` are not converted when the unit changes: with
   ``Unit = GB`` they are read as 0.1 GB and 5 GB. When changing the unit, set both
   thresholds explicitly in the CS.

Example: use GB, with thresholds roughly equal to the defaults (0.1 TB and 5 TB)::

  /Operations/Defaults/ResourceStatus/Policies/FreeDiskSpace
  {
    Unit = GB
    Banned_threshold = 100
    Degraded_threshold = 5000
  }
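
Because the thresholds are not converted automatically, it can help to compute the new values before changing ``Unit``. The helper below is a hypothetical sketch, not a DIRAC function; the factor of 1000 between adjacent units is an assumption, and a deployment that counts in multiples of 1024 should pass ``factor=1024``.

```python
# Units accepted by the FreeDiskSpace section, smallest first.
UNITS = ("MB", "GB", "TB")


def convert_threshold(value: float, from_unit: str, to_unit: str, factor: int = 1000) -> float:
    """Express a threshold given in `from_unit` in `to_unit`."""
    # Positive when converting to a smaller unit, negative when converting to a larger one.
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * factor ** steps
```

For instance, ``convert_threshold(0.1, "TB", "GB")`` gives the value to use for ``Banned_threshold`` when moving the default from TB to GB.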

-------------
PolicyActions
-------------
20 changes: 19 additions & 1 deletion docs/source/DeveloperGuide/Systems/ResourceStatus/index.rst
@@ -115,7 +115,7 @@ Cache tables for metrics used by policies.
* - PolicyResult
- Policy evaluation results (Element, Name, PolicyName, Status, Reason)
* - SpaceTokenOccupancyCache
- Storage space usage (Endpoint, Token, Free, Guaranteed)
- Storage space usage (Endpoint, Token, Free, Total) — values stored in MB
* - TransferCache
- Transfer quality metrics (SourceName, DestinationName, Metric, Value)

@@ -215,6 +215,24 @@ Policies inherit from ``PolicyBase`` and implement ``evaluate()``.
return {'Status': 'Degraded', 'Reason': f'Low efficiency: {efficiency:.2%}'}
return {'Status': 'Banned', 'Reason': f'Very low efficiency: {efficiency:.2%}'}

FreeDiskSpace Policy
--------------------

The ``FreeDiskSpacePolicy`` (``Policy/FreeDiskSpacePolicy.py``) evaluates SE occupancy using
configurable thresholds. Thresholds are passed through as command arguments so they propagate
from the CS configuration all the way to the policy evaluation:

1. ``Configurations.py`` reads ``Unit``, ``Banned_threshold`` and ``Degraded_threshold`` from the
   Operations CS, with calls such as
   ``Operations().getValue("ResourceStatus/Policies/FreeDiskSpace/Banned_threshold", 0.1)``,
   and stores them in the policy ``args`` dict.
2. ``FreeDiskSpaceCommand`` reads these values from ``self.args`` in ``_prepareCommand()`` and
   returns them alongside ``Free`` and ``Total`` in both ``doNew()`` and ``doCache()``.
3. ``FreeDiskSpacePolicy._evaluate()`` reads ``Banned_threshold`` and ``Degraded_threshold``
   from the command result dict (with safe defaults) and applies the comparison.
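
The three steps above can be sketched with plain functions. These are stand-ins written for illustration: ``build_policy_args``, ``run_command`` and ``evaluate`` are hypothetical names, the ``Reason`` text is invented, and none of this reproduces the actual DIRAC classes.

```python
DEFAULTS = {"Unit": "TB", "Banned_threshold": 0.1, "Degraded_threshold": 5}


def build_policy_args(cs_values: dict) -> dict:
    """Step 1: overlay the values found in the CS on the defaults."""
    return {**DEFAULTS, **cs_values}


def run_command(args: dict, free: float, total: float) -> dict:
    """Step 2: return the measurement together with the thresholds."""
    return {
        "Free": free,
        "Total": total,
        "Banned_threshold": args["Banned_threshold"],
        "Degraded_threshold": args["Degraded_threshold"],
    }


def evaluate(result: dict) -> dict:
    """Step 3: compare free space with the thresholds carried in the result."""
    banned = result.get("Banned_threshold", 0.1)
    degraded = result.get("Degraded_threshold", 5)
    free = result["Free"]
    if free < banned:
        status = "Banned"
    elif free < degraded:
        status = "Degraded"
    else:
        status = "Active"
    return {"Status": status, "Reason": f"Free space: {free}"}
```

Since the thresholds travel inside the command result, the policy itself never needs to read the CS.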

This design keeps thresholds fully configurable per deployment without code changes.
See :ref:`rss_advanced_configuration` for the available CS keys.

Command Implementation
----------------------
