Systems configuration

Each DIRAC system has its corresponding section in the Configuration namespace.

Note

The configuration options for services and agents are being moved to the :ref:`Code Documentation <code_documentation>`. You can find the options for each service and agent on the individual documentation page of the respective agent or service.

.. toctree::
   :maxdepth: 1

   Accounting/index
   Configuration/index
   DataManagement/index
   WorkloadManagement/index
   /CodeDocumentation/RequestManagementSystem/RequestManagementSystem_Module
   Framework/index
   StorageManagement/index
   Transformation/index


Default structure

In each system, per setup, you normally find the following sections:

  • Agents: definition of each agent
  • Services: definition of each service
  • Databases: definition of each database
  • URLs: resolution of a service name (like 'DataManagement/FileCatalog') to a list of real URLs (like 'dips://<host>:<port>/DataManagement/FileCatalog'). They are tried in random order.
  • FailoverURLs: like URLs, but they are tried only if no server in URLs could be contacted.
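
The interplay between URLs and FailoverURLs can be pictured with a minimal Python sketch. It is not DIRAC's client code: the function names `candidate_order` and `first_reachable`, and the `can_connect` callable, are hypothetical and exist only for illustration. Shuffling the failover group as well is an assumption, since the text above only says it behaves "like URLs".

```python
import random
from typing import Callable, Iterable, List, Optional


def candidate_order(urls: Iterable[str], failover_urls: Iterable[str],
                    rng: Optional[random.Random] = None) -> List[str]:
    """Return every URL in the order it would be attempted.

    Primary URLs come first, in random order. Failover URLs follow
    (shuffled too, which is an assumption of this sketch).
    """
    rng = rng or random.Random()
    primary = list(urls)
    failover = list(failover_urls)
    rng.shuffle(primary)
    rng.shuffle(failover)
    return primary + failover


def first_reachable(candidates: Iterable[str],
                    can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which the hypothetical can_connect succeeds."""
    for url in candidates:
        if can_connect(url):
            return url
    return None
```

Because the failover group is appended after the primary group, a failover server is reached only once every primary URL has been tried without success.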

Preferred URLs

For most services, the standard URLs and FailoverURLs mechanism provides a way to specify primary and backup service endpoints.

However, this approach has limitations in certain scenarios:

  • Some services (like the Configuration service) have replicas that automatically register themselves in the Configuration System
  • External servers ("voboxes") running at sites may not be accessible from all clients
  • Connection attempts to inaccessible servers cause errors that, while harmless due to fallback mechanisms, slow down DIRAC and generate misleading error messages

To address these issues, you can define the PreferredURLPatterns option, which identifies a subset of URLs to try first:

System
{
  URLs
  {
    Service = dips://host1.main.invalid:1234/System/Service,dips://host2.main.invalid:1234/System/Service,dips://external.invalid:1234/System/Service
  }
}
DIRAC
{
  PreferredURLPatterns = .*\.main\.invalid[:/].*
}

In this example:

  1. The PreferredURLPatterns specifies a regular expression that matches servers in the main.invalid domain
  2. When connecting to the service, DIRAC will first try URLs matching this pattern (host1.main.invalid and host2.main.invalid)
  3. Only if these preferred servers fail will DIRAC attempt to connect to other servers (external.invalid)

This approach reduces connection errors and improves performance by prioritizing servers that are more likely to be accessible from the client.

Note

PreferredURLPatterns is a list of regular expressions, not a single regular expression. This allows you to specify several patterns that match different subsets of servers.
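
The selection of preferred servers can be sketched in Python as follows. This is an illustration only, not DIRAC's implementation: the function name `split_by_preference` is hypothetical, and matching with `re.match` (anchored at the start of the URL) is an assumption. The pattern in the sketch accepts either a port separator or a slash after the domain, so that it matches the example URLs, which carry a port.

```python
import re
from typing import Iterable, List, Tuple


def split_by_preference(urls: Iterable[str],
                        patterns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split URLs into (preferred, others).

    A URL is preferred if at least one pattern matches it.
    Using re.match here is an assumption of this sketch.
    """
    compiled = [re.compile(p) for p in patterns]
    preferred, others = [], []
    for url in urls:
        if any(rx.match(url) for rx in compiled):
            preferred.append(url)
        else:
            others.append(url)
    return preferred, others


urls = [
    "dips://host1.main.invalid:1234/System/Service",
    "dips://host2.main.invalid:1234/System/Service",
    "dips://external.invalid:1234/System/Service",
]
# A list of patterns, as the option accepts more than one.
preferred, others = split_by_preference(urls, [r".*\.main\.invalid[:/].*"])
```

Here `preferred` holds the two main.invalid servers and `others` holds the external one, which would be contacted only after the preferred servers have failed.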

Main Servers

In some setups, all services are installed behind one or several DNS aliases or gateways (typically an orchestrator such as Mesos or Kubernetes). In that case, redefining the very same URL everywhere is tedious, especially on the day the machine name changes.

For this reason, you can define an entry in the Operations section that contains the list of servers:

Operations/MainServers = server1, server2

Give only the host names, without port or protocol. In the system configuration, one can then write:

System
{
  URLs
  {
    Service = dips://$MAINSERVERS$:1234/System/Service
  }
}

This resolves to the following two URLs:

dips://server1:1234/System/Service, dips://server2:1234/System/Service
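
The substitution can be sketched in a few lines of Python. This is a hedged illustration of the expansion described above, not the code DIRAC runs. The helper name `expand_main_servers` is hypothetical, and handling several comma-separated entries in one option value is an assumption.

```python
from typing import List

PLACEHOLDER = "$MAINSERVERS$"


def expand_main_servers(option_value: str, main_servers: List[str]) -> List[str]:
    """Expand each URL containing $MAINSERVERS$ into one URL per server.

    URLs without the placeholder are kept unchanged.
    """
    expanded = []
    for entry in option_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if PLACEHOLDER in entry:
            expanded.extend(entry.replace(PLACEHOLDER, server)
                            for server in main_servers)
        else:
            expanded.append(entry)
    return expanded


# MainServers holds bare host names: no port, no protocol.
main_servers = [name.strip() for name in "server1, server2".split(",")]
resolved = expand_main_servers("dips://$MAINSERVERS$:1234/System/Service",
                               main_servers)
```

With the values above, `resolved` contains the two URLs listed in the text, one per entry of MainServers.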

Combined with the FailoverURLs section, this is useful for orchestrator-based setups, where there is a risk of the whole cluster going down:

System
{
  URLs
  {
    Service = dips://$MAINSERVERS$:1234/System/Service
  }
  FailoverURLs
  {
    Service = dips://failover1:1234/System/Service,dips://failover2:1234/System/Service
  }
}
Operations
{
  Defaults
  {
    MainServers = gateway1, gateway2
  }
}

With this configuration, all calls go to gateway1 and gateway2, which could be the front ends of your orchestrator. Only if neither of them answers are failover1 and failover2 used; these can be installed on separate machines, independent of the orchestrator.