Skip to content

Commit 046e0f7

Browse files
authored
security: SafeguardDotNet 8.2.4 — reconnect backoff and documentation (#224)
* docs(samples): document SafeguardIgnoreSsl dev-only default and logging diagnostics The ServiceNowTicketValidator sample app.config ships with SafeguardIgnoreSsl=true so the sample runs out-of-the-box against development appliances that use self-signed certificates. The setting is kept for backward compatibility, but the insecure default is dangerous if developers copy the sample without reviewing it. - Samples/ServiceNowTicketValidator/app.config: add an inline XML comment beside the setting explaining why it is true in the sample and what the secure-by-default posture is for production. - Samples/README.md: add a 'TLS Certificate Validation' section that spells out when SafeguardIgnoreSsl=true is acceptable, the steps to flip it off for production (trust store install or pinning via the RemoteCertificateValidationCallback overload), and warns that the flag is a developer convenience, not a deployment default. - README.md: add a 'Logging & diagnostics' section explaining that SafeguardDotNetException.Response carries the raw appliance response body, that the SDK does not redact response fields (per security review D-013), and how callers should scrub at the log layer when forwarding exception details to less-trusted sinks. No code changes; no public API change. Resolves the documentation portion of F-SafeguardDotNet-001 and F-SafeguardDotNet-005. * feat(event): exponential reconnect backoff for persistent event listener (W6) PersistentSafeguardEventListenerBase previously reconnected every 5 seconds with no backoff cap or jitter. On sustained appliance downtime or a network partition the reconnect loop ran indefinitely at a fixed rate, which can exhaust calling-process resources and hammer the appliance side of a recovering link. This is the cross-SDK reference implementation called out in security review work unit W6. SafeguardJava and safeguard.js are expected to replicate the algorithm with identical constants. - SafeguardDotNet/Event/ReconnectBackoff.cs (new) Internal sealed helper. Algorithm: delay_n = min(60s, 2^n * 1s) for n = 0, 1, 2, ... actual_n = delay_n * (0.75 + 0.5 * rng()) // ±25% jitter OnSuccess() resets n back to zero. Jitter source is injectable for deterministic unit testing. Out-of-range jitter values are clamped. - SafeguardDotNet/Event/PersistentSafeguardEventListenerBase.cs Replace the fixed 5-second Thread.Sleep with _reconnectBackoff.GetNextDelay(), wait on the cancellation token's WaitHandle so a Stop() request interrupts the sleep, and call OnSuccess() once the listener has successfully restarted. - SafeguardDotNet/Properties/AssemblyInfo.cs (new) InternalsVisibleTo SafeguardDotNetUnitTest so the algorithm can be covered by deterministic unit tests without exposing the helper as public API. - Test/SafeguardDotNetUnitTest/ (new xunit project) 18 deterministic unit tests covering: - exponential doubling 1,2,4,8,16,32 then cap at 60 - lower / upper jitter envelope (±25% exact) - random-sample band coverage at every attempt index 0..9 - approximate uniform distribution across the jitter band - OnSuccess() resets the counter - null jitter source throws - default constructor still produces in-band delays - misbehaving RNG (returns 5.0 or -3.0) is clamped, not propagated Scope expansion vs. the fix plan: the plan named the test file under Test/SafeguardDotNetTool/, which is a CLI exe (not a test project). A dedicated xunit project is required to satisfy the 'deterministic unit tests for sequence and jitter bounds' contract. - SafeguardDotNet.Core.sln Add SafeguardDotNetUnitTest project so the test pack runs as part of the standard solution build. No public API change. Existing PersistentSafeguardA2AEventListener and PersistentSafeguardEventListener callers see identical behavior on the happy path; on sustained failure they now see exponentially increasing delays between attempts (1s, 2s, 4s, 8s, 16s, 32s, 60s, 60s, ...) with ±25% jitter, capped at 60 seconds. Resolves F-SafeguardDotNet-002. * chore(release): bump semanticVersion to 8.3.0 for security review (new public ReconnectBackoff API) * test(a2a): auto-enable/restore A2A service in test setup (env self-heal) * chore: remove internal review codes, fix version bump, clean sln noise - Remove W6/S1-S6/D-013 internal references from source comments and XML docs - Replace version bump 8.3.0 -> 8.2.4 (ReconnectBackoff is internal, not public API) - Remove x64/x86 platform noise from solution file (keep Any CPU only) - Remove dangling reference to internal security review document in README
1 parent 38b829d commit 046e0f7

14 files changed

Lines changed: 599 additions & 23 deletions

README.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,46 @@ SafeguardDotNet uses RestSharp and Json.NET to call the Safeguard API. It
7777
includes calls to Serilog, and if your calling application provides a sink you
7878
will get log information automatically.
7979

80+
## Logging & diagnostics
81+
82+
SafeguardDotNet emits diagnostic logs through Serilog, and surfaces appliance
83+
errors as `SafeguardDotNetException`. The exception carries an `HttpStatusCode`
84+
and a `Response` body string that contains **whatever the Safeguard appliance
85+
returned for the failing call** — this is intentional so that operators can
86+
diagnose appliance-side failures without re-running the request.
87+
88+
Because Safeguard is a credential-management product, the SDK does **not**
89+
inspect, filter, or redact response bodies — they may legitimately contain
90+
fields named `Password`, `ApiKey`, `PrivateKey`, `SshHostKey`, `Token`,
91+
`AccountPassword`, and similar, and the SDK has no safe way to distinguish
92+
"product output the caller requested" from "metadata that happens to share a
93+
name". In rare authentication-failure shapes the response body can also echo
94+
submitted credentials (for example, the Resource Owner Password Credentials
95+
grant may return the submitted username, and some appliance versions echo other
96+
form fields in error envelopes).
97+
98+
**If you forward `SafeguardDotNetException.Message`,
99+
`SafeguardDotNetException.Response`, or the rendered exception (`ex.ToString()`)
100+
to a log sink that is less trusted than the appliance — for example a shared
101+
SIEM, a cloud log aggregator, a support bundle, or a customer-visible error
102+
page — redact at the log layer.** Suggested patterns:
103+
104+
- Log only `ex.HttpStatusCode` and `ex.ErrorCode` for production telemetry;
105+
keep `ex.Response` in a privileged channel (local file, restricted sink) for
106+
on-call diagnosis.
107+
- If you must forward the full response, run it through your organization's
108+
log scrubber first (Serilog `Destructure.ByTransforming<>`, an
109+
`IEnumerable<ILogEventEnricher>`, or a sink-side filter) with rules tuned to
110+
*your* schema, not a generic substring blacklist.
111+
- Avoid `Log.Error(ex, "...")` patterns that capture the full exception object
112+
when the sink is not under your control; prefer
113+
`Log.Error("Safeguard request failed: {Status} {Code}", ex.HttpStatusCode, ex.ErrorCode)`.
114+
115+
The SDK will not be changed to redact response bodies because substring
116+
matching on field names would corrupt legitimate Safeguard fields such as
117+
`PasswordRulesPolicyId`, `ApiKeyName`, `NewPasswordValidUntil`, etc., and
118+
would not actually help in the cases where redaction is needed.
119+
80120
## Resource Owner Password Grant Deprecation
81121

82122
The OAuth2 Resource Owner Password Credential (ROPC) grant type — where

SafeguardDotNet.Core.sln

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,8 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "pipeline-templates", "pipel
4848
pipeline-templates\job-variables.yml = pipeline-templates\job-variables.yml
4949
EndProjectSection
5050
EndProject
51+
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SafeguardDotNetUnitTest", "Test\SafeguardDotNetUnitTest\SafeguardDotNetUnitTest.csproj", "{1B11B367-2F76-49EE-A820-2A0A8B1721B1}"
52+
EndProject
5153
Global
5254
GlobalSection(SolutionConfigurationPlatforms) = preSolution
5355
Debug|Any CPU = Debug|Any CPU
@@ -102,6 +104,10 @@ Global
102104
{0149D659-78E3-476B-81F1-A80BDFA6F8A9}.Debug|Any CPU.Build.0 = Debug|Any CPU
103105
{0149D659-78E3-476B-81F1-A80BDFA6F8A9}.Release|Any CPU.ActiveCfg = Release|Any CPU
104106
{0149D659-78E3-476B-81F1-A80BDFA6F8A9}.Release|Any CPU.Build.0 = Release|Any CPU
107+
{1B11B367-2F76-49EE-A820-2A0A8B1721B1}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
108+
{1B11B367-2F76-49EE-A820-2A0A8B1721B1}.Debug|Any CPU.Build.0 = Debug|Any CPU
109+
{1B11B367-2F76-49EE-A820-2A0A8B1721B1}.Release|Any CPU.ActiveCfg = Release|Any CPU
110+
{1B11B367-2F76-49EE-A820-2A0A8B1721B1}.Release|Any CPU.Build.0 = Release|Any CPU
105111
EndGlobalSection
106112
GlobalSection(SolutionProperties) = preSolution
107113
HideSolutionNode = FALSE
@@ -117,6 +123,7 @@ Global
117123
{C6D34567-3456-4321-9876-345678901CDE} = {DD89D86B-68DA-4EB0-8EBC-60DE3DC2B084}
118124
{0149D659-78E3-476B-81F1-A80BDFA6F8A9} = {867EB36D-7893-444D-900D-29733E8E2636}
119125
{1EB6D64D-7AFB-41DC-B11B-58934D053684} = {9249E337-656D-4970-B45C-7A077C56FA44}
126+
{1B11B367-2F76-49EE-A820-2A0A8B1721B1} = {DD89D86B-68DA-4EB0-8EBC-60DE3DC2B084}
120127
EndGlobalSection
121128
GlobalSection(ExtensibilityGlobals) = postSolution
122129
SolutionGuid = {5A3B2C1D-4E6F-7A8B-9C0D-1E2F3A4B5C6D}

SafeguardDotNet/Event/PersistentSafeguardEventListenerBase.cs

Lines changed: 19 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ internal abstract class PersistentSafeguardEventListenerBase : ISafeguardEventLi
1818

1919
private Task _reconnectTask;
2020
private CancellationTokenSource _reconnectCancel;
21+
private readonly ReconnectBackoff _reconnectBackoff = new ReconnectBackoff();
2122

2223
protected PersistentSafeguardEventListenerBase()
2324
{
@@ -62,13 +63,29 @@ private void PersistentReconnectAndStart()
6263
_eventListener.SetEventListenerStateCallback(_eventListenerStateCallback);
6364
_eventListener.Start();
6465
_eventListener.SetDisconnectHandler(PersistentReconnectAndStart);
66+
67+
// Reset backoff so the next disconnect starts a fresh exponential ramp.
68+
_reconnectBackoff.OnSuccess();
6569
break;
6670
}
6771
catch (Exception ex)
6872
{
69-
Log.Warning("Internal event listener connection error (see debug for more information), sleeping for 5 seconds...");
73+
// Exponential backoff + jitter so a sustained appliance
74+
// outage cannot exhaust local resources or hammer the
75+
// appliance during recovery.
76+
var delay = _reconnectBackoff.GetNextDelay();
77+
Log.Warning(
78+
"Internal event listener connection error (see debug for more information), sleeping for {DelaySeconds:F1} seconds...",
79+
delay.TotalSeconds);
7080
Log.Debug(ex, "Internal event listener connection error.");
71-
Thread.Sleep(5000);
81+
try
82+
{
83+
_reconnectCancel.Token.WaitHandle.WaitOne(delay);
84+
}
85+
catch (ObjectDisposedException)
86+
{
87+
break;
88+
}
7289
}
7390
}
7491
},
Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,157 @@
1+
// Copyright (c) One Identity LLC. All rights reserved.
2+
3+
namespace OneIdentity.SafeguardDotNet.Event;
4+
5+
using System;
6+
7+
/// <summary>
8+
/// Exponential reconnect backoff with multiplicative ±25% jitter and a 60-second cap.
9+
/// </summary>
10+
/// <remarks>
11+
/// <para>
12+
/// Every persistent event listener should cap its reconnect frequency so that
13+
/// sustained appliance downtime or a network partition cannot turn the reconnect
14+
/// loop into a resource-exhaustion vector against either the calling process or
15+
/// the appliance.
16+
/// </para>
17+
/// <para>
18+
/// Algorithm:
19+
/// <code>
20+
/// delay_n = min(60s, 2^n * 1s) for n = 0, 1, 2, ...
21+
/// actual_n = delay_n * (0.75 + 0.5 * rng()) // ±25% multiplicative jitter
22+
/// </code>
23+
/// where <c>rng()</c> returns a value in <c>[0.0, 1.0)</c>. The internal counter
24+
/// <c>n</c> advances on every call to <see cref="GetNextDelay"/> and resets to
25+
/// zero on <see cref="OnSuccess"/>. The cap takes effect at n = 6 (2^6 = 64 &gt; 60).
26+
/// </para>
27+
/// <para>
28+
/// The jitter source is injectable so callers and unit tests can substitute a
29+
/// deterministic function for the default <see cref="Random"/>-based source.
30+
/// Values returned outside <c>[0.0, 1.0]</c> are clamped to that range so a
31+
/// misbehaving RNG cannot produce negative or unbounded delays.
32+
/// </para>
33+
/// </remarks>
34+
internal sealed class ReconnectBackoff
35+
{
36+
/// <summary>Initial delay before the first retry (n = 0), in seconds.</summary>
37+
public const double InitialDelaySeconds = 1.0;
38+
39+
/// <summary>Maximum delay between retries, in seconds.</summary>
40+
public const double MaxDelaySeconds = 60.0;
41+
42+
/// <summary>Half-width of the multiplicative jitter band (0.25 = ±25%).</summary>
43+
public const double JitterFraction = 0.25;
44+
45+
private static readonly Random DefaultRandom = new Random();
46+
47+
private readonly Func<double> _jitterSource;
48+
private readonly object _lock = new object();
49+
50+
private int _attempt;
51+
52+
/// <summary>
53+
/// Initializes a new instance of the <see cref="ReconnectBackoff"/> class
54+
/// using a shared thread-safe default jitter source.
55+
/// </summary>
56+
public ReconnectBackoff()
57+
: this(DefaultJitter)
58+
{
59+
}
60+
61+
/// <summary>
62+
/// Initializes a new instance of the <see cref="ReconnectBackoff"/> class
63+
/// with an injected jitter source. Provided primarily for deterministic
64+
/// unit testing of the algorithm.
65+
/// </summary>
66+
/// <param name="jitterSource">
67+
/// A function returning a value in <c>[0.0, 1.0)</c>. Values outside that
68+
/// range are clamped. Must not be <c>null</c>.
69+
/// </param>
70+
public ReconnectBackoff(Func<double> jitterSource)
71+
{
72+
_jitterSource = jitterSource ?? throw new ArgumentNullException(nameof(jitterSource));
73+
}
74+
75+
/// <summary>
76+
/// Computes the next delay and advances the internal attempt counter.
77+
/// </summary>
78+
/// <returns>A <see cref="TimeSpan"/> with the delay to wait before the next reconnect attempt.</returns>
79+
public TimeSpan GetNextDelay()
80+
{
81+
int currentAttempt;
82+
lock (_lock)
83+
{
84+
currentAttempt = _attempt;
85+
_attempt++;
86+
}
87+
88+
var baseSeconds = ComputeBaseDelaySeconds(currentAttempt);
89+
var raw = _jitterSource();
90+
var clamped = ClampUnitInterval(raw);
91+
var jitterMultiplier = 1.0 - JitterFraction + (2.0 * JitterFraction * clamped);
92+
return TimeSpan.FromSeconds(baseSeconds * jitterMultiplier);
93+
}
94+
95+
/// <summary>
96+
/// Resets the attempt counter so the next call to <see cref="GetNextDelay"/>
97+
/// returns an n = 0 (≈1 second) delay. Call this immediately after a
98+
/// successful reconnect.
99+
/// </summary>
100+
public void OnSuccess()
101+
{
102+
lock (_lock)
103+
{
104+
_attempt = 0;
105+
}
106+
}
107+
108+
private static double ComputeBaseDelaySeconds(int attempt)
109+
{
110+
// 2^attempt * InitialDelaySeconds, capped at MaxDelaySeconds.
111+
// attempt is clamped at 30 to avoid double overflow even though the cap
112+
// engages long before that point.
113+
var safeAttempt = ClampAttempt(attempt);
114+
var doubled = InitialDelaySeconds * Math.Pow(2.0, safeAttempt);
115+
return doubled > MaxDelaySeconds ? MaxDelaySeconds : doubled;
116+
}
117+
118+
private static int ClampAttempt(int attempt)
119+
{
120+
if (attempt < 0)
121+
{
122+
return 0;
123+
}
124+
125+
if (attempt > 30)
126+
{
127+
return 30;
128+
}
129+
130+
return attempt;
131+
}
132+
133+
private static double ClampUnitInterval(double value)
134+
{
135+
if (value < 0.0)
136+
{
137+
return 0.0;
138+
}
139+
140+
if (value > 1.0)
141+
{
142+
return 1.0;
143+
}
144+
145+
return value;
146+
}
147+
148+
private static double DefaultJitter()
149+
{
150+
// Random is not thread-safe in netstandard2.0. Synchronize on the
151+
// shared instance so concurrent reconnect loops can share one RNG.
152+
lock (DefaultRandom)
153+
{
154+
return DefaultRandom.NextDouble();
155+
}
156+
}
157+
}
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
// Copyright (c) One Identity LLC. All rights reserved.
2+
3+
using System.Runtime.CompilerServices;
4+
5+
[assembly: InternalsVisibleTo("SafeguardDotNetUnitTest")]

Samples/README.md

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,3 +34,34 @@ most secure automated Safeguard API authentication mechanism.
3434

3535
This project was initially developed using Visual Studio 2017. It targets .NET
3636
Framework 4.6.2. It can be modified to suit your needs.
37+
38+
## TLS Certificate Validation
39+
40+
Both sample `app.config` files ship with `SafeguardIgnoreSsl=true`. This setting
41+
exists so the samples run out-of-the-box against **development appliances that
42+
use self-signed TLS certificates**. It tells the SDK to skip Safeguard appliance
43+
certificate validation when negotiating the TLS handshake.
44+
45+
**Disabling TLS validation removes the SDK's protection against man-in-the-middle
46+
attacks.** Treat the default in these samples as a starting point for local
47+
development only.
48+
49+
When you adapt a sample for a real deployment:
50+
51+
- **Production deployments must set `SafeguardIgnoreSsl=false`** in `app.config`
52+
(or omit the key and rely on the SDK default). The appliance certificate chain
53+
must then validate against the host's trusted root store.
54+
- If the appliance uses an internal / private CA, **install that CA certificate
55+
in the host machine's trust store** (`Cert:\LocalMachine\Root` on Windows or
56+
the OS trust bundle on Linux/macOS) rather than disabling validation.
57+
- For certificate pinning or other custom trust policies, supply a
58+
`RemoteCertificateValidationCallback` via the `Safeguard.Connect(...)`
59+
overloads that accept a validation callback, instead of setting
60+
`SafeguardIgnoreSsl=true`. Implement the pinning logic (e.g. compare against
61+
a known SPKI hash) inside that callback.
62+
- Code review your sample fork: searching for `IgnoreSsl` /
63+
`SafeguardIgnoreSsl` / `-IgnoreSsl` should yield only configuration
64+
intentionally scoped to non-production environments.
65+
66+
The `SafeguardIgnoreSsl` flag is a developer convenience, not a deployment
67+
default. Leaving it set to `true` in production is a security defect.

Samples/ServiceNowTicketValidator/app.config

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,16 @@
6666
<add key="SafeguardAddress" value="" />
6767
<add key="SafeguardClientCertificateThumbprint" value="" />
6868
<add key="SafeguardApiVersion" value="4" />
69+
<!--
70+
SafeguardIgnoreSsl: development / self-signed appliances ONLY.
71+
When set to "true" the Safeguard appliance TLS certificate is NOT validated,
72+
which removes protection against man-in-the-middle attacks.
73+
Production deployments MUST set this to "false" so the appliance certificate
74+
chain is verified against the trusted root store. If the appliance uses a
75+
private CA, install that CA in the host trust store instead of disabling
76+
validation. See ../README.md ("TLS Certificate Validation") for details and
77+
for pointers to configuring a custom certificate-validation callback.
78+
-->
6979
<add key="SafeguardIgnoreSsl" value="true" />
7080
<add key="ServiceNowDnsName" value="" />
7181
<add key="ServiceNowClientSecret" value="" />

0 commit comments

Comments
 (0)