Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions packages/core/src/fallback/handler.ts
Original file line number Diff line number Diff line change
Expand Up @@ -130,19 +130,19 @@ async function processIntent(
config: Config,
intent: FallbackIntent | null,
fallbackModel: string,
): Promise<boolean> {
): Promise<string | boolean> {
switch (intent) {
case 'retry_always':
// TODO(telemetry): Implement generic fallback event logging. Existing
// logFlashFallback is specific to a single Model.
config.activateFallbackMode(fallbackModel);
return true;
return fallbackModel;

case 'retry_once':
// For distinct retry (retry_once), we do NOT set the active model permanently.
// The FallbackStrategy will handle routing to the available model for this turn
// based on the availability service state (which is updated before this).
return true;
return fallbackModel;

case 'retry_with_credits':
return true;
Expand Down
45 changes: 45 additions & 0 deletions packages/core/src/utils/retry.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -665,6 +665,51 @@ describe('retryWithBackoff', () => {
expect(mockFn).toHaveBeenCalledTimes(1);
},
);

it('should wait before retrying when fallback returns same model and fails terminally', async () => {
let attempts = 0;
const mockFn = vi.fn().mockImplementation(async () => {
attempts++;
if (attempts > 2) {
throw new Error('Test terminated to prevent infinite loop');
}
throw new TerminalQuotaError('Quota exhausted', {} as any);
});

const mockPolicy = {
model: 'same-model',
actions: {},
stateTransitions: {},
};
const getAvailabilityContext = vi
.fn()
.mockReturnValue({ policy: mockPolicy });

const promise = retryWithBackoff(mockFn, {
maxAttempts: 2,
initialDelayMs: 10,
onPersistent429: async () => 'same-model',
getAvailabilityContext,
authType: 'oauth-personal',
});

// Handle rejection early to avoid unhandled rejection warnings
const catchPromise = promise.catch((e) => e);

// At this point, it should have failed once and be waiting.
expect(mockFn).toHaveBeenCalledTimes(1);

// We need to advance timers to allow retries
await vi.advanceTimersByTimeAsync(10); // 1st retry delay
await vi.advanceTimersByTimeAsync(20); // 2nd retry delay

const result = await catchPromise;
expect(result).toBeInstanceOf(Error);
if (result instanceof Error) {
expect(result.message).toBe('Test terminated to prevent infinite loop');
}
expect(mockFn).toHaveBeenCalledTimes(3);
});
});
it('should abort the retry loop when the signal is aborted', async () => {
const abortController = new AbortController();
Expand Down
49 changes: 47 additions & 2 deletions packages/core/src/utils/retry.ts
Original file line number Diff line number Diff line change
Expand Up @@ -307,14 +307,31 @@ export async function retryWithBackoff<T>(
) {
if (onPersistent429) {
try {
const currentModel = getAvailabilityContext?.()?.policy.model;

const fallbackModel = await onPersistent429(
authType,
classifiedError,
);
if (fallbackModel) {
attempt = 0; // Reset attempts and retry with the new model.
currentDelay = initialDelayMs;
continue;

// Only continue (immediate retry) if fallbackModel is a NEW model
if (
typeof fallbackModel === 'string' &&
fallbackModel !== currentModel
) {
continue;
} else {
// If it's the same model (or a boolean retry signal), wait before retrying
currentDelay = await applyBackoffDelay(
currentDelay,
maxDelayMs,
signal,
);
continue;
}
Comment on lines +321 to +334

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The retry logic is vulnerable to a Denial of Service (DoS) due to uncontrolled retry loops and non-exponential backoff. This occurs because the attempt counter and currentDelay are reset before checking if the fallbackModel is new. Specifically, if currentModel is undefined, the condition fallbackModel !== currentModel incorrectly evaluates to true, leading to immediate retries without delay. This violates the repository rule requiring a mechanism to limit retry attempts to prevent infinite loops.

              // Only continue (immediate retry) if fallbackModel is a NEW model
              if (
                typeof fallbackModel === 'string' &&
                currentModel !== undefined &&
                fallbackModel !== currentModel
              ) {
                continue;
              } else {
                // If it's the same model (or a boolean retry signal), wait before retrying
                currentDelay = await applyBackoffDelay(
                  currentDelay,
                  maxDelayMs,
                  signal,
                );
                continue;
              }
References
  1. A recursive error/reconnect handler is acceptable as long as it includes a mechanism to limit the maximum number of retry attempts to prevent infinite loops.

}
Comment on lines +310 to 335

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The fallback logic in retryWithBackoff contains a logic flaw that can lead to an infinite retry loop, bypassing the maxAttempts limit. This occurs because the attempt counter is unconditionally reset to 0 when a fallback model is suggested, even if the model remains the same. This violates the requirement for recursive handlers to have a mechanism to limit retry attempts. Additionally, the logic should use the resolved primary model from the availability context to ensure consistency and avoid re-evaluating conditions. Resetting currentDelay when falling back to the same model is too aggressive; exponential backoff should continue unless a new model is actually introduced.

            const fallbackModel = await onPersistent429(
              authType,
              classifiedError,
            );
            if (fallbackModel) {
              const currentContext = getAvailabilityContext?.();
              const currentModel = currentContext?.policy.model;

              if (
                typeof fallbackModel === 'string' &&
                fallbackModel !== currentModel
              ) {
                attempt = 0;
                currentDelay = initialDelayMs;
                continue;
              } else {
                const jitter = currentDelay * 0.3 * (Math.random() * 2 - 1);
                const delayWithJitter = Math.max(0, currentDelay + jitter);
                await delay(delayWithJitter, signal);
                currentDelay = Math.min(maxDelayMs, currentDelay * 2);
                continue;
              }
            }
References
  1. A recursive error/reconnect handler is acceptable as long as it includes a mechanism to limit the maximum number of retry attempts to prevent infinite loops.
  2. Avoid overlapping checks in model resolution logic by relying on condensed sources of truth. Use the resolved primary model directly rather than re-evaluating conditions.
  3. In retry loops, re-evaluate dynamic conditions like the maximum number of attempts on each iteration to ensure the logic adapts to potential state changes.

} catch (fallbackError) {
debugLogger.warn('Fallback to Flash model failed:', fallbackError);
Expand Down Expand Up @@ -356,14 +373,31 @@ export async function retryWithBackoff<T>(
);
if (onPersistent429) {
try {
const currentModel = getAvailabilityContext?.()?.policy.model;

const fallbackModel = await onPersistent429(
authType,
classifiedError,
);
if (fallbackModel) {
attempt = 0; // Reset attempts and retry with the new model.
currentDelay = initialDelayMs;
continue;

// Only continue (immediate retry) if fallbackModel is a NEW model
if (
typeof fallbackModel === 'string' &&
fallbackModel !== currentModel
) {
continue;
} else {
// If it's the same model (or a boolean retry signal), wait before retrying
currentDelay = await applyBackoffDelay(
currentDelay,
maxDelayMs,
signal,
);
continue;
}
Comment on lines +387 to +400

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

This section contains a similar Denial of Service (DoS) vulnerability. The attempt counter and currentDelay are reset before verifying if the fallbackModel is new. This, combined with the backoff delay being skipped if currentModel is undefined, can lead to an infinite loop and broken exponential backoff. This violates the repository rule requiring a mechanism to limit retry attempts to prevent infinite loops.

                // Only continue (immediate retry) if fallbackModel is a NEW model
                if (
                  typeof fallbackModel === 'string' &&
                  currentModel !== undefined &&
                  fallbackModel !== currentModel
                ) {
                  continue;
                } else {
                  // If it's the same model (or a boolean retry signal), wait before retrying
                  currentDelay = await applyBackoffDelay(
                    currentDelay,
                    maxDelayMs,
                    signal,
                  );
                  continue;
                }
References
  1. A recursive error/reconnect handler is acceptable as long as it includes a mechanism to limit the maximum number of retry attempts to prevent infinite loops.

}
Comment on lines +376 to 401

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

This block duplicates the logic flaw where the attempt counter is prematurely reset, potentially causing infinite loops. To ensure consistency and reduce logic duplication, this model resolution and retry logic should be unified. The check should rely on the resolved primary model from the context rather than static constants or redundant evaluations.

              const fallbackModel = await onPersistent429(
                authType,
                classifiedError,
              );
              if (fallbackModel) {
                const currentContext = getAvailabilityContext?.();
                const currentModel = currentContext?.policy.model;

                if (
                  typeof fallbackModel === 'string' &&
                  fallbackModel !== currentModel
                ) {
                  attempt = 0;
                  currentDelay = initialDelayMs;
                  continue;
                } else {
                  const jitter = currentDelay * 0.3 * (Math.random() * 2 - 1);
                  const delayWithJitter = Math.max(0, currentDelay + jitter);
                  await delay(delayWithJitter, signal);
                  currentDelay = Math.min(maxDelayMs, currentDelay * 2);
                  continue;
                }
              }
References
  1. Recursive handlers must include a mechanism to limit the maximum number of retry attempts to prevent infinite loops.
  2. Avoid overlapping checks in model resolution logic and reduce logic duplication by using the resolved primary model directly.

} catch (fallbackError) {
debugLogger.warn('Model fallback failed:', fallbackError);
Expand Down Expand Up @@ -472,3 +506,14 @@ function logRetryAttempt(
debugLogger.warn(message, error); // Default to warn if error type is unknown
}
}

async function applyBackoffDelay(
currentDelay: number,
maxDelayMs: number,
signal?: AbortSignal,
): Promise<number> {
const jitter = currentDelay * 0.3 * (Math.random() * 2 - 1);
const delayWithJitter = Math.max(0, currentDelay + jitter);
await delay(delayWithJitter, signal);
return Math.min(maxDelayMs, currentDelay * 2);
}
Loading