set retry time interval configurable, increase the http client read timeout #6320

Merged: oeyh merged 2 commits into opensearch-project:main from Zhangxunmt:main on Dec 4, 2025

@Zhangxunmt

Collaborator

Description

  1. The ML processor occasionally hits read timeouts when invoking ml-commons to create batch jobs. The error message is:
     Failed to execute HTTP request due to IO issue: Read timed out
  2. Currently, throttled events are retried every 3 seconds, driven by how often the processor's doExecute() runs. Under some conditions, the retry attempts hammer the remote Bedrock service and cause persistent throttling errors.

This PR increases the HTTP client read timeout from 3 to 30 seconds and adds a configurable retry interval, with a default of 60 seconds, to reduce the TPS sent to the remote AI server. Because Bedrock throttles based on requests per minute, 60 seconds should be a good default.
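
To illustrate the retry-interval idea described above, here is a minimal sketch of a time gate that lets throttled records be retried only once per configured interval, regardless of how often the caller polls. The class name `ThrottledRetryGate` and its method `tryAcquire` are hypothetical and are not taken from this PR's code; only `java.time` from the JDK is used.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not from the PR): allows one retry pass per interval.
final class ThrottledRetryGate {
    private final Duration retryInterval;
    private Instant lastRetry;

    ThrottledRetryGate(Duration retryInterval, Instant start) {
        this.retryInterval = retryInterval;
        this.lastRetry = start;
    }

    // Returns true, and records the attempt, only if at least retryInterval
    // has elapsed since the last accepted retry.
    synchronized boolean tryAcquire(Instant now) {
        if (Duration.between(lastRetry, now).compareTo(retryInterval) < 0) {
            return false; // too soon: skip the retry on this cycle
        }
        lastRetry = now;
        return true;
    }
}
```

With a 60-second interval, a caller that polls every 3 seconds would be admitted roughly once per minute, which is the rate reduction the description is aiming for.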

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Zhangxunmt Zhangxunmt changed the title from "set retry time interval configurabl, increase the http client read timeout" to "set retry time interval configurable, increase the http client read timeout" on Dec 3, 2025
Comment on lines +36 to +38
public static final int DEFAULT_RETRY_INTERVAL = 60; // default retry interval is 1 minute
private static final int MIN_RETRY_INTERVAL = 3;
public static final int MAX_RETRY_INTERVAL = 300;

Collaborator

Nit: suggest specifying the time unit in the names, e.g. DEFAULT_RETRY_INTERVAL_SECONDS.


@JsonPropertyDescription("The retry interval for the throttled records. Default is 60s.")
@JsonProperty(value = "retry_interval_seconds", defaultValue = "" + DEFAULT_RETRY_INTERVAL)
private int retry_interval_seconds = DEFAULT_RETRY_INTERVAL;

Collaborator

Use camelCase here: retryIntervalSeconds

Comment on lines +127 to +129
if (retry_interval_seconds < MIN_RETRY_INTERVAL || retry_interval_seconds > MAX_RETRY_INTERVAL) {
    throw new IllegalArgumentException(String.format("retry interval for throttled records of %d seconds is not valid, valid range is %d - %d", retry_interval_seconds, MIN_RETRY_INTERVAL, MAX_RETRY_INTERVAL));
}

Collaborator Author

I will use Hibernate Validator for the validation, per Taylor's comment. I think it's fine to just use Duration:
@DurationMin(seconds = 3)
@DurationMax(seconds = 300)
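
For readers without the validation library at hand, the range that these two annotations express (3 to 300 seconds, assuming both bounds are inclusive) can be sketched as a plain `java.time` check. The class `RetryIntervalValidator` and its constants are hypothetical names used for illustration; this is not the code that was merged.

```java
import java.time.Duration;

// Hypothetical stand-in for the annotation-based validation discussed above.
final class RetryIntervalValidator {
    static final Duration MIN_RETRY_INTERVAL = Duration.ofSeconds(3);
    static final Duration MAX_RETRY_INTERVAL = Duration.ofSeconds(300);

    // Returns the interval unchanged if it lies within [3s, 300s];
    // otherwise throws IllegalArgumentException.
    static Duration validate(Duration retryInterval) {
        if (retryInterval.compareTo(MIN_RETRY_INTERVAL) < 0
                || retryInterval.compareTo(MAX_RETRY_INTERVAL) > 0) {
            throw new IllegalArgumentException(String.format(
                    "retry interval of %d seconds is not valid, valid range is %d - %d",
                    retryInterval.getSeconds(),
                    MIN_RETRY_INTERVAL.getSeconds(),
                    MAX_RETRY_INTERVAL.getSeconds()));
        }
        return retryInterval;
    }
}
```

Comparing `Duration` values directly avoids unit mix-ups that an `int` field holding seconds would allow.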


@JsonPropertyDescription("The retry interval for the throttled records. Default is 60s.")
@JsonProperty(value = "retry_interval_seconds", defaultValue = "" + DEFAULT_RETRY_INTERVAL)
private int retry_interval_seconds = DEFAULT_RETRY_INTERVAL;

@graytaylor0 graytaylor0 Dec 3, 2025

Member

I would use the Duration type for this and rename it to retry_interval. With the Duration type, users can specify either an ISO 8601 duration ("PT3M") or a simple duration ("60s"). You can validate it with annotations (ex:

)
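
As a rough illustration of the two input styles mentioned in this comment, the sketch below accepts a simple form such as "60s" and otherwise falls back to the JDK's ISO 8601 parser. `RetryIntervalParser` is a hypothetical class, and the set of simple suffixes it accepts (`s` and `ms`) is an assumption; the project's actual Duration deserializer is not shown in this conversation.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser: accepts "60s" / "500ms" or an ISO 8601 duration like "PT3M".
final class RetryIntervalParser {
    // Assumed simple format: digits followed by "s" or "ms".
    private static final Pattern SIMPLE = Pattern.compile("(\\d+)(ms|s)");

    static Duration parse(String text) {
        Matcher m = SIMPLE.matcher(text);
        if (m.matches()) {
            long amount = Long.parseLong(m.group(1));
            return "ms".equals(m.group(2))
                    ? Duration.ofMillis(amount)
                    : Duration.ofSeconds(amount);
        }
        // Anything else is treated as ISO 8601; invalid input throws
        // java.time.format.DateTimeParseException.
        return Duration.parse(text);
    }
}
```

Under these assumptions, `RetryIntervalParser.parse("PT3M")` and `RetryIntervalParser.parse("180s")` yield equal `Duration` values.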

Collaborator Author

Makes sense to me. Initially I thought using Duration was overkill, since we don't need a large interval anyway (5 minutes is probably the largest). But to keep it consistent with retryTimeWindow, which is also a Duration, I think changing it to Duration is fine.

return;
}

LOG.info("Processing {} throttled records ({}s since last retry)",

Member

Could this potentially be a NOISY log? If so you should mark it as NOISY

Collaborator Author

Yes, it's OK to mark it as NOISY. I thought this would only be printed every x minutes, so probably not that frequently. I will mark it as NOISY in the next revision, since it still repeats every 60s under busy conditions.

…d timeout

Signed-off-by: Xun Zhang <xunzh@amazon.com>
Signed-off-by: Xun Zhang <xunzh@amazon.com>
@oeyh oeyh merged commit 72a85f5 into opensearch-project:main Dec 4, 2025
45 of 47 checks passed
eatulban pushed a commit to eatulban/data-prepper that referenced this pull request Dec 11, 2025
…imeout (opensearch-project#6320)

* set retry time interval configurable and increase the http client read timeout

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* address comments

Signed-off-by: Xun Zhang <xunzh@amazon.com>

---------

Signed-off-by: Xun Zhang <xunzh@amazon.com>
wandna-amazon pushed a commit to wandna-amazon/data-prepper that referenced this pull request Jan 8, 2026
…imeout (opensearch-project#6320)

* set retry time interval configurable and increase the http client read timeout

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* address comments

Signed-off-by: Xun Zhang <xunzh@amazon.com>

---------

Signed-off-by: Xun Zhang <xunzh@amazon.com>
Signed-off-by: Nathan Wand <wandna@amazon.com>

3 participants