[DOC] Agent-level Token Limit parameter

**What do you want to do?**
 
- [ ] Request a change to existing documentation
- [x] Add new documentation
- [ ] Report a technical problem with the documentation
- [ ] Other 

**Tell us about your request.** Provide a summary of the request.
The documentation should be updated to cover the new `token_limit` parameter proposed in ml-commons issue #4728.

Add a new paragraph under the existing "Tracking token usage" section explaining the `token_limit` parameter and its behavior.

The documentation should clarify that token limiting works through an over-exhaustion mechanism: token usage is validated after LLM calls complete, by checking the cumulative token usage against the configured `token_limit`. Execution stops once the accumulated usage exceeds that threshold. Because the check runs after a call has finished, the final usage can end up above the limit rather than exactly at it.
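To make the requested wording easier to review, here is a minimal Python sketch of a post-call check of this kind. It is an illustration only, not the ml-commons implementation: `TokenBudget`, `TokenLimitExceeded`, `record`, and `run_agent` are hypothetical names, and only the `token_limit` parameter name comes from this request.

```python
class TokenLimitExceeded(Exception):
    """Raised when cumulative usage exceeds the limit (hypothetical name)."""


class TokenBudget:
    """Illustrative tracker; not the ml-commons implementation."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.used = 0

    def record(self, tokens):
        # Called after an LLM call completes, so the call that crosses
        # the limit has already consumed its tokens.
        self.used += tokens
        if self.used > self.token_limit:
            raise TokenLimitExceeded(
                f"used {self.used} tokens, limit is {self.token_limit}"
            )


def run_agent(call_costs, token_limit):
    """Simulate a sequence of LLM calls; return (calls_completed, tokens_used)."""
    budget = TokenBudget(token_limit)
    completed = 0
    for cost in call_costs:
        completed += 1  # the call itself always finishes
        try:
            budget.record(cost)
        except TokenLimitExceeded:
            break  # stop before starting another call
    return completed, budget.used
```

With four calls of 400 tokens each and a limit of 1000, `run_agent([400, 400, 400, 400], 1000)` returns `(3, 1200)`: the third call completes, pushes usage past the limit, and the fourth call never starts. Usage that exactly equals the limit does not stop execution in this sketch, matching the "exceeds" wording above.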

Also mention that token limit support has been added for the following agent types:

- Plan-Execute-Reflect (PER) agents
- V1 agents
- V2 agents

The explanation should make it clear that token usage aggregation includes all LLM calls performed during execution, including planning, execution, reflection, and subagent calls.
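The aggregation described above could be pictured roughly as follows. This is a hedged Python sketch under assumed names: `UsageTracker`, `add`, `total`, and `exceeds_limit` are hypothetical, and the phase labels simply mirror the ones listed in this request.

```python
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Illustrative only; the class and its methods are hypothetical."""

    by_phase: dict = field(default_factory=dict)
    subagents: list = field(default_factory=list)

    def add(self, phase, tokens):
        # Accumulate tokens per phase (planning, execution, reflection, ...).
        self.by_phase[phase] = self.by_phase.get(phase, 0) + tokens

    def total(self):
        # Subagent usage rolls up into the parent's total.
        own = sum(self.by_phase.values())
        nested = sum(sub.total() for sub in self.subagents)
        return own + nested


def exceeds_limit(tracker, token_limit):
    """Compare the aggregated total against the configured limit."""
    return tracker.total() > token_limit


parent = UsageTracker()
parent.add("planning", 300)
parent.add("execution", 500)
parent.add("reflection", 200)

child = UsageTracker()
child.add("execution", 150)
parent.subagents.append(child)

print(parent.total())                # 1150
print(exceeds_limit(parent, 1000))   # True
```

The point for the documentation is that the 150 tokens used by the subagent count toward the parent's limit, so the parent exceeds a limit of 1000 even though its own calls total exactly 1000.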

**Version:** List the OpenSearch version to which this issue applies, e.g. 2.14, 2.12--2.14, or all.
Upcoming 3.7 

**What other resources are available?** Provide links to related issues, POCs, steps for testing, etc.
Issue: https://github.com/opensearch-project/ml-commons/issues/4728
PR: https://github.com/opensearch-project/ml-commons/pull/4820


[DOC] Agent-level Token Limit parameter #12430
