
[DOC] Add Bedrock Cohere Rerank v3.5 connector blueprint #4832

Open
gauravSsinha wants to merge 2 commits into opensearch-project:main from substrai:feature/bedrock-cohere-rerank-connector-blueprint



@gauravSsinha

Description

Adds a connector blueprint for the Cohere Rerank v3.5 model on Amazon Bedrock (cohere.rerank-v3-5:0). This blueprint was missing from docs/remote_inference_blueprints/ even though the built-in pre/post processing functions were implemented in #3254.

Issues Resolved

Resolves #4831
Related: #3254, #3396

Changes Made

Added docs/remote_inference_blueprints/bedrock_connector_cohere_rerank_blueprint.md covering:

  • Self-managed OpenSearch configuration with AWS credentials
  • AWS OpenSearch Service configuration with IAM role ARN
  • Built-in functions (connector.pre_process.cohere.rerank and connector.post_process.cohere.rerank) for OpenSearch 2.19+
  • Custom Painless scripts for earlier OpenSearch versions (fallback)
  • Model group creation, model registration, and deployment
  • Inference testing via both Predict API and text similarity API
  • Integration with OpenSearch reranking search pipeline
  • query_text_path usage to avoid duplicate query specification
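To make the request and response shapes concrete, here is a minimal Python sketch of what the pre- and post-processing steps do for this connector. It is not the ml-commons implementation: pre_process_rerank and post_process_rerank are hypothetical names, the input names query_text and text_docs come from the fallback Painless script, and the response shape (a results list whose items carry index and relevance_score) is assumed from Cohere's rerank output. The post-processing sketch also assumes every document is scored, that is, no top_n truncation.

```python
import json
from typing import Any, Dict, List


def pre_process_rerank(query_text: str, text_docs: List[str]) -> str:
    """Build the {"parameters": {...}} payload that the fallback Painless
    script assembles from the text_similarity input (hypothetical helper)."""
    # json.dumps escapes quotes, backslashes and control characters,
    # so any document text produces valid JSON.
    return json.dumps({"parameters": {"query": query_text, "documents": text_docs}})


def post_process_rerank(results: List[Dict[str, Any]]) -> List[float]:
    """Return scores in the original document order (hypothetical helper).

    Rerank results are sorted by relevance; each item's "index" points
    back to its position in the request's documents list.
    Assumes every document appears in results (no top_n truncation).
    """
    scores = [0.0] * len(results)
    for item in results:
        scores[item["index"]] = float(item["relevance_score"])
    return scores


if __name__ == "__main__":
    print(pre_process_rerank("What is the capital of France?",
                             ["Berlin is in Germany.", "Paris is the capital of France."]))
    # A made-up response that ranks document 1 first maps back to input order.
    print(post_process_rerank([
        {"index": 1, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.12},
    ]))  # -> [0.12, 0.91]
```

Reordering by index matters because a search pipeline pairs each score with the hit it came from, not with its rank.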

Context

The Cohere Rerank model on Bedrock is widely used for improving RAG pipeline relevance. The built-in pre/post processing functions were added in #3254 (closed Jan 2025), but no corresponding blueprint documentation was created. This PR fills that gap, following the same structure as existing blueprints (e.g., bedrock_connector_cohere_cohere.embed-english-v3_blueprint.md, bedrock_connector_titan_embedding_blueprint.md).

Testing

  • Verified blueprint structure matches existing blueprints in the repository
  • Connector configuration validated against the official OpenSearch reranking tutorial
  • Built-in function names verified against source code in MLPreProcessFunction.java and MLPostProcessFunction.java

Checklist

  • DCO sign-off on all commits
  • Follows existing blueprint documentation format
  • No code changes (documentation only)

Add a new connector blueprint for the Cohere Rerank v3.5 model on
Amazon Bedrock. This enables OpenSearch users to leverage Bedrock's
reranking capabilities for improving search relevance in RAG pipelines.

The blueprint includes:
- Self-managed OpenSearch configuration with AWS credentials
- AWS OpenSearch Service configuration with IAM role
- Built-in pre/post processing functions (connector.pre_process.cohere.rerank
  and connector.post_process.cohere.rerank) for OpenSearch 2.19+
- Custom Painless scripts for earlier OpenSearch versions
- Model registration, deployment, and inference testing
- Integration with OpenSearch reranking search pipeline
- query_text_path usage to avoid duplicate query specification

Resolves opensearch-project#4831
Related: opensearch-project#3254, opensearch-project#3396

Signed-off-by: Gaurav Kumar Sinha <gaurav@substrai.dev>
@github-actions

github-actions Bot commented May 26, 2026

PR Reviewer Guide 🔍

(Review updated until commit 6050fa7)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The blueprint includes placeholder text instructing users to insert AWS credentials (access_key, secret_key, session_token) directly into connector configurations (lines 36-38, 122-124). While a security note is present at line 63, it only appears after the first credential example and does not cover the second occurrence in section 2.3. Users following section 2.3 might miss the warning and commit credentials to version control or expose them in production configurations.
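One way to follow this advice on self-managed clusters is to assemble the credential object at request time from the environment instead of typing keys into the blueprint. The sketch below assumes the standard AWS SDK variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN); connector_credential is a hypothetical helper, and its result would go under "credential" in the connector body. On AWS OpenSearch Service, the IAM role ARN configuration avoids long-lived keys entirely.

```python
import os
from typing import Dict, Mapping, Optional


def connector_credential(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Assemble the connector "credential" object from AWS environment
    variables so keys never live in the blueprint text (hypothetical helper)."""
    source: Mapping[str, str] = os.environ if env is None else env
    # Raises KeyError if either key is missing, rather than sending an empty credential.
    credential = {
        "access_key": source["AWS_ACCESS_KEY_ID"],
        "secret_key": source["AWS_SECRET_ACCESS_KEY"],
    }
    # Temporary (STS) credentials also carry a session token.
    token = source.get("AWS_SESSION_TOKEN")
    if token:
        credential["session_token"] = token
    return credential


if __name__ == "__main__":
    demo = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
    # Print only the field names, never the secret values.
    print(sorted(connector_credential(demo)))
```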

✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@gauravSsinha requested a deployment to ml-commons-cicd-env-require-approval on May 26, 2026 02:36 with GitHub Actions (Waiting)
@github-actions

github-actions Bot commented May 26, 2026

PR Code Suggestions ✨

Latest suggestions up to 6050fa7
Explore these optional code suggestions:

Category: Possible issue
Suggestion: Add newline character escaping

The Painless script does not handle newline characters in document text, which could
break the JSON structure. Add escaping for newline (\n) and carriage return (\r)
characters to prevent JSON parsing errors when documents contain line breaks.

docs/remote_inference_blueprints/bedrock_connector_cohere_rerank_blueprint.md [142]

-"pre_process_function": "\n    def query_text = params.query_text;\n    def text_docs = params.text_docs;\n    def textDocsBuilder = new StringBuilder('[');\n    for (int i=0; i<text_docs.length; i++) {\n      textDocsBuilder.append('\"');\n      textDocsBuilder.append(text_docs[i].replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"'));\n      textDocsBuilder.append('\"');\n      if (i<text_docs.length - 1) {\n        textDocsBuilder.append(',');\n      }\n    }\n    textDocsBuilder.append(']');\n    def escapedQuery = query_text.replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"');\n    def parameters = '{ \"query\": \"' + escapedQuery + '\",  \"documents\": ' + textDocsBuilder.toString() + ' }';\n    return  '{\"parameters\": ' + parameters + '}';\n  "
+"pre_process_function": "\n    def query_text = params.query_text;\n    def text_docs = params.text_docs;\n    def textDocsBuilder = new StringBuilder('[');\n    for (int i=0; i<text_docs.length; i++) {\n      textDocsBuilder.append('\"');\n      textDocsBuilder.append(text_docs[i].replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"').replace('\\n', '\\\\n').replace('\\r', '\\\\r'));\n      textDocsBuilder.append('\"');\n      if (i<text_docs.length - 1) {\n        textDocsBuilder.append(',');\n      }\n    }\n    textDocsBuilder.append(']');\n    def escapedQuery = query_text.replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"').replace('\\n', '\\\\n').replace('\\r', '\\\\r');\n    def parameters = '{ \"query\": \"' + escapedQuery + '\",  \"documents\": ' + textDocsBuilder.toString() + ' }';\n    return  '{\"parameters\": ' + parameters + '}';\n  "
Suggestion importance[1-10]: 8


Why: This is a critical bug fix. Without escaping newline and carriage return characters in the Painless script, documents containing line breaks will produce invalid JSON, causing parsing errors and connector failures. The suggestion correctly identifies the issue and provides the proper escaping for both query_text and text_docs.

Impact: Medium
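The failure mode described above is easy to reproduce outside OpenSearch. This Python sketch mirrors the Painless escaping chain with str.replace (escape_current, escape_suggested and build_documents are hypothetical names) and feeds the result to Python's json.loads, whose default strict mode rejects raw control characters inside strings, as a JSON parser should.

```python
import json
from typing import Callable, List


def escape_current(text: str) -> str:
    # Same order as the blueprint's Painless script: backslashes first,
    # then double quotes.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def escape_suggested(text: str) -> str:
    # The suggestion appends newline and carriage-return escapes.
    return escape_current(text).replace("\n", "\\n").replace("\r", "\\r")


def build_documents(docs: List[str], escape: Callable[[str], str]) -> str:
    # String concatenation, as in the Painless StringBuilder loop.
    return "[" + ",".join('"' + escape(d) + '"' for d in docs) + "]"


if __name__ == "__main__":
    docs = ["first line\nsecond line", 'a "quoted" word']
    try:
        json.loads(build_documents(docs, escape_current))
    except json.JSONDecodeError as err:
        print("current script:", err.msg)  # raw newline rejected
    print("suggested:", json.loads(build_documents(docs, escape_suggested)) == docs)
```

Escaping backslashes first is what keeps the later replacements from being double-escaped.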

Previous suggestions

Suggestions up to commit 1a5618e
Category: Security
Suggestion: Escape special characters in JSON

The Painless script does not escape special characters in query_text or text_docs
elements, which can break JSON formatting if documents contain quotes, newlines, or
backslashes. This could cause request failures or injection vulnerabilities. Add
proper JSON escaping for string values.

docs/remote_inference_blueprints/bedrock_connector_cohere_rerank_blueprint.md [143]

-"pre_process_function": "\n    def query_text = params.query_text;\n    def text_docs = params.text_docs;\n    def textDocsBuilder = new StringBuilder('[');\n    for (int i=0; i<text_docs.length; i++) {\n      textDocsBuilder.append('\"');\n      textDocsBuilder.append(text_docs[i]);\n      textDocsBuilder.append('\"');\n      if (i<text_docs.length - 1) {\n        textDocsBuilder.append(',');\n      }\n    }\n    textDocsBuilder.append(']');\n    def parameters = '{ \"query\": \"' + query_text + '\",  \"documents\": ' + textDocsBuilder.toString() + ' }';\n    return  '{\"parameters\": ' + parameters + '}';\n  ",
+"pre_process_function": "\n    def query_text = params.query_text.replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"').replace('\\n', '\\\\n').replace('\\r', '\\\\r');\n    def text_docs = params.text_docs;\n    def textDocsBuilder = new StringBuilder('[');\n    for (int i=0; i<text_docs.length; i++) {\n      def escaped = text_docs[i].replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"').replace('\\n', '\\\\n').replace('\\r', '\\\\r');\n      textDocsBuilder.append('\"');\n      textDocsBuilder.append(escaped);\n      textDocsBuilder.append('\"');\n      if (i<text_docs.length - 1) {\n        textDocsBuilder.append(',');\n      }\n    }\n    textDocsBuilder.append(']');\n    def parameters = '{ \"query\": \"' + query_text + '\",  \"documents\": ' + textDocsBuilder.toString() + ' }';\n    return  '{\"parameters\": ' + parameters + '}';\n  ",
Suggestion importance[1-10]: 8


Why: This is a critical security and correctness issue. The Painless script in pre_process_function does not escape special characters like quotes, backslashes, or newlines in query_text or text_docs, which can break JSON formatting or introduce injection vulnerabilities when documents contain these characters.

Impact: Medium
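Escaping characters one at a time leaves gaps: JSON (RFC 8259) requires every control character from U+0000 to U+001F to be escaped, so a tab or form feed still breaks the hand-built array even after the \n and \r replacements. The Python sketch below (helper names are hypothetical) contrasts that chain with a real JSON encoder. Whether the Painless context offers an equivalent encoder is not established here; on 2.19+ the blueprint uses the built-in connector.pre_process.cohere.rerank function instead of this script.

```python
import json
from typing import List


def escape_chain(text: str) -> str:
    # Hand-written escaping as proposed: backslash, quote, \n, \r.
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def documents_by_chain(docs: List[str]) -> str:
    return "[" + ",".join('"' + escape_chain(d) + '"' for d in docs) + "]"


def documents_by_encoder(docs: List[str]) -> str:
    # json.dumps escapes every character JSON requires, including tabs
    # and the other control characters that the chain above misses.
    return json.dumps(docs)


if __name__ == "__main__":
    docs = ["name\tprice", "page one\x0cpage two"]
    for build in (documents_by_chain, documents_by_encoder):
        try:
            json.loads(build(docs))
            print(build.__name__, "-> valid JSON")
        except json.JSONDecodeError:
            print(build.__name__, "-> invalid JSON")
```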
Category: General
Suggestion: Hardcode api_version in request body

The request_body uses string interpolation for api_version without quotes, which
will result in invalid JSON if the value is not a number. Since api_version is set
to 2 in parameters, this works, but the template is fragile. Consider using proper
JSON formatting or document this requirement.

docs/remote_inference_blueprints/bedrock_connector_cohere_rerank_blueprint.md [56]

-"request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"api_version\": ${parameters.api_version} }",
+"request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"api_version\": 2 }",
Suggestion importance[1-10]: 3


Why: While the suggestion correctly identifies that api_version is interpolated without quotes, the current approach is intentional and works correctly since api_version is set to 2 (a number). Hardcoding reduces flexibility without significant benefit, as the parameter is already defined in the configuration.

Impact: Low
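The fragility described above comes from verbatim substitution: whatever string api_version holds is pasted into the body unquoted. The Python sketch below imitates ${parameters.<name>} placeholder substitution with a regular expression; render, PLACEHOLDER and REQUEST_BODY are hypothetical, and ml-commons' real substitution logic is not reproduced. A numeric value yields valid JSON and anything else does not, which is consistent with the follow-up commit hardcoding 2.

```python
import json
import re
from typing import Mapping

# Hypothetical stand-in for ${parameters.<name>} substitution: each
# placeholder is replaced by the raw parameter string, unquoted.
PLACEHOLDER = re.compile(r"\$\{parameters\.(\w+)\}")

REQUEST_BODY = (
    '{ "documents": ${parameters.documents}, '
    '"query": "${parameters.query}", '
    '"api_version": ${parameters.api_version} }'
)


def render(template: str, params: Mapping[str, str]) -> str:
    # A replacement function's return value is inserted literally.
    return PLACEHOLDER.sub(lambda m: params[m.group(1)], template)


if __name__ == "__main__":
    base = {"documents": '["doc a", "doc b"]', "query": "capital of France"}
    for version in ("2", "v2"):
        body = render(REQUEST_BODY, {**base, "api_version": version})
        try:
            print(version, "->", json.loads(body)["api_version"])
        except json.JSONDecodeError:
            print(version, "-> invalid JSON:", body)
```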

…less script

- Hardcode api_version as 2 in request_body instead of using parameter
  substitution, since the value is fixed for this model
- Add .replace() calls in pre_process_function to escape backslashes
  and double quotes in query_text and text_docs, preventing malformed
  JSON when documents contain special characters
- Add security note recommending IAM roles or Secrets Manager over
  hardcoded credentials

Signed-off-by: Gaurav Kumar Sinha <gaurav@substrai.dev>
@github-actions

Persistent review updated to latest commit 6050fa7

@gauravSsinha temporarily deployed to ml-commons-cicd-env-require-approval on May 26, 2026 04:57 with GitHub Actions (Inactive)
@gauravSsinha had a problem deploying to ml-commons-cicd-env-require-approval on May 26, 2026 04:57 with GitHub Actions (Failure)

Development

Successfully merging this pull request may close these issues.

[DOC] Add connector blueprint for Bedrock Cohere Rerank v3.5

1 participant