From cd0d9196319e004edb9b2f257b318e32b62eadeb Mon Sep 17 00:00:00 2001 From: Gabriel Weyer <159976942+gweyeratlassian@users.noreply.github.com> Date: Fri, 6 Mar 2026 05:36:48 +1100 Subject: [PATCH 01/28] Add RomanianAnalyzer to 3.x breaking changes (#12008) * Add RomanianAnalyzer to 3.x breaking changes Signed-off-by: Gabriel Weyer <159976942+gweyeratlassian@users.noreply.github.com> * Improve Romanian analyzer breaking change wording Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Gabriel Weyer <159976942+gweyeratlassian@users.noreply.github.com> --------- Signed-off-by: Gabriel Weyer <159976942+gweyeratlassian@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Eric Pugh --- _about/breaking-changes.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_about/breaking-changes.md b/_about/breaking-changes.md index 0f057d4fea7..bb1cde9d872 100644 --- a/_about/breaking-changes.md +++ b/_about/breaking-changes.md @@ -209,3 +209,7 @@ For more information, see pull request [#9813](https://github.com/opensearch-pro - The `CatIndexTool` is removed in favor of the `ListIndexTool`. +### Romanian analysis + +The `romanian` analyzer now supports Romanian in its modern Unicode form and normalizes cedilla characters to their comma-based equivalents. Because both forms are still in use, we recommend reindexing existing Romanian documents to ensure consistent analysis and search behavior. + From 69ab1cd58e2d98a9953b54dad117c8a9c7c86a85 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Tue, 10 Mar 2026 17:51:42 -0400 Subject: [PATCH 02/28] first pass of tutorial Signed-off-by: Eric Pugh --- _search-plugins/search-relevance/judgments.md | 12 +- _tutorials/index.md | 4 +- _tutorials/llm-as-a-judge-tutorial.md | 742 ++++++++++++++++++ 3 files changed, 751 insertions(+), 7 deletions(-) create mode 100644 _tutorials/llm-as-a-judge-tutorial.md diff --git a/_search-plugins/search-relevance/judgments.md b/_search-plugins/search-relevance/judgments.md index 77f55a7db4b..0daa8e577f0 100644 --- a/_search-plugins/search-relevance/judgments.md +++ b/_search-plugins/search-relevance/judgments.md @@ -24,8 +24,8 @@ Search Relevance Workbench supports all types of judgments: ## Explicit judgments Search Relevance Workbench offers two ways to integrate explicit judgments: -* Importing judgments that were collected using a process outside of OpenSearch -* AI-assisted judgments that use LLMs +* Importing judgments that were collected using a process outside of OpenSearch. +* Generating judgments using LLM-as-a-Judge. ### Importing judgments @@ -105,12 +105,12 @@ Parameter | Data type | Description `type` | String | Set to `IMPORT_JUDGMENT`. `judgmentRatings` | Array | A list of JSON objects containing the judgments. Judgments are grouped by query, each containing a nested map in which document IDs (`docId`) serve as keys and their floating-point ratings serve as values. -### Creating AI-assisted judgments +### Using LLM-as-a-Judge -If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments. +If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments, aka _LLM-as-a-Judge_. #### Prerequisites -To use AI-assisted judgment generation, ensure that you have configured the following components: +To use LLM-as-a-Judge, ensure that you have configured the following components: * A connector to an LLM to use for generating the judgments. For more information, see [Creating connectors for third-party ML platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). * A query set: Together with the `size` parameter, the query set defines the scope for generating judgments. For each query, the top k documents are retrieved from the specified index, where k is defined in the `size` parameter. @@ -120,7 +120,7 @@ The AI-assisted judgment process works as follows: - For each query, the top k documents are retrieved using the defined search configuration, which includes the index information. The query and each document from the result list create a query/document pair. - Each query and document pair forms a query/document pair. - The LLM is then called with a predefined prompt (stored as a static variable in the backend) to generate a judgment for each query/document pair. -- All generated judgments are stored in the judgments index for reuse in future experiments. +- All generated judgments are stored in the judgments cache index for reuse in future experiments. To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration: diff --git a/_tutorials/index.md b/_tutorials/index.md index 632b0066a74..e40080f5951 100644 --- a/_tutorials/index.md +++ b/_tutorials/index.md @@ -28,6 +28,9 @@ cards: - heading: "Faceted search" description: "Build filterable search experiences for applications like e-commerce or location search" link: "/tutorials/faceted-search/" + - heading: "LLM-as-a-Judge" + description: "Getting started with LLM as a Judge for search relevance evaluation" + link: "/tutorials/llm-as-a-judge-tutorial/" --- # Tutorials @@ -35,4 +38,3 @@ cards: Follow our step-by-step tutorials to learn how to use OpenSearch features. {% include cards.html cards=page.cards %} - diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md new file mode 100644 index 00000000000..672109b8bb4 --- /dev/null +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -0,0 +1,742 @@ +--- +layout: default +title: Getting started with LLM as a Judge for search relevance evaluation +has_children: false +parent: Search Relevance Workbench +nav_order: 4 +steps: + - heading: "Set up ML Commons and create an external LLM connector" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-1-set-up-ml-commons-and-create-an-external-llm-connector" + - heading: "Create a simple search index with sample data" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-2-create-a-simple-search-index-with-sample-data" + - heading: "Create search configurations" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-3-create-search-configurations" + - heading: "Create a query set" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-4-create-a-query-set" + - heading: "Generate LLM judgments" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-5-generate-llm-judgments" + - heading: "Run experiments with LLM judgments" + link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-6-run-experiments-with-llm-judgments" +--- + +# Getting started with LLM as a Judge for search relevance evaluation + +LLM as a Judge is a technique that leverages large language models to automatically evaluate search result relevance, providing a scalable and consistent approach to search quality assessment. + +In this tutorial, you'll learn how to: + +- **Set up external LLM integration**: Connect OpenSearch to external LLM providers like OpenAI, AWS Bedrock, or others. +- **Generate automated judgments**: Use LLMs to evaluate search result relevance without manual annotation. +- **Compare search configurations**: Run experiments to determine which search approach performs better using LLM-generated judgments. + +## OpenSearch components for LLM as a Judge + +In this tutorial, you'll use the following OpenSearch components: + +- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) for LLM integration +- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/index/) for evaluation workflows +- [Remote model connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) for external LLM APIs +- [Search configurations]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/search-configuration/) for defining search strategies +- [Query sets]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/query-set/) for organizing test queries +- [Judgments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgment/) for storing relevance assessments +- [Experiments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/experiment/) for evaluating search quality + +## Prerequisites + +For this tutorial, you'll need: + +- OpenSearch 3.5 or newer with the Search Relevance Workbench plugin installed +- ML Commons plugin installed and configured +- An API key for an external LLM provider (OpenAI, AWS Bedrock, etc.) + +First, enable the Search Relevance Workbench and configure ML Commons: + +```json +PUT /_cluster/settings +{ + "persistent": { + "plugins.search_relevance.workbench_enabled": true, + "plugins.ml_commons.only_run_on_ml_node": "false", + "plugins.ml_commons.model_access_control_enabled": "true", + "plugins.ml_commons.allow_registering_model_via_url": "true" + } +} +``` +{% include copy-curl.html %} + +## Tutorial + +This tutorial consists of the following steps: + +{% include list.html list_items=page.steps%} + +You can follow this tutorial by using your command line or the OpenSearch Dashboards [Dev Tools console]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/run-queries/). + +Some steps in the tutorial contain optional Test it{: .text-delta} sections. You can confirm that the step completed successfully by running the requests in these sections. + +After you're done, follow the steps in the [Clean up](#clean-up) section to delete all created components. + +### Step 1: Set up ML Commons and create an external LLM connector + +First, you'll create a connector to an external LLM service. This tutorial uses OpenAI's GPT models, but you can adapt it for other providers like AWS Bedrock. + +#### Step 1(a): Create an ML connector + +Create a connector to OpenAI's chat completion API. Replace `YOUR_API_KEY` with your actual OpenAI API key: + +```json +POST /_plugins/_ml/connectors/_create +{ + "name": "OpenAI Chat Connector", + "description": "Connector to OpenAI Chat API for LLM judgments", + "version": "1", + "protocol": "http", + "parameters": { + "endpoint": "api.openai.com", + "model": "gpt-3.5-turbo" + }, + "credential": { + "openAI_key": "YOUR_API_KEY" + }, + "actions": [ + { + "action_type": "predict", + "method": "POST", + "url": "https://api.openai.com/v1/chat/completions", + "headers": { + "Authorization": "Bearer ${credential.openAI_key}", + "Content-Type": "application/json" + }, + "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages}, \"temperature\": 0 }" + } + ] +} +``` +{% include copy-curl.html %} + +The response contains the connector ID: + +```json +{ + "connector_id": "abc123def456" +} +``` + +You will use the returned `connector_id` in the next step. + +#### Step 1(b): Register and deploy the model + +Register and deploy the connector as a remote model: + +```json +POST /_plugins/_ml/models/_register?deploy=true +{ + "name": "openai_gpt-3.5-turbo", + "function_name": "remote", + "description": "External LLM model via OpenAI", + "connector_id": "abc123def456" +} +``` +{% include copy-curl.html %} + +Registering a model is an asynchronous task. OpenSearch sends back a task ID for this task: + +```json +{ + "task_id": "aFeif4oB5Vm0Tdw8yoN7", + "status": "CREATED" +} +``` + +You can check the status of the task by using the [Get ML Task API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/): + +```json +GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7 +``` + +OpenSearch saves the registered model in the model index. Deploying a model creates a model instance and caches the model in memory. + +Once the task is complete, the task state changes to `COMPLETED` and the [Get ML Task API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/) response contains the `model_id` for the deployed model (which is different from the initial `task_id`): + +```json +{ + "model_id": "DQmk2ZwBqLOthQZKMqU-", + "task_type": "REGISTER_MODEL", + "function_name": "REMOTE", + "state": "COMPLETED", + "worker_node": [ + "rbmK-mMDQfecqr41sOjEsA" + ], + "create_time": 1773177942532, + "last_update_time": 1773177942677, + "is_async": false +} +``` + +You'll need the `model_id` in order to use the deployed model for several of the following steps. + +{% include copy-curl.html %} + +
+ + Test it + + {: .text-delta} + +Test the model connection: + +```json +POST /_plugins/_ml/models/MODEL_ID_HERE/_predict +{ + "parameters": { + "messages": "[{\"role\": \"user\", \"content\": \"Say hello in one word\"}]" + } +} +``` +{% include copy-curl.html %} + +You should receive a response from the LLM. +
+ +### Step 2: Create a simple search index with sample data + +Now you'll create a simple index with product data for testing search relevance. + +#### Step 2(a): Create the index + +```json +PUT /products +{ + "mappings": { + "properties": { + "title": { + "type": "text" + }, + "description": { + "type": "text" + }, + "category": { + "type": "keyword" + }, + "brand": { + "type": "keyword" + }, + "price": { + "type": "float" + } + } + } +} +``` +{% include copy-curl.html %} + +#### Step 2(b): Index sample documents + +Add sample product documents using the bulk API: + +```json +POST /products/_bulk +{"index":{"_id":"1"}} +{"title":"Samsung 55-inch 4K Smart TV","description":"Ultra HD Smart TV with HDR and built-in streaming apps","category":"Electronics","brand":"Samsung","price":599.99} +{"index":{"_id":"2"}} +{"title":"LG 65-inch OLED TV","description":"Premium OLED display with perfect blacks and vibrant colors","category":"Electronics","brand":"LG","price":1299.99} +{"index":{"_id":"3"}} +{"title":"Sony Wireless Headphones","description":"Noise-canceling over-ear headphones with 30-hour battery","category":"Electronics","brand":"Sony","price":199.99} +{"index":{"_id":"4"}} +{"title":"Apple MacBook Pro 14-inch","description":"Professional laptop with M2 chip and Retina display","category":"Computers","brand":"Apple","price":1999.99} +{"index":{"_id":"5"}} +{"title":"Dell Gaming Monitor 27-inch","description":"High refresh rate gaming monitor with G-Sync support","category":"Computers","brand":"Dell","price":399.99} +``` + +{% include copy-curl.html %} + +
+ + Test it + + {: .text-delta} + +Verify the documents were indexed: + +```json +GET /products/_search +{ + "query": { + "match_all": {} + } +} +``` +{% include copy-curl.html %} +
+ +### Step 3: Create search configuration "baseline" + +Search configuration defines a search strategy to evaluate using the LLM-as-a-Judge judgments. + +```json +PUT /_plugins/_search_relevance/search_configurations +{ + "name": "baseline", + "query": "{\"query\":{\"multi_match\":{\"query\":\"%SearchText%\",\"fields\":[\"title\",\"description\",\"category\",\"brand\"]}}}", + "index": "products" +} +``` +{% include copy-curl.html %} + +The response contains the search configuration ID: + +```json +{ + "search_configuration_id": "baseline_config_id" +} +``` + +
+ + Test it + + {: .text-delta} + +List all search configurations: + +```json +GET _plugins/_search_relevance/search_configurations/_search +{ + "query": + { + "match_all": {} + } +} +``` +{% include copy-curl.html %} +
+ +### Step 4: Create a query set + +Query sets contain the test queries you'll use for evaluation. + +```json +PUT /_plugins/_search_relevance/query_sets +{ + "name": "Electronics Queries", + "description": "Test queries for electronics products", + "sampling": "manual", + "querySetQueries": [ + {"queryText": "smart tv"}, + {"queryText": "laptop computer"}, + {"queryText": "wireless headphones"} + ] +} +``` +{% include copy-curl.html %} + +The response contains the query set ID: + +```json +{ + "query_set_id": "electronics_queries_id" +} +``` + +
+ + Test it + + {: .text-delta} + +Verify the query set was created: + +```json +GET /_plugins/_search_relevance/query_sets/_search +{ + "query": { + "match": { + "name": "Electronics Queries" + } + } +} +``` +{% include copy-curl.html %} +
+ +### Step 5: Generate LLM judgments + +Now you'll create an LLM judgment that uses your deployed model to evaluate search results. + +```json +PUT /_plugins/_search_relevance/judgments +{ + "name": "LLM Judgment via OpenAI", + "description": "Uses GPT-3.5-turbo to evaluate product search results", + "type": "LLM_JUDGMENT", + "modelId": "MODEL_ID_HERE", + "querySetId": "electronics_queries_id", + "searchConfigurationList": ["baseline_config_id"], + "size": 10, + "tokenLimit": 4000, + "contextFields": ["title", "description", "category"], + "ignoreFailure": false, + "llmJudgmentRatingType": "SCORE0_1", + "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", + "overwriteCache": false +} +``` +{% include copy-curl.html %} + +The response contains the judgment ID: + +```json +{ + "judgment_id": "LLM_JUDGEMENT_ID" +} +``` + +The LLM judgment process runs asynchronously. Wait a few moments for the judgments to be generated, then check the status: + +```json +GET /search-relevance-judgment/_doc/LLM_JUDGEMENT_ID +``` +{% include copy-curl.html %} + +You can see the judgments and how they were arrived at: + +```json +{ + "_index": "search-relevance-judgment", + "_id": "d6f73218-ad6b-408f-b6ab-186a47b27e87", + "_version": 2, + "_seq_no": 10, + "_primary_term": 1, + "found": true, + "_source": { + "id": "d6f73218-ad6b-408f-b6ab-186a47b27e87", + "timestamp": "2026-03-10T21:37:13.837Z", + "name": "LLM Judgment via OpenAI", + "status": "COMPLETED", + "type": "LLM_JUDGMENT", + "metadata": { + "contextFields": [ + "title", + "description", + "category" + ], + "ignoreFailure": false, + "llmJudgmentRatingType": "SCORE0_1", + "size": 10, + "modelId": "DQmk2ZwBqLOthQZKMqU-", + "overwriteCache": false, + "searchConfigurationList": [ + "0fa1fedb-4bcb-469d-9fcb-2a5cd6709e1d" + ], + "tokenLimit": 4000, + "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", + "querySetId": "4c6bf6f4-c2e4-4c76-a668-82de11d14846" + }, + "judgmentRatings": [ + { + "query": "smart tv", + "ratings": [ + { + "docId": "1", + "rating": "0.9" + }, + { + "docId": "2", + "rating": "0.8" + } + ] + }, + { + "query": "laptop computer", + "ratings": [ + { + "docId": "4", + "rating": "0.9" + } + ] + }, + { + "query": "wireless headphones", + "ratings": [ + { + "docId": "3", + "rating": "1.0" + } + ] + } + ] + } +} +``` + + +
+ + Test it + + {: .text-delta} + +Check the judgment cache to see the individual generated ratings: + +```json +GET /.plugins-search-relevance-judgment-cache/_search +{ + "size": 5, + "query": { + "match_all": {} + } +} +``` +{% include copy-curl.html %} + +You should see documents with ratings between 0 and 1 generated by the LLM. +
+ +### Step 6: Run experiments with LLM judgments + +Finally, you'll create an experiment to evaluate your `baseline` search configuration using the LLM-generated judgments. + +**SHOULD THIS JUST BE SCREENSHOTS OF SRW????** + +#### Step 6(a): Create a pointwise experiment + +```json +PUT /_plugins/_search_relevance/experiments +{ + "querySetId": "electronics_queries_id", + "searchConfigurationList": ["baseline_config_id"], + "judgmentList": ["llm_judgment_id"], + "size": 10, + "type": "POINTWISE_EVALUATION" +} +``` +{% include copy-curl.html %} + +The response contains the experiment ID: + +```json +{ + "experiment_id": "pointwise_experiment_id" +} +``` + +#### Step 6(b): View experiment results + +```json +GET /_plugins/_search_relevance/experiments/pointwise_experiment_id +``` +{% include copy-curl.html %} + +The response shows evaluation metrics comparing your search configurations: + +
+ + Results + + {: .text-delta} + +```json +{ + "experiment_id": "pointwise_experiment_id", + "query_set_id": "electronics_queries_id", + "search_configuration_list": ["baseline_config_id"], + "judgment_list": ["llm_judgment_id"], + "type": "POINTWISE_EVALUATION", + "results": { + "baseline_config_id": { + "precision_at_1": 0.67, + "precision_at_3": 0.56, + "precision_at_5": 0.48, + "precision_at_10": 0.42, + "recall_at_1": 0.67, + "recall_at_3": 0.78, + "recall_at_5": 0.85, + "recall_at_10": 0.92, + "ndcg_at_1": 0.67, + "ndcg_at_3": 0.71, + "ndcg_at_5": 0.73, + "ndcg_at_10": 0.75 + }, + "title_boosted_config_id": { + "precision_at_1": 0.78, + "precision_at_3": 0.63, + "precision_at_5": 0.52, + "precision_at_10": 0.45, + "recall_at_1": 0.78, + "recall_at_3": 0.84, + "recall_at_5": 0.89, + "recall_at_10": 0.94, + "ndcg_at_1": 0.78, + "ndcg_at_3": 0.79, + "ndcg_at_5": 0.81, + "ndcg_at_10": 0.83 + } + } +} +``` + +In this example, the title-boosted configuration shows better performance across most metrics, indicating that boosting the title field improves search relevance for these queries. +
+ +
+ + Test it + + {: .text-delta} + +You can also view detailed evaluation results: + +```json +GET /search-relevance-evaluation-result/_search +{ + "query": { + "match": { + "experiment_id": "pointwise_experiment_id" + } + } +} +``` +{% include copy-curl.html %} +
+ +## Advanced features + +### Custom prompt templates + +You can customize the prompt template to focus on specific aspects of relevance: + +```json +PUT /_plugins/_search_relevance/judgments +{ + "name": "Custom Prompt Judgment", + "type": "LLM_JUDGMENT", + "modelId": "MODEL_ID_HERE", + "querySetId": "electronics_queries_id", + "searchConfigurationList": ["baseline_config_id"], + "promptTemplate": "As an e-commerce search expert, evaluate how well these products {{hits}} match the user's search for '{{queryText}}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.", + "llmJudgmentRatingType": "SCORE0_1" +} +``` +{% include copy-curl.html %} + +### Binary relevance judgments + +For simpler relevance assessment, you can use binary (relevant/irrelevant) judgments: + +```json +PUT /_plugins/_search_relevance/judgments +{ + "name": "Binary LLM Judgment", + "type": "LLM_JUDGMENT", + "modelId": "MODEL_ID_HERE", + "querySetId": "electronics_queries_id", + "searchConfigurationList": ["baseline_config_id"], + "llmJudgmentRatingType": "RELEVANT_IRRELEVANT", + "promptTemplate": "Determine if these search results {{hits}} are relevant or irrelevant for the query '{{queryText}}'. Consider exact matches and semantic relevance." +} +``` +{% include copy-curl.html %} + +### Using different LLM providers + +You can adapt the connector configuration for other providers: + +#### AWS Bedrock example: + +```json +POST /_plugins/_ml/connectors/_create +{ + "name": "AWS Bedrock Connector", + "description": "Connector to AWS Bedrock", + "version": "1", + "protocol": "aws_sigv4", + "parameters": { + "region": "us-east-1", + "service_name": "bedrock", + "model": "anthropic.claude-v2" + }, + "credential": { + "access_key": "YOUR_ACCESS_KEY", + "secret_key": "YOUR_SECRET_KEY" + }, + "actions": [ + { + "action_type": "predict", + "method": "POST", + "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke", + "request_body": "{ \"prompt\": \"${parameters.messages}\", \"max_tokens_to_sample\": 300 }" + } + ] +} +``` +{% include copy-curl.html %} + +## Clean up + +After you're done, delete the components you've created in this tutorial: + +```json +DELETE /products +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_search_relevance/experiments/pointwise_experiment_id +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_search_relevance/judgments/llm_judgment_id +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_search_relevance/query_sets/electronics_queries_id +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_search_relevance/search_configurations/baseline_config_id +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_search_relevance/search_configurations/title_boosted_config_id +``` +{% include copy-curl.html %} + +```json +POST /_plugins/_ml/models/MODEL_ID_HERE/_undeploy +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_ml/models/MODEL_ID_HERE +``` +{% include copy-curl.html %} + +```json +DELETE /_plugins/_ml/connectors/abc123def456 +``` +{% include copy-curl.html %} + +## Benefits of LLM as a Judge + +- **Scalability**: Generate judgments for thousands of query-document pairs without manual annotation +- **Consistency**: LLMs provide consistent evaluation criteria across all judgments +- **Cost-effective**: Reduce the need for expensive human annotation while maintaining quality +- **Rapid iteration**: Quickly evaluate new search configurations and features +- **Semantic understanding**: LLMs can assess semantic relevance beyond keyword matching + +## Further reading + +- Learn more about [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/index/) +- Explore [ML Commons remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) +- Read about [search evaluation metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluation-metrics/) + +## Next steps + +- Experiment with different LLM models and prompt templates +- Create more sophisticated query sets for comprehensive evaluation +- Integrate LLM judgments into your search development workflow +- Compare LLM judgments with human annotations to validate quality From 134a8ff8da7495b77d4ec1a1a40208e6e90612b2 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Tue, 10 Mar 2026 19:07:39 -0400 Subject: [PATCH 03/28] add reference docs. Signed-off-by: Eric Pugh --- _config.yml | 4 ++- _search-plugins/search-relevance/judgments.md | 32 ++++++++++++++++--- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/_config.yml b/_config.yml index ca86146a7f3..38a1283d425 100644 --- a/_config.yml +++ b/_config.yml @@ -13,7 +13,9 @@ lucene_version: '10_3_2' # Build settings markdown: kramdown -remote_theme: pmarsceill/just-the-docs@v0.3.3 +#remote_theme: benbalter/retlab +#remote_theme: pmarsceill/just-the-docs@v0.3.3 +theme: just-the-docs # Kramdown settings kramdown: diff --git a/_search-plugins/search-relevance/judgments.md b/_search-plugins/search-relevance/judgments.md index 0daa8e577f0..a051d53de53 100644 --- a/_search-plugins/search-relevance/judgments.md +++ b/_search-plugins/search-relevance/judgments.md @@ -107,7 +107,7 @@ Parameter | Data type | Description ### Using LLM-as-a-Judge -If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments, aka _LLM-as-a-Judge_. +If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments. See the [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) for a step by step guide. #### Prerequisites To use LLM-as-a-Judge, ensure that you have configured the following components: @@ -122,23 +122,47 @@ The AI-assisted judgment process works as follows: - The LLM is then called with a predefined prompt (stored as a static variable in the backend) to generate a judgment for each query/document pair. - All generated judgments are stored in the judgments cache index for reuse in future experiments. -To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration: +To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration. +The below example uses a fairly generic prompt template with a scale of 0.0 to 1.0 for judgments. To winnow down the volume of data to be evaluated by the LLM, and therefore reduce the cost, you can specify which fields from the results to be sent via the `contextFields`. ```json PUT _plugins/_search_relevance/judgments { "name":"AI-assisted judgment list", + "description": "Uses GPT-3.5-turbo to evaluate product search results", "type":"LLM_JUDGMENT", + "modelId":"N8AE1osB0jLkkocYjz7D", "querySetId":"5f0115ad-94b9-403a-912f-3e762870ccf6", "searchConfigurationList":["2f90d4fd-bd5e-450f-95bb-eabe4a740bd1"], "size":5, - "modelId":"N8AE1osB0jLkkocYjz7D", - "contextFields":[] + "contextFields": ["title", "description", "category"], + "llmJudgmentRatingType": "SCORE0_1", + "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category." } ``` {% include copy-curl.html %} +#### Request body fields + +The process of creating LLM based judgments supports the following parameters. + +Parameter | Data type | Description +:--- | :--- | :--- +`name` | String | The name of the judgment list. +`description` | String | Optional. A description of the judgment list. +`type` | String | Set to `LLM_JUDGMENT`. +`modelId` | String | The ID of the deployed ML model to use for generating judgments. Must be a remote model connected to an external LLM service. +`querySetId` | String | The ID of the query set containing the queries to evaluate. +`searchConfigurationList` | Array of strings | List of search configuration IDs to use for retrieving documents to evaluate. +`size` | Integer | The number of top documents to retrieve and evaluate for each query. Default is 10. +`tokenLimit` | Integer | The maximum number of tokens to send to the LLM in a single request. Used to batch documents when the total content exceeds this limit. Default is 4000. +`contextFields` | Array of strings | Optional. Specifies which document fields to include when sending content to the LLM. If not specified, the entire document source is sent. Use this to reduce costs and focus the LLM on relevant fields. +`ignoreFailure` | Boolean | Whether to continue processing other documents if the LLM fails to generate a judgment for some documents. Default is false. +`llmJudgmentRatingType` | String | The type of rating scale to use. Options: `SCORE0_1` (numeric scale 0-1) or `RELEVANT_IRRELEVANT` (binary relevant/irrelevant). +`promptTemplate` | String | Optional. Custom prompt template for the LLM. Supports placeholders: `{{queryText}}`, `{{hits}}`. If not provided, a default template is used. +`overwriteCache` | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is false (reuse cached judgments). + ## Implicit judgments Implicit judgments are derived from user interactions. Several models use signals from user behavior to calculate these judgments. One such model is Clicks Over Expected Clicks (COEC), a click model implemented in Search Relevance Workbench. From fe54981c0529ef5e195fdd92cac1c6d849c617b9 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 3 Apr 2026 20:47:09 -0400 Subject: [PATCH 04/28] Fix up vale violations Signed-off-by: Eric Pugh --- _search-plugins/search-relevance/judgments.md | 6 +++--- _tutorials/llm-as-a-judge-tutorial.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/_search-plugins/search-relevance/judgments.md b/_search-plugins/search-relevance/judgments.md index a051d53de53..763f22940cf 100644 --- a/_search-plugins/search-relevance/judgments.md +++ b/_search-plugins/search-relevance/judgments.md @@ -107,7 +107,7 @@ Parameter | Data type | Description ### Using LLM-as-a-Judge -If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments. See the [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) for a step by step guide. +If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments. See the [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) for a step by step guide. #### Prerequisites To use LLM-as-a-Judge, ensure that you have configured the following components: @@ -124,7 +124,7 @@ The AI-assisted judgment process works as follows: To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration. -The below example uses a fairly generic prompt template with a scale of 0.0 to 1.0 for judgments. To winnow down the volume of data to be evaluated by the LLM, and therefore reduce the cost, you can specify which fields from the results to be sent via the `contextFields`. +The following example uses a fairly generic prompt template with a scale of 0.0 to 1.0 for judgments. To winnow down the volume of data to be evaluated by the LLM, and therefore reduce the cost, you can specify which fields from the results to be sent using the `contextFields` parameter. ```json PUT _plugins/_search_relevance/judgments @@ -159,7 +159,7 @@ Parameter | Data type | Description `tokenLimit` | Integer | The maximum number of tokens to send to the LLM in a single request. Used to batch documents when the total content exceeds this limit. Default is 4000. `contextFields` | Array of strings | Optional. Specifies which document fields to include when sending content to the LLM. If not specified, the entire document source is sent. Use this to reduce costs and focus the LLM on relevant fields. `ignoreFailure` | Boolean | Whether to continue processing other documents if the LLM fails to generate a judgment for some documents. Default is false. -`llmJudgmentRatingType` | String | The type of rating scale to use. Options: `SCORE0_1` (numeric scale 0-1) or `RELEVANT_IRRELEVANT` (binary relevant/irrelevant). +`llmJudgmentRatingType` | String | The type of rating scale to use. Options: `SCORE0_1` (numeric scale 0--1) or `RELEVANT_IRRELEVANT` (binary relevant/irrelevant). `promptTemplate` | String | Optional. Custom prompt template for the LLM. Supports placeholders: `{{queryText}}`, `{{hits}}`. If not provided, a default template is used. `overwriteCache` | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is false (reuse cached judgments). diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 672109b8bb4..195f0f95705 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -47,7 +47,7 @@ For this tutorial, you'll need: - OpenSearch 3.5 or newer with the Search Relevance Workbench plugin installed - ML Commons plugin installed and configured -- An API key for an external LLM provider (OpenAI, AWS Bedrock, etc.) +- An API key for an external LLM provider (OpenAI, AWS Bedrock) First, enable the Search Relevance Workbench and configure ML Commons: From d0fc81ce9b809af7f9230a1a3e2bf9a18415b30a Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 3 Apr 2026 20:54:14 -0400 Subject: [PATCH 05/28] Standarize on the name wikipedia uses for this technique Signed-off-by: Eric Pugh --- _tutorials/index.md | 2 +- _tutorials/llm-as-a-judge-tutorial.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/_tutorials/index.md b/_tutorials/index.md index e40080f5951..f686a8efce5 100644 --- a/_tutorials/index.md +++ b/_tutorials/index.md @@ -29,7 +29,7 @@ cards: description: "Build filterable search experiences for applications like e-commerce or location search" link: "/tutorials/faceted-search/" - heading: "LLM-as-a-Judge" - description: "Getting started with LLM as a Judge for search relevance evaluation" + description: "Getting started with LLM-as-a-Judge for search relevance evaluation" link: "/tutorials/llm-as-a-judge-tutorial/" --- diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 195f0f95705..7bde4209aff 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -1,6 +1,6 @@ --- layout: default -title: Getting started with LLM as a Judge for search relevance evaluation +title: Getting started with LLM-as-a-Judge for search relevance evaluation has_children: false parent: Search Relevance Workbench nav_order: 4 @@ -19,9 +19,9 @@ steps: link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-6-run-experiments-with-llm-judgments" --- -# Getting started with LLM as a Judge for search relevance evaluation +# Getting started with LLM-as-a-Judge for search relevance evaluation -LLM as a Judge is a technique that leverages large language models to automatically evaluate search result relevance, providing a scalable and consistent approach to search quality assessment. +LLM-as-a-Judge is a technique that leverages large language models to automatically evaluate search result relevance, providing a scalable and consistent approach to search quality assessment. In this tutorial, you'll learn how to: @@ -29,7 +29,7 @@ In this tutorial, you'll learn how to: - **Generate automated judgments**: Use LLMs to evaluate search result relevance without manual annotation. - **Compare search configurations**: Run experiments to determine which search approach performs better using LLM-generated judgments. -## OpenSearch components for LLM as a Judge +## OpenSearch components for LLM-as-a-Judge In this tutorial, you'll use the following OpenSearch components: @@ -720,7 +720,7 @@ DELETE /_plugins/_ml/connectors/abc123def456 ``` {% include copy-curl.html %} -## Benefits of LLM as a Judge +## Benefits of LLM-as-a-Judge - **Scalability**: Generate judgments for thousands of query-document pairs without manual annotation - **Consistency**: LLMs provide consistent evaluation criteria across all judgments From ca4b090c630ab99758e55eba56e29c707657e7ef Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 6 Mar 2026 16:42:46 -0500 Subject: [PATCH 06/28] Clarify the effect of bulk operations on ingest pipelines (#11978) Signed-off-by: Fanit Kolchina Signed-off-by: Eric Pugh --- _api-reference/document-apis/bulk.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md index 99bf887f328..75f6d063e81 100644 --- a/_api-reference/document-apis/bulk.md +++ b/_api-reference/document-apis/bulk.md @@ -96,6 +96,9 @@ By default, this action updates existing documents and returns an error if the d { "doc" : { "title": "World War Z" } } ``` +User-defined ingest pipelines are not executed for update operations. If you need to process documents through an ingest pipeline, use an [upsert](#upsert) operation instead. +{: .note} + ### Upsert To upsert a document, use one of the following options: @@ -115,6 +118,9 @@ To upsert a document, use one of the following options: Use this option when you want to only update specific fields when a document exists but insert a complete document when it doesn't exist. +Upsert operations trigger ingest pipelines, allowing you to preprocess documents before they are indexed or updated. +{: .note} + ### Script You can specify a script for more complex document updates by defining the script with the `source` or `id` from a document: From 2631cde44ebfae9c5697946e99731c5497efd19c Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Fri, 6 Mar 2026 17:10:06 -0500 Subject: [PATCH 07/28] Add warning about truncation to ML documentation (#11981) * Add warning about truncation to ML documentation Signed-off-by: Fanit Kolchina * Apply suggestions from code review Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestion from @kolchfa-aws Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Eric Pugh --- _ingest-pipelines/processors/text-embedding.md | 3 +++ _ml-commons-plugin/pretrained-models.md | 5 ++++- _vector-search/getting-started/auto-generated-embeddings.md | 3 +++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/_ingest-pipelines/processors/text-embedding.md b/_ingest-pipelines/processors/text-embedding.md index 958475751d6..58851e6a970 100644 --- a/_ingest-pipelines/processors/text-embedding.md +++ b/_ingest-pipelines/processors/text-embedding.md @@ -15,6 +15,9 @@ The `text_embedding` processor is used to generate vector embeddings from text f Before using the `text_embedding` processor, you must set up a machine learning (ML) model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). {: .note} +**Token limits and truncation**: Text embedding models have maximum token limits (typically 512 tokens for BERT-based models). When a document exceeds this limit, the model automatically truncates the text, and the truncated content is not represented in the embeddings. This can significantly impact search relevance because documents may not be returned in search results if the relevant content was truncated. To avoid this issue, split long documents into smaller chunks before generating embeddings. +{: .warning} + The following is the syntax for the `text_embedding` processor: ```json diff --git a/_ml-commons-plugin/pretrained-models.md b/_ml-commons-plugin/pretrained-models.md index e42da6fbb52..f25e1b669c1 100644 --- a/_ml-commons-plugin/pretrained-models.md +++ b/_ml-commons-plugin/pretrained-models.md @@ -23,7 +23,10 @@ Running local models on the CentOS 7 operating system is not supported. Moreover Sentence transformer models map sentences and paragraphs across a dimensional dense vector space. The number of vectors depends on the type of model. You can use these models for use cases such as clustering or semantic search. -The following table provides a list of sentence transformer models and artifact links you can use to download them. Note that you must prefix the model name with `huggingface/`, as shown in the **Model name** column. +The following table provides a list of sentence transformer models and artifact links you can use to download them. Note that you must prefix the model name with `huggingface/`, as shown in the **Model name** column. + +**Token limits and truncation**: Text embedding models have maximum token limits (typically 512 tokens for BERT-based models). When a document exceeds this limit, the model automatically truncates the text, and the truncated content is not represented in the embeddings. This can significantly impact search relevance because documents may not be returned in search results if the relevant content was truncated. To avoid this issue, split long documents into smaller chunks before generating embeddings. +{: .warning} | Model name | Version | Vector dimensions | Auto-truncation | TorchScript artifact | ONNX artifact | |:---|:---|:---|:---|:---|:---| diff --git a/_vector-search/getting-started/auto-generated-embeddings.md b/_vector-search/getting-started/auto-generated-embeddings.md index fdaaa5f0476..f899eff51f9 100644 --- a/_vector-search/getting-started/auto-generated-embeddings.md +++ b/_vector-search/getting-started/auto-generated-embeddings.md @@ -46,6 +46,9 @@ In this example, you'll use the [DistilBERT](https://huggingface.co/docs/transfo Take note of the dimensionality of the model because you'll need it when you set up a vector index. {: .important} +**Token limits and truncation**: Text embedding models have maximum token limits (typically 512 tokens for BERT-based models). When a document exceeds this limit, the model automatically truncates the text, and the truncated content is not represented in the embeddings. This can significantly impact search relevance because documents may not be returned in search results if the relevant content was truncated. To avoid this issue, split long documents into smaller chunks before generating embeddings. +{: .warning} + ## Manual setup For more control over the configuration, you can set up each component manually using the following steps. From 31e8ae3449a49c6518d3ac802141f1a58a147e7a Mon Sep 17 00:00:00 2001 From: Yuan Date: Mon, 9 Mar 2026 21:52:45 +0800 Subject: [PATCH 08/28] Fix create sparse vector index error (#12066) Signed-off-by: xiaoyuan0821 Co-authored-by: x00815292 Signed-off-by: Eric Pugh --- .../supported-field-types/sparse-vector.md | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/_mappings/supported-field-types/sparse-vector.md b/_mappings/supported-field-types/sparse-vector.md index 4e9c207085a..c40ea40b09f 100644 --- a/_mappings/supported-field-types/sparse-vector.md +++ b/_mappings/supported-field-types/sparse-vector.md @@ -51,19 +51,19 @@ PUT sparse-vector-index "settings": { "index": { "sparse": true - }, - "mappings": { - "properties": { - "sparse_embedding": { - "type": "sparse_vector", - "method": { - "name": "seismic", - "parameters": { - "n_postings": 300, - "cluster_ratio": 0.1, - "summary_prune_ratio": 0.4, - "approximate_threshold": 1000000 - } + } + }, + "mappings": { + "properties": { + "sparse_embedding": { + "type": "sparse_vector", + "method": { + "name": "seismic", + "parameters": { + "n_postings": 300, + "cluster_ratio": 0.1, + "summary_prune_ratio": 0.4, + "approximate_threshold": 1000000 } } } From a7627c02ee2901a292fa76c6f9a71b7ff1d900ae Mon Sep 17 00:00:00 2001 From: Mykola Shestopal Date: Mon, 9 Mar 2026 15:52:55 +0200 Subject: [PATCH 09/28] Fix formatting for Boolean AND examples in dql.md (#12065) Signed-off-by: Mykola Shestopal Signed-off-by: Eric Pugh --- _dashboards/dql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_dashboards/dql.md b/_dashboards/dql.md index 4b7c0237533..3aff1555117 100644 --- a/_dashboards/dql.md +++ b/_dashboards/dql.md @@ -142,7 +142,7 @@ The following table provides a quick reference for both query language commands. | Numeric range | `page_views >= 100 and page_views <= 300`

`not page_views: 100` (results include documents that don't contain a `page_views` field)

See [Ranges](#ranges)| `page_views:[100 TO 300]`

`page_views:(>=100 AND <=300)`

`page_views:(+>=100 +<=300)`

`page_views:[100 TO *]`

`page_views:>=100`

`NOT page_views:100` (results include documents that don't contain a `page_views` field)

See [Ranges]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/#ranges)| | Date range | `date >= "1939-01-01" and date <= "2013-12-31"`

`not date: "1939-09-08"` | `date:[1939-01-01 TO 2013-12-31]`

`NOT date:1939-09-08`

Supports all numeric range syntax constructs| | Exclusive range | Not supported | `page_views: {100 TO 300}` (returns documents whose `page_views` are between `100` and `300`, excluding `100` and `300`) | -| Boolean `AND` | `media_type:film AND page_views:100`

`media_type:film and page_views:100`| `media_type:film AND page_views:100`

`+media_type:film +page_views:100`| +| Boolean `AND` | `media_type: film AND page_views: 100`

`media_type: film and page_views: 100`| `media_type:film AND page_views:100`

`+media_type:film +page_views:100`| | Boolean `NOT` | `NOT media_type: article`

`not media_type: article` | `NOT media_type:article`

`-media_type:article` | | Boolean `OR` | `title: wind OR description: film`

`title: wind or description: film` | `title: wind OR description: film` | | Required/Prohibited operators | Not supported | Supports both `+` (required operator) and `-` (prohibited operator)

`+title:wind -media_type:article` (returns documents in which `title` contains `wind` but `media_type` does not contain `article`) | From 9d0cc772ebb5afb54d035f5c81a6137a8791b66b Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 9 Mar 2026 15:11:40 -0400 Subject: [PATCH 10/28] Add restoring snapshot from a remote-backed cluster docs (#12020) * Add restoring snapshot from a remote-backed cluster docs Signed-off-by: Fanit Kolchina * Formatting Signed-off-by: Fanit Kolchina * Link fix and reword Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: Eric Pugh --- _api-reference/snapshots/restore-snapshot.md | 112 +++++++++++++----- .../snapshots/snapshot-restore.md | 100 ++++++++++++++++ 2 files changed, 184 insertions(+), 28 deletions(-) diff --git a/_api-reference/snapshots/restore-snapshot.md b/_api-reference/snapshots/restore-snapshot.md index 301829e02fe..10a849f7100 100644 --- a/_api-reference/snapshots/restore-snapshot.md +++ b/_api-reference/snapshots/restore-snapshot.md @@ -1,6 +1,6 @@ --- layout: default -title: Restore Snapshot +title: Restore snapshot parent: Snapshot APIs nav_order: 9 @@ -16,7 +16,7 @@ Restores a snapshot of a cluster or specified data streams and indexes. * For information about data streams, see [Data streams]({{site.url}}{{site.baseurl}}/opensearch/data-streams). -If open indexes with the same name that you want to restore already exist in the cluster, you must close, delete, or rename the indexes. See [Example request](#example-request) for information about renaming an index. See [Close index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/close-index/) for information about closing an index. +If open indexes with the same name that you want to restore already exist in the cluster, you must close, delete, or rename the indexes. For information about renaming an index, see [Example requests](#example-requests). For information about closing an index, see [Close index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/close-index/). {: .note} ## Endpoints @@ -29,14 +29,14 @@ POST _snapshot///_restore | Parameter | Data type | Description | :--- | :--- | :--- -| repository | String | Repository containing the snapshot to restore. | -| snapshot | String | Snapshot to restore. | +| `repository` | String | Repository containing the snapshot to restore. | +| `snapshot` | String | Snapshot to restore. | ## Query parameters Parameter | Data type | Description :--- | :--- | :--- -wait_for_completion | Boolean | Whether to wait for snapshot restoration to complete before continuing. | +`wait_for_completion` | Boolean | Whether to wait for snapshot restoration to complete before continuing. | ## Request body fields @@ -44,19 +44,20 @@ All request body parameters are optional. | Parameter | Data type | Description | :--- | :--- | :--- -| ignore_unavailable | Boolean | How to handle data streams or indices that are missing or closed. If `false`, the request returns an error for any data stream or index that is missing or closed. If `true`, the request ignores data streams and indices in indices that are missing or closed. Defaults to `false`. | -| ignore_index_settings | Boolean | A comma-delimited list of index settings that you don't want to restore from a snapshot. | -| include_aliases | Boolean | How to handle index aliases from the original snapshot. If `true`, index aliases from the original snapshot are restored. If `false`, aliases along with associated indices are not restored. Defaults to `true`. | -| include_global_state | Boolean | Whether to restore the current cluster state1. If `false`, the cluster state is not restored. If true, the current cluster state is restored. Defaults to `false`.| -| index_settings | String | A comma-delimited list of settings to add or change in all restored indices. Use this parameter to override index settings during snapshot restoration. For data streams, these index settings are applied to the restored backing indices. | -| indices | String | A comma-delimited list of data streams and indices to restore from the snapshot. Multi-index syntax is supported. By default, a restore operation includes all data streams and indices in the snapshot. If this argument is provided, the restore operation only includes the data streams and indices that you specify. | -| partial | Boolean | How the restore operation will behave if indices in the snapshot do not have all primary shards available. If `false`, the entire restore operation fails if any indices in the snapshot do not have all primary shards available.

If `true`, allows the restoration of a partial snapshot of indices with unavailable shards. Only shards that were successfully included in the snapshot are restored. All missing shards are recreated as empty. By default, the entire restore operation fails if one or more indices included in the snapshot do not have all primary shards available. To change this behavior, set `partial` to `true`. Defaults to `false`. | -| rename_pattern | String | The pattern to apply to the restored data streams and indexes. Data streams and indexes matching the rename pattern will be renamed according to the `rename_replacement` setting.

The rename pattern is applied as defined by the regular expression that supports referencing the original text.

The request fails if two or more data streams or indexes are renamed to the same name. If you rename a restored data stream, its backing indexes are also renamed. For example, if you rename the logs data stream to `recovered-logs`, the backing index `.ds-logs-1` is renamed to `.ds-recovered-logs-1`.

If you rename a restored stream, ensure an index template matches the new stream name. If there are no matching index template names, the stream cannot roll over, and new backing indexes are not created.| -| rename_replacement | String | The rename replacement string.| -| rename_alias_pattern | String | The pattern to apply to the restored aliases. Aliases matching the rename pattern will be renamed according to the `rename_alias_replacement` setting.

The rename pattern is applied as defined by the regular expression that supports referencing the original text.

If two or more aliases are renamed to the same name, these aliases will be merged into one.| -| rename_alias_replacement | String | The rename replacement string for aliases.| -| source_remote_store_repository | String | The name of the remote store repository of the source index being restored. If not provided, the Snapshot Restore API will use the repository that was registered when the snapshot was created. -| wait_for_completion | Boolean | Whether to return a response after the restore operation has completed. If `false`, the request returns a response when the restore operation initializes. If `true`, the request returns a response when the restore operation completes. Default is `false`. | +| `ignore_unavailable` | Boolean | How to handle data streams or indices that are missing or closed. If `false`, the request returns an error for any data stream or index that is missing or closed. If `true`, the request ignores data streams and indices in indices that are missing or closed. Defaults to `false`. | +| `ignore_index_settings` | Boolean | A comma-delimited list of index settings that you don't want to restore from a snapshot. | +| `include_aliases` | Boolean | How to handle index aliases from the original snapshot. If `true`, index aliases from the original snapshot are restored. If `false`, aliases along with associated indices are not restored. Defaults to `true`. | +| `include_global_state` | Boolean | Whether to restore the current cluster state1. If `false`, the cluster state is not restored. If true, the current cluster state is restored. Defaults to `false`.| +| `index_settings` | String | A comma-delimited list of settings to add or change in all restored indices. Use this parameter to override index settings during snapshot restoration. For data streams, these index settings are applied to the restored backing indices. | +| `indices` | String | A comma-delimited list of data streams and indices to restore from the snapshot. Multi-index syntax is supported. By default, a restore operation includes all data streams and indices in the snapshot. If this argument is provided, the restore operation only includes the data streams and indices that you specify. | +| `partial` | Boolean | How the restore operation will behave if indices in the snapshot do not have all primary shards available. If `false`, the entire restore operation fails if any indices in the snapshot do not have all primary shards available.

If `true`, allows the restoration of a partial snapshot of indices with unavailable shards. Only shards that were successfully included in the snapshot are restored. All missing shards are recreated as empty. By default, the entire restore operation fails if one or more indices included in the snapshot do not have all primary shards available. To change this behavior, set `partial` to `true`. Defaults to `false`. | +| `rename_pattern` | String | The pattern to apply to the restored data streams and indexes. Data streams and indexes matching the rename pattern will be renamed according to the `rename_replacement` setting.

The rename pattern is applied as defined by the regular expression that supports referencing the original text.

The request fails if two or more data streams or indexes are renamed to the same name. If you rename a restored data stream, its backing indexes are also renamed. For example, if you rename the logs data stream to `recovered-logs`, the backing index `.ds-logs-1` is renamed to `.ds-recovered-logs-1`.

If you rename a restored stream, ensure an index template matches the new stream name. If there are no matching index template names, the stream cannot roll over, and new backing indexes are not created.| +| `rename_replacement` | String | The rename replacement string.| +| `rename_alias_pattern` | String | The pattern to apply to the restored aliases. Aliases matching the rename pattern will be renamed according to the `rename_alias_replacement` setting.

The rename pattern is applied as defined by the regular expression that supports referencing the original text.

If two or more aliases are renamed to the same name, these aliases will be merged into one.| +| `rename_alias_replacement` | String | The rename replacement string for aliases.| +| `source_remote_store_repository` | String | The name of the remote segment repository of the source index being restored. Required only when both source and target clusters use remote-backed storage and have different remote store repositories. The specified repository must be registered as read-only on the target cluster before restoring. If not provided, the Snapshot Restore API will use the repositories that were registered when the snapshot was created. +| `source_remote_translog_repository` | String | The name of the remote translog repository of the source index being restored. Required only when both source and target clusters use remote-backed storage and have different remote store repositories. The specified repository must be registered as read-only on the target cluster before restoring. +| `wait_for_completion` | Boolean | Whether to return a response after the restore operation has completed. If `false`, the request returns a response when the restore operation initializes. If `true`, the request returns a response when the restore operation completes. Default is `false`. | storage_type | `local` indicates that all snapshot metadata and index data will be downloaded to local storage.

`remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the [search role]({{site.url}}{{site.baseurl}}/security/access-control/users-roles/) in order to restore a snapshot using the type `remote_snapshot`.

Defaults to `local`. 1The cluster state includes: @@ -66,7 +67,11 @@ storage_type | `local` indicates that all snapshot metadata and index data will * Ingest pipelines * Index lifecycle policies -## Example request +## Example requests + +The following examples demonstrate different snapshot restore scenarios. + +### Basic restore The following request restores the `opendistro-reports-definitions` index from `my-first-snapshot`. The `rename_pattern` and `rename_replacement` combination causes the index to be renamed to `opendistro-reports-definitions_restored` because duplicate open index names in a cluster are not allowed. @@ -118,11 +123,60 @@ response = client.snapshot.restore( python=step1_python %} +### Cross-cluster restore with remote-backed storage + +Remote-backed storage is a feature where OpenSearch automatically backs up segments and translogs to remote repositories. When restoring snapshots between clusters that **both use remote-backed storage** with different remote store repositories, use both the `source_remote_store_repository` and `source_remote_translog_repository` parameters. + +The following example restores an index from a snapshot taken on a source cluster to a target cluster. In this example, `source-cluster-snapshots` is the snapshot repository containing the snapshot from the source cluster, `snapshot-1` is the snapshot name, `my-remote-index` is the index to restore, `source-remote-segment-repo` is the remote segment repository from the source cluster (must be registered as read-only on the target cluster), and `source-remote-translog-repo` is the remote translog repository from the source cluster (must be registered as read-only on the target cluster): + + +{% capture step1_rest %} +POST /_snapshot/source-cluster-snapshots/snapshot-1/_restore +{ + "indices": "my-remote-index", + "source_remote_store_repository": "source-remote-segment-repo", + "source_remote_translog_repository": "source-remote-translog-repo" +} +{% endcapture %} + +{% capture step1_python %} + + +response = client.snapshot.restore( + repository = "source-cluster-snapshots", + snapshot = "snapshot-1", + body = { + "indices": "my-remote-index", + "source_remote_store_repository": "source-remote-segment-repo", + "source_remote_translog_repository": "source-remote-translog-repo" + } +) + +{% endcapture %} + +{% include code-block.html + rest=step1_rest + python=step1_python %} + + +The target cluster will restore the index and configure it to read remote segments and translogs from the source cluster's remote store repositories. + +For the complete step-by-step procedure, see [Restoring snapshots across remote-backed clusters]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#restoring-snapshots-across-remote-backed-clusters). + ## Example response Upon success, the response returns the following JSON object: -````json +```json { "snapshot" : { "snapshot" : "my-first-snapshot", @@ -134,21 +188,23 @@ Upon success, the response returns the following JSON object: } } } -```` +``` + Except for the snapshot name, all properties are empty or `0`. This is because any changes made to the volume after the snapshot was generated are lost. However, if you invoke the [Get snapshot]({{site.url}}{{site.baseurl}}/api-reference/snapshots/get-snapshot) API to examine the snapshot, a fully populated snapshot object is returned. ## Response body fields +The following table lists all available response body fields. + | Field | Data type | Description | | :--- | :--- | :--- | -| snapshot | string | Snapshot name. | -| indices | array | Indices in the snapshot. | -| shards | object | Total number of shards created along with number of successful and failed shards. | +| `snapshot` | string | Snapshot name. | +| `indices` | array | Indices in the snapshot. | +| `shards` | object | Total number of shards created along with number of successful and failed shards. | -If open indices in a snapshot already exist in a cluster, and you don't delete, close, or rename them, the API returns an error like the following: -{: .note} +If open indexes in a snapshot already exist in a cluster, and you don't delete, close, or rename them, the API returns an error similar to the following: -````json +```json { "error" : { "root_cause" : [ @@ -162,4 +218,4 @@ If open indices in a snapshot already exist in a cluster, and you don't delete, }, "status" : 500 } -```` +``` diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md index 9408e3fbbf7..45379059403 100644 --- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md +++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md @@ -560,6 +560,106 @@ POST /_snapshot/my-repository/2/_restore For more information, see [Restore Snapshot API]({{site.url}}{{site.baseurl}}/api-reference/snapshots/restore-snapshot/). +### Restoring snapshots across remote-backed clusters + +When using remote-backed storage, OpenSearch automatically backs up index segments and translogs to remote repositories (typically, Amazon S3). If you're restoring snapshots between clusters that **both use remote-backed storage** with different remote store repositories, you must specify the source cluster's remote store repositories. + +This procedure applies only to clusters with remote-backed storage enabled. For standard OpenSearch clusters without remote-backed storage, use the standard snapshot restore process without these additional parameters. +{: .note} + +The following procedure restores a snapshot from Cluster A to Cluster B, where both clusters use remote-backed storage with different Amazon S3 repositories. + +#### Prerequisites + +Before you start, ensure that you have fulfilled the following prerequisites: + +- Both clusters are running the same OpenSearch version. +- You have obtained the Amazon S3 bucket names, base paths, AWS Key Management Service (KMS) key Amazon Resource Names (ARNs), and domain ARNs from the source cluster's remote store configuration. + +#### Steps + +To restore a snapshot from the source cluster to the target cluster, complete the following steps: + +1. Register the source cluster's snapshot repository on the target cluster as read-only: + + ```json + PUT /_snapshot/source-cluster-snapshots + { + "type": "s3", + "settings": { + "bucket": "source-snapshot-bucket", + "base_path": "snapshots", + "region": "us-east-1", + "readonly": true + } + } + ``` + {% include copy-curl.html %} + +2. Register the source cluster's remote segment repository on the target cluster as read-only: + + ```json + PUT /_snapshot/source-remote-segment-repo + { + "type": "s3", + "settings": { + "bucket": "source-segment-bucket", + "base_path": "remote-store/segments", + "region": "us-east-1", + "amazon_es_kms_enc_ctx": "domainARN=arn:aws:es:us-east-1:123456789012:domain/source-cluster", + "amazon_es_kms_key_arn": "arn:aws:kms:us-east-1:123456789012:key/abcd1234-56ef-78gh-90ij-klmnopqrstuv", + "amazon_es_encryption": "true", + "remote_store_index_shallow_copy": "true", + "readonly": true + } + } + ``` + {% include copy-curl.html %} + +3. Register the source cluster's remote translog repository on the target cluster as read-only: + + ```json + PUT /_snapshot/source-remote-translog-repo + { + "type": "s3", + "settings": { + "bucket": "source-translog-bucket", + "base_path": "remote-store/translogs", + "region": "us-east-1", + "amazon_es_kms_enc_ctx": "domainARN=arn:aws:es:us-east-1:123456789012:domain/source-cluster", + "amazon_es_kms_key_arn": "arn:aws:kms:us-east-1:123456789012:key/abcd1234-56ef-78gh-90ij-klmnopqrstuv", + "amazon_es_encryption": "true", + "remote_store_index_shallow_copy": "true", + "readonly": true + } + } + ``` + {% include copy-curl.html %} + +4. Verify that you can list snapshots in the source snapshot repository: + + ```json + GET /_snapshot/source-cluster-snapshots/_all + ``` + {% include copy-curl.html %} + +5. Restore the snapshot, specifying both the source segment and translog repositories: + + ```json + POST /_snapshot/source-cluster-snapshots/snapshot-1/_restore + { + "indices": "my-index", + "source_remote_store_repository": "source-remote-segment-repo", + "source_remote_translog_repository": "source-remote-translog-repo" + } + ``` + {% include copy-curl.html %} + +The target cluster will restore the index from the snapshot repository and configure the restored indexes to read their remote segments and translogs from the source cluster's remote store repositories. + +If you encounter index name collisions during restore, use the `rename_pattern` and `rename_replacement` parameters to rename the indexes, or delete the conflicting indexes before restoring. +{: .tip} + ### Conflicts and compatibility One way to avoid index naming conflicts when restoring indexes is to use the `rename_pattern` and `rename_replacement` options. You can then, if necessary, use the `_reindex` API to combine the two. However, it may be simpler to delete the indexes that caused the conflict prior to restoring them from a snapshot. From caa0dda20a28d67da7ad310b303ddca49c69828b Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Mon, 9 Mar 2026 15:11:48 -0400 Subject: [PATCH 11/28] Change copy curl buttons to copy only in workspace docs (#12071) Signed-off-by: Fanit Kolchina Signed-off-by: Eric Pugh --- _dashboards/workspace/apis.md | 2 +- _dashboards/workspace/create-workspace.md | 2 +- _dashboards/workspace/index.md | 2 +- _dashboards/workspace/manage-workspace.md | 2 +- _dashboards/workspace/workspace-acl.md | 8 ++++---- _dashboards/workspace/workspace.md | 14 +++++++------- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/_dashboards/workspace/apis.md b/_dashboards/workspace/apis.md index 026a21be19b..6daadab917a 100644 --- a/_dashboards/workspace/apis.md +++ b/_dashboards/workspace/apis.md @@ -6,7 +6,7 @@ nav_order: 10 --- # Workspaces APIs -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } The Workspaces API provides a set of endpoints for managing workspaces in OpenSearch Dashboards. diff --git a/_dashboards/workspace/create-workspace.md b/_dashboards/workspace/create-workspace.md index 34ba65bb540..a0891279112 100644 --- a/_dashboards/workspace/create-workspace.md +++ b/_dashboards/workspace/create-workspace.md @@ -6,7 +6,7 @@ nav_order: 1 --- # Create a workspace -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } Before getting started with this tutorial, you must enable the workspace feature flag. See [Enabling the ACL feature]({{site.url}}{{site.baseurl}}/dashboards/workspace/workspace/#enabling-the-workspace-feature) for more information. diff --git a/_dashboards/workspace/index.md b/_dashboards/workspace/index.md index 0d838f805fd..fdae42f24a1 100644 --- a/_dashboards/workspace/index.md +++ b/_dashboards/workspace/index.md @@ -8,7 +8,7 @@ redirect_from: --- # Getting started with workspaces -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } OpenSearch Dashboards 2.18 introduces an enhanced home page that provides a comprehensive view of all your workspaces. diff --git a/_dashboards/workspace/manage-workspace.md b/_dashboards/workspace/manage-workspace.md index 45733d75be3..fff73a6aea5 100644 --- a/_dashboards/workspace/manage-workspace.md +++ b/_dashboards/workspace/manage-workspace.md @@ -6,7 +6,7 @@ nav_order: 2 --- # Manage workspaces -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } You can access and modify the workspace details, including name, description, use case, and icon color, on the **Workspace details** page. diff --git a/_dashboards/workspace/workspace-acl.md b/_dashboards/workspace/workspace-acl.md index a3779cfbe0d..33a707b4e2c 100644 --- a/_dashboards/workspace/workspace-acl.md +++ b/_dashboards/workspace/workspace-acl.md @@ -6,7 +6,7 @@ nav_order: 3 --- # Workspace access control lists -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } Workspace access control lists (ACLs) manage authorization for saved objects `AuthZ(Authorization)` while enabling [Security in OpenSearch]({{site.url}}{{site.baseurl}}/security/) for `AuthN(Authentication)`. @@ -47,7 +47,7 @@ Set all users as admins with this wildcard setting: ```yaml opensearchDashboards.dashboardAdmin.users: ["*"] ``` -{% include copy-curl.html %} +{% include copy.html %} ### Configuring admin access for a single user @@ -56,7 +56,7 @@ Configure a user with the `admin-user-id` setting: ```yaml opensearchDashboards.dashboardAdmin.users: ["admin-user-id"] ``` -{% include copy-curl.html %} +{% include copy.html %} ### Configuring admin access by backend role @@ -65,7 +65,7 @@ Configure a user with the `admin-role` setting: ```yaml opensearchDashboards.dashboardAdmin.groups: ["admin-role"] ``` -{% include copy-curl.html %} +{% include copy.html %} ### Admin-restricted operations diff --git a/_dashboards/workspace/workspace.md b/_dashboards/workspace/workspace.md index 0938c48891f..b12e7f9bc4d 100644 --- a/_dashboards/workspace/workspace.md +++ b/_dashboards/workspace/workspace.md @@ -6,7 +6,7 @@ has_children: true --- # Workspace for OpenSearch Dashboards -Introduced 2.18 +**Introduced 2.18** {: .label .label-purple } The Workspace feature in OpenSearch Dashboards enables you to tailor your environment with use-case-specific configurations. For example, you can create dedicated workspaces for observability scenarios, allowing you to focus on relevant functionalities. Additionally, the Workspace feature enables organization of visual assets, such as dashboards and visualizations, within a workspace with isolated storage. @@ -25,7 +25,7 @@ interface Workspace { uiSettings: Record; } ``` -{% include copy-curl.html %} +{% include copy.html %} The Workspace data model is composed of the following key attributes: @@ -48,7 +48,7 @@ The following object shows a typical Workspace configuration: features: ["use-case-analytics"], } ``` -{% include copy-curl.html %} +{% include copy.html %} The configuration creates the `Analytics team` using the `use-case-observability` feature set. Use cases map to specific feature groups, limiting functionality to the defined set within each workspace. @@ -77,7 +77,7 @@ The following saved object shows a dashboard object associated with the workspac workspaces: ["M5NqCu"] } ``` -{% include copy-curl.html %} +{% include copy.html %} Saved objects support association with multiple workspaces, facilitating cross-team collaboration and resource sharing. This feature is useful when an object is relevant to multiple teams, projects, or use cases. @@ -90,7 +90,7 @@ The following example shows a data source object linked to multiple workspaces: workspaces: ["M5NqCu", "", ""] } ``` -{% include copy-curl.html %} +{% include copy.html %} ## Non-workspace saved objects @@ -108,11 +108,11 @@ uiSettings: overrides: "home:useNewHomePage": true ``` -{% include copy-curl.html %} +{% include copy.html %} If your cluster has the Security plugin installed, then multi-tenancy must be disabled to avoid conflicts with similar workspaces: ```yaml opensearch_security.multitenancy.enabled: false ``` -{% include copy-curl.html %} +{% include copy.html %} From a04148291e762a52beb254a36e83ba0634cfabd6 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Tue, 10 Mar 2026 17:40:10 -0400 Subject: [PATCH 12/28] Add 2.19.5 to version history (#12078) * Add 2.19.5 to version history Signed-off-by: Fanit Kolchina * Updated release description Signed-off-by: Fanit Kolchina --------- Signed-off-by: Fanit Kolchina Signed-off-by: Eric Pugh --- _about/version-history.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_about/version-history.md b/_about/version-history.md index 1d9b7c54e2f..7a345520771 100644 --- a/_about/version-history.md +++ b/_about/version-history.md @@ -17,6 +17,7 @@ OpenSearch version | Release highlights | Release date [3.2.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.2.0.md) | Updates Search Relevance Workbench. Makes gRPC APIs generally available. Introduces derived source, updates workload management, semantic field, and star tree functionality. Adds experimental Agentic Memory APIs and Job Scheduler APIs. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.2.0.md). | 19 August 2025 [3.1.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.1.0.md) | Makes GPU acceleration for vector index builds generally available. Introduces memory-optimized search for Faiss indexes using Lucene HNSW, semantic field type for streamlined semantic search, and Search Relevance Workbench for search quality optimization. Makes star-tree indexes generally available with support for comprehensive query types. Enhances observability with ML Commons metrics integration, custom index support for OpenTelemetry data, and new PPL commands for JSON manipulation. Improves agent management with Update Agent API and persistent MCP tools. Includes security enhancements with immutable user objects and new resource sharing framework. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.1.0.md). | 24 June 2025 [3.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.0.0.md) | Upgrades to Lucene 10 for improved indexing and vector search. Adds experimental gRPC support and pull-based ingestion from Kafka and Kinesis. Introduces GPU acceleration for vector operations and semantic sentence highlighting. Improves range query performance and hybrid search with z-score normalization. Adds plan-execute-reflect agents and native MCP protocol support for agentic workflows. Enhances security with a new Java agent replacing the Security Manager. Includes PPL query improvements with lookup, join, and subsearch commands. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-3.0.0.md). | 06 May 2025 +[2.19.5](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.5.md) | Fixes security vulnerabilities with multiple CVE updates across Dashboards plugins including Assistant, Notifications, Observability, and Reporting. Resolves bugs in Alerting, Flow Framework, Security, and k-NN. Includes infrastructure and maintenance updates across multiple components. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.5.md). | 10 March 2026 [2.19.4](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.4.md) | Fixes critical security vulnerabilities across multiple components including ML Commons, Query Insights Dashboards, and SQL. Resolves bugs in Flow Framework multi-tenancy, Security wildcard matching, and Query Insights time validation. Includes extensive CVE fixes and dependency updates across dashboards plugins. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.4.md). | 06 November 2025 [2.19.3](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.3.md) | Improves Flow Framework with enhanced memory handling and workflow step processing. Fixes several Query Insights and Query Insights Dashboards issues. Implements security updates across multiple components. Updates infrastructure components and documentation across multiple plugins. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.3.md). | 22 July 2025 [2.19.2](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.2.md) | Improves query insights with better index handling, a new verbose API parameter, and a default index template. Fixes bugs across Query Insights, Observability, Flow Framework, and Dashboards. Includes multiple CVE fixes, test enhancements, and a new PGP key for artifact verification. For a full list of release highlights, see the [Release Notes](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.19.2.md). | 29 April 2025 From 7953ca3c7a4a47956979f5f9b8cd0aeadcaeb104 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Wed, 11 Mar 2026 13:45:43 -0400 Subject: [PATCH 13/28] Add additional details to search backpressure stats (#11982) Signed-off-by: Fanit Kolchina Signed-off-by: Eric Pugh --- .../search-backpressure.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/_tuning-your-cluster/availability-and-recovery/search-backpressure.md b/_tuning-your-cluster/availability-and-recovery/search-backpressure.md index 29982247a70..14a1a0a240a 100644 --- a/_tuning-your-cluster/availability-and-recovery/search-backpressure.md +++ b/_tuning-your-cluster/availability-and-recovery/search-backpressure.md @@ -223,12 +223,12 @@ The response contains the following fields. Field Name | Data type | Description :--- | :--- | :--- search_backpressure | Object | Statistics about search backpressure. -search_backpressure.search_task | Object | Statistics specific to the search task. -search_backpressure.search_task.[resource_tracker_stats](#resource_tracker_stats) | Object | Statistics about the current search tasks. -search_backpressure.search_task.[cancellation_stats](#cancellation_stats) | Object | Statistics about the search tasks canceled since the node last restarted. -search_backpressure.search_shard_task | Object | Statistics specific to the search shard task. -search_backpressure.search_shard_task.[resource_tracker_stats](#resource_tracker_stats) | Object | Statistics about the current search shard tasks. -search_backpressure.search_shard_task.[cancellation_stats](#cancellation_stats) | Object | Statistics about the search shard tasks canceled since the node last restarted. +search_backpressure.search_task | Object | Statistics for search tasks. Contains resource tracker statistics for individual cancellation criteria (heap, CPU, elapsed time) and a summary of all cancellation activity across these trackers. +search_backpressure.search_task.[resource_tracker_stats](#resource_tracker_stats) | Object | Per-tracker statistics showing cancellation counts for each resource type (heap usage, CPU usage, elapsed time) and current resource consumption metrics. +search_backpressure.search_task.[cancellation_stats](#cancellation_stats) | Object | Aggregated cancellation statistics across all resource trackers. The sum of `cancellation_count` and `cancellation_limit_reached_count` equals the total of all resource tracker cancellation counts. +search_backpressure.search_shard_task | Object | Statistics for search shard tasks. Contains resource tracker statistics for individual cancellation criteria (heap, CPU, elapsed time) and a summary of all cancellation activity across these trackers. +search_backpressure.search_shard_task.[resource_tracker_stats](#resource_tracker_stats) | Object | Per-tracker statistics showing cancellation counts for each resource type (heap usage, CPU usage, elapsed time) and current resource consumption metrics. +search_backpressure.search_shard_task.[cancellation_stats](#cancellation_stats) | Object | Aggregated cancellation statistics across all resource trackers. The sum of `cancellation_count` and `cancellation_limit_reached_count` equals the total of all resource tracker cancellation counts. search_backpressure.mode | String | The [mode](#search-backpressure-modes) for search backpressure. ### `resource_tracker_stats` @@ -274,3 +274,6 @@ Field Name | Data type | Description :--- | :--- | :--- cancellation_count | Integer | The total number of tasks marked for cancellation since the node last restarted. cancellation_limit_reached_count | Integer | The number of times when the number of tasks eligible for cancellation exceeded the set cancellation threshold. + +Each resource tracker (heap, CPU, elapsed time) independently identifies tasks that breach its thresholds and increments its own `cancellation_count`. Since a single task may breach multiple resource thresholds, the sum of resource tracker `cancellation_count` values may exceed the top-level `cancellation_count`, which represents the actual number of unique tasks that were cancelled. The `cancellation_limit_reached_count` increments when the cancellation rate limit is reached during an observer iteration, preventing additional cancellations in that iteration. +{: .note} From 0b45ddf842eedf76ace69b85b44cb2293cbd23df Mon Sep 17 00:00:00 2001 From: Yuan Date: Thu, 12 Mar 2026 21:24:16 +0800 Subject: [PATCH 14/28] Create index with explicit mapping for sort by geo distance example (#12089) * Fix create sparse vector index error Signed-off-by: xiaoyuan0821 * Create index with explicit mapping for sort by geo distance example Signed-off-by: xiaoyuan0821 * Add copy buttons and intro sentence to mapping Signed-off-by: Fanit Kolchina --------- Signed-off-by: xiaoyuan0821 Signed-off-by: Fanit Kolchina Co-authored-by: x00815292 Co-authored-by: Fanit Kolchina Signed-off-by: Eric Pugh --- _search-plugins/searching-data/sort.md | 62 +++++++++++++++++++++++--- 1 file changed, 55 insertions(+), 7 deletions(-) diff --git a/_search-plugins/searching-data/sort.md b/_search-plugins/searching-data/sort.md index 8242f976946..bf03bfc78d7 100644 --- a/_search-plugins/searching-data/sort.md +++ b/_search-plugins/searching-data/sort.md @@ -35,6 +35,7 @@ GET shakespeare/_search ] } ``` +{% include copy-curl.html %} The results are sorted by `line_id` in descending order: @@ -258,6 +259,7 @@ GET shakespeare/_search ] } ``` +{% include copy-curl.html %} You can continue to sort by any number of field values to get the results in just the right order. It doesn’t have to be a numerical value—you can also sort by date or timestamp fields: @@ -270,6 +272,7 @@ You can continue to sort by any number of field values to get the results in jus } ] ``` +{% include copy-curl.html %} A text field that is analyzed cannot be used to sort documents, because the inverted index only contains the individual tokenized terms and not the entire string. So you cannot sort by the `play_name`, for example. @@ -294,6 +297,7 @@ GET shakespeare/_search ] } ``` +{% include copy-curl.html %} The results are sorted by the `play_name` field in alphabetical order. @@ -331,6 +335,7 @@ GET shakespeare/_search ] } ``` +{% include copy-curl.html %} ## Sort mode @@ -348,17 +353,21 @@ PUT students/_doc/1 "name": "John Doe", "grades": [70, 90] } +``` +{% include copy-curl.html %} +```json PUT students/_doc/2 { "name": "Mary Major", "grades": [80, 100] } ``` +{% include copy-curl.html %} Sort all students by highest grade average using the `avg` mode: -``` +```json GET students/_search { "query" : { @@ -369,6 +378,7 @@ GET students/_search ] } ``` +{% include copy-curl.html %} The response contains students sorted by `grades` in descending order: @@ -442,6 +452,7 @@ PUT students } } ``` +{% include copy-curl.html %} Index two documents with nested fields: @@ -453,7 +464,10 @@ PUT students/_doc/1 "grades": [70, 90] } } +``` +{% include copy-curl.html %} +```json PUT students/_doc/2 { "name": "Mary Major", @@ -462,6 +476,7 @@ PUT students/_doc/2 } } ``` +{% include copy-curl.html %} When sorting by grade average, provide the path to the nested field: @@ -483,6 +498,7 @@ GET students/_search ] } ``` +{% include copy-curl.html %} ## Handling missing values @@ -496,12 +512,16 @@ PUT students/_doc/1 "name": "John Doe", "average": 80 } +``` +{% include copy-curl.html %} +```json PUT students/_doc/2 { "name": "Mary Major" } ``` +{% include copy-curl.html %} Sort the documents, ordering the document with a missing field first: @@ -521,6 +541,7 @@ GET students/_search ] } ``` +{% include copy-curl.html %} The response lists document 2 first: @@ -582,6 +603,7 @@ PUT students/_doc/1 "average": 80 } ``` +{% include copy-curl.html %} Index a document that does not contain an `average` field in the second index: @@ -591,6 +613,7 @@ PUT students_no_map/_doc/2 "name": "Mary Major" } ``` +{% include copy-curl.html %} Search for all documents in both indexes and sort them by the `average` field: @@ -609,6 +632,7 @@ GET students*/_search ] } ``` +{% include copy-curl.html %} By default, the second index produces an error because the `average` field is not mapped: @@ -677,6 +701,7 @@ GET students*/_search ] } ``` +{% include copy-curl.html %} The response contains both documents: @@ -745,34 +770,55 @@ GET students/_search "track_scores": true } ``` +{% include copy-curl.html %} -## Sorting by geo distance +## Sorting by geodistance You can sort documents by `_geo_distance`. The following parameters are supported. Parameter | Description :--- | :--- -distance_type | Specifies the method of computing the distance. Valid values are `arc` and `plane`. The `plane` method is faster but less accurate for long distances or close to the poles. Default is `arc`. -mode | Specifies how to handle a field with several geopoints. By default, documents are sorted by the shortest distance when the sort order is ascending and by the longest distance when the sort order is descending. Valid values are `min`, `max`, `median`, and `avg`. -unit | Specifies the units used to compute sort values. Default is meters (`m`). -ignore_unmapped | Specifies how to treat an unmapped field. Set `ignore_unmapped` to `true` to ignore unmapped fields. Default is `false` (produce an error when encountering an unmapped field). +`distance_type` | Specifies the method of computing the distance. Valid values are `arc` and `plane`. The `plane` method is faster but less accurate for long distances or close to the poles. Default is `arc`. +`mode` | Specifies how to handle a field with several geopoints. By default, documents are sorted by the shortest distance when the sort order is ascending and by the longest distance when the sort order is descending. Valid values are `min`, `max`, `median`, and `avg`. +`unit` | Specifies the units used to compute sort values. Default is meters (`m`). +`ignore_unmapped` | Specifies how to treat an unmapped field. Set `ignore_unmapped` to `true` to ignore unmapped fields. Default is `false` (produce an error when encountering an unmapped field). The `_geo_distance` parameter does not support `missing_values`. The distance is always considered to be `infinity` when a document does not contain the field used for computing distance. {: .note} -For example, index two documents with geopoints: +For example, create an index and map the `point` field as a `geo_point`: + +```json +PUT testindex1 +{ + "mappings": { + "properties": { + "point": { + "type": "geo_point" + } + } + } +} +``` +{% include copy-curl.html %} + +Index two documents containing geopoints: ```json PUT testindex1/_doc/1 { "point": [74.00, 40.71] } +``` +{% include copy-curl.html %} +```json PUT testindex1/_doc/2 { "point": [73.77, -69.63] } ``` +{% include copy-curl.html %} Search for all documents and sort them by the distance from the provided point: @@ -796,6 +842,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} The response contains the sorted documents: @@ -874,6 +921,7 @@ GET testindex1/_search } } ``` +{% include copy-curl.html %} For each document, the sorting distance is calculated as the minimum, maximum, or average (as specified by the `mode`) of the distances from all points provided in the search to all points in the document. From 55b7d46b9ce269130f4b8cc1d27e684db40ab400 Mon Sep 17 00:00:00 2001 From: Joshua Palis Date: Thu, 12 Mar 2026 12:10:30 -0700 Subject: [PATCH 15/28] Adding rerank documentation for agentic search (#12081) * Adding rerank documentation for agentic search Signed-off-by: Joshua Palis * Doc review Signed-off-by: Fanit Kolchina * Reformat requests and clarify steps Signed-off-by: Fanit Kolchina * Apply suggestion from @kolchfa-aws Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: Joshua Palis Signed-off-by: Fanit Kolchina Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Eric Pugh --- .../reranking-search-results.md | 4 + .../ai-search/agentic-search/index.md | 2 + .../rerank-agentic-search-results.md | 404 ++++++++++++++++++ 3 files changed, 410 insertions(+) create mode 100644 _vector-search/ai-search/agentic-search/rerank-agentic-search-results.md diff --git a/_search-plugins/search-relevance/reranking-search-results.md b/_search-plugins/search-relevance/reranking-search-results.md index 7d886f77a4b..67b2f15ef95 100644 --- a/_search-plugins/search-relevance/reranking-search-results.md +++ b/_search-plugins/search-relevance/reranking-search-results.md @@ -19,6 +19,10 @@ You can rerank results in the following ways: - [By a field using a cross-encoder]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field-cross-encoder/) - [By a field using a late interaction model]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/rerank-by-field-late-interaction/) +## Reranking in agentic search + +If you're using agentic search, see [Reranking agentic search results]({{site.url}}{{site.baseurl}}/vector-search/ai-search/agentic-search/rerank-agentic-search-results/) for information about reranking search results within agentic search pipelines. + ## Using rerank and normalization processors together When you use a rerank processor in conjunction with a [normalization processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) and a hybrid query, the rerank processor alters the final document scores. This is because the rerank processor operates after the normalization processor in the search pipeline. diff --git a/_vector-search/ai-search/agentic-search/index.md b/_vector-search/ai-search/agentic-search/index.md index 124bb7d16b1..ef6d4cc2eaf 100644 --- a/_vector-search/ai-search/agentic-search/index.md +++ b/_vector-search/ai-search/agentic-search/index.md @@ -253,6 +253,8 @@ After setting up basic agentic search, you can enhance your implementation with - [Build agentic search flows]({{site.url}}{{site.baseurl}}/vector-search/ai-search/building-agentic-search-flows/) -- Configure agents and execute agentic search using AI search flows in OpenSearch Dashboards. +- [Rerank agentic search results]({{site.url}}{{site.baseurl}}/vector-search/ai-search/agentic-search/rerank-agentic-search-results/) -- Add a rerank search response processor to your agentic search pipeline to futher rerank search results. + ## Next steps - [Inspecting agentic search and continuing conversations]({{site.url}}{{site.baseurl}}/vector-search/ai-search/agentic-search/agent-converse/) -- Inspect agent behavior, view generated DSL, and continue conversations using a memory ID. \ No newline at end of file diff --git a/_vector-search/ai-search/agentic-search/rerank-agentic-search-results.md b/_vector-search/ai-search/agentic-search/rerank-agentic-search-results.md new file mode 100644 index 00000000000..dc40227fa98 --- /dev/null +++ b/_vector-search/ai-search/agentic-search/rerank-agentic-search-results.md @@ -0,0 +1,404 @@ +--- +layout: default +title: Reranking agentic search results +parent: Agentic search +grand_parent: AI search +nav_order: 105 +has_children: false +--- + +# Reranking agentic search results + +Agentic search requests are processed by the [`agentic_query_translator` search request processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/agentic-query-translator-processor/), which intercepts the given query text and passes it to the configured agent in order to generate and execute an OpenSearch DSL query. To further adjust relevance scores, search results can also be reranked using the [`rerank` search response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/). + +## Prerequisite + +Before using agentic search, you must configure an agent with the [`QueryPlanningTool`]({{site.url}}{{site.baseurl}}/ml-commons-plugin/agents-tools/tools/query-planning-tool/). + +## Step 1: Create an index for ingestion + +Create an index for ingestion: + +```json +PUT /iris-index +{ + "mappings": { + "properties": { + "petal_length_in_cm": { + "type": "float" + }, + "petal_width_in_cm": { + "type": "float" + }, + "sepal_length_in_cm": { + "type": "float" + }, + "sepal_width_in_cm": { + "type": "float" + }, + "species": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + } +} +``` +{% include copy-curl.html %} + +## Step 2: Ingest documents into the index + +To ingest documents into the index created in the previous step, send the following request: + +```json +POST _bulk +{ "index": { "_index": "iris-index", "_id": "1" } } +{ "petal_length_in_cm": 1.4, "petal_width_in_cm": 0.2, "sepal_length_in_cm": 5.1, "sepal_width_in_cm": 3.5, "species": "setosa" } +{ "index": { "_index": "iris-index", "_id": "2" } } +{ "petal_length_in_cm": 1.4, "petal_width_in_cm": 0.2, "sepal_length_in_cm": 4.9, "sepal_width_in_cm": 3.0, "species": "setosa" } +{ "index": { "_index": "iris-index", "_id": "3" } } +{ "petal_length_in_cm": 1.3, "petal_width_in_cm": 0.2, "sepal_length_in_cm": 4.7, "sepal_width_in_cm": 3.2, "species": "setosa" } +{ "index": { "_index": "iris-index", "_id": "4" } } +{ "petal_length_in_cm": 1.5, "petal_width_in_cm": 0.2, "sepal_length_in_cm": 4.6, "sepal_width_in_cm": 3.1, "species": "setosa" } +{ "index": { "_index": "iris-index", "_id": "5" } } +{ "petal_length_in_cm": 1.4, "petal_width_in_cm": 0.2, "sepal_length_in_cm": 5.0, "sepal_width_in_cm": 3.6, "species": "setosa" } +{ "index": { "_index": "iris-index", "_id": "6" } } +{ "petal_length_in_cm": 6.6, "petal_width_in_cm": 2.1, "sepal_length_in_cm": 7.6, "sepal_width_in_cm": 3.0, "species": "virginica" } +{ "index": { "_index": "iris-index", "_id": "7" } } +{ "petal_length_in_cm": 4.5, "petal_width_in_cm": 1.7, "sepal_length_in_cm": 4.9, "sepal_width_in_cm": 2.5, "species": "virginica" } +{ "index": { "_index": "iris-index", "_id": "8" } } +{ "petal_length_in_cm": 6.3, "petal_width_in_cm": 1.8, "sepal_length_in_cm": 7.3, "sepal_width_in_cm": 2.9, "species": "virginica" } +{ "index": { "_index": "iris-index", "_id": "9" } } +{ "petal_length_in_cm": 5.8, "petal_width_in_cm": 1.8, "sepal_length_in_cm": 6.7, "sepal_width_in_cm": 2.5, "species": "virginica" } +{ "index": { "_index": "iris-index", "_id": "10" } } +{ "petal_length_in_cm": 6.1, "petal_width_in_cm": 2.5, "sepal_length_in_cm": 7.2, "sepal_width_in_cm": 3.6, "species": "virginica" } +``` +{% include copy-curl.html %} + +## Step 3: Register a model and an agent + +Follow these steps to register a model and an agent: + +1. [Create a model for the agent and QueryPlanningTool]({{site.url}}{{site.baseurl}}/vector-search/ai-search/agentic-search/#step-3-create-a-model-for-the-agent-and-queryplanningtool). +2. [Create an agent]({{site.url}}{{site.baseurl}}/vector-search/ai-search/agentic-search/#step-4-create-an-agent). + +## Step 4: Create a search pipeline + +Create a search pipeline that uses your agent and a `rerank` response processor. This example uses a `by_field` rerank processor that reranks documents based on the `petal_length_in_cm` field. + +### Step 4(a): Configure a by_field rerank processor + +Create an agentic search pipeline containing a `by_field` rerank processor: + +```json +PUT _search/pipeline/agentic-pipeline +{ + "request_processors": [ + { + "agentic_query_translator": { + "agent_id": "your-agent-id-from-step-3" + } + } + ], + "response_processors": [ + { + "rerank": { + "by_field": { + "target_field": "petal_length_in_cm", + "keep_previous_score": true + } + } + }, + { + "agentic_context": { + "dsl_query": true + } + } + ] +} +``` +{% include copy-curl.html %} + +Alternatively, you can use an `ml_opensearch` rerank processor to apply OpenSearch-provided [cross-encoder models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#cross-encoder-models) to rerank results. + +### Step 4(b): Configure an ml_opensearch rerank processor + +Register an `ms-marco-MiniLM-L-6-v2` cross-encoder model: + +```json +POST _plugins/_ml/models/_register?deploy=true +{ + "name": "huggingface/cross-encoders/ms-marco-MiniLM-L-6-v2", + "version": "1.0.2", + "model_format": "TORCH_SCRIPT" +} +``` +{% include copy-curl.html %} + +Then configure an `ml_opensearch` rerank processor by providing the model ID returned in the response. You can configure the rerank processor for any text field in your index. In this example, you'll use the `species` field: + +```json +POST _search/pipeline/agentic-pipeline +{ + "request_processors": [ + { + "agentic_query_translator": { + "agent_id": "your-agent-id-from-step-3" + } + } + ], + "response_processors": [ + { + "rerank": { + "ml_opensearch": { + "model_id": "your-cross-encoder-model-id", + "keep_previous_score": true + }, + "context": { + "document_fields": [ + "species" + ] + } + } + }, + { + "agentic_context": { + "dsl_query": true + } + } + ] +} +``` +{% include copy-curl.html %} + +## Step 5: Test a question + +Test your reranking agentic search pipeline by asking a question. + +### Step 5(a): Test the by_field rerank processor + +To test the `by_field` rerank processor, send the following request: + +```json +POST /iris-index/_search?search_pipeline=agentic-pipeline +{ + "query": { + "agentic": { + "query_text": "Show me virginica flowers" + } + } +} +``` +{% include copy-curl.html %} + +The generated DSL query shows that the agent opted to use a basic `term` query on the `species` field of the `iris-index`. The query returns all documents matching the given term. In the response, each document includes two scores: `previous_score` (the original relevance score) and `_score` (the updated score after reranking). The documents are ranked by petal length in descending order: + +```json +{ + "took": 3402, + "timed_out": false, + "_shards": { + "total": 5, + "successful": 5, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 5, + "relation": "eq" + }, + "max_score": 6.6, + "hits": [ + { + "_index": "iris-index", + "_id": "6", + "_score": 6.6, + "_source": { + "sepal_width_in_cm": 3.0, + "species": "virginica", + "previous_score": 1.0, + "sepal_length_in_cm": 7.6, + "petal_width_in_cm": 2.1, + "petal_length_in_cm": 6.6 + } + }, + { + "_index": "iris-index", + "_id": "8", + "_score": 6.3, + "_source": { + "sepal_width_in_cm": 2.9, + "species": "virginica", + "previous_score": 1.0, + "sepal_length_in_cm": 7.3, + "petal_width_in_cm": 1.8, + "petal_length_in_cm": 6.3 + } + }, + { + "_index": "iris-index", + "_id": "10", + "_score": 6.1, + "_source": { + "sepal_width_in_cm": 3.6, + "species": "virginica", + "previous_score": 1.0, + "sepal_length_in_cm": 7.2, + "petal_width_in_cm": 2.5, + "petal_length_in_cm": 6.1 + } + }, + { + "_index": "iris-index", + "_id": "9", + "_score": 5.8, + "_source": { + "sepal_width_in_cm": 2.5, + "species": "virginica", + "previous_score": 1.0, + "sepal_length_in_cm": 6.7, + "petal_width_in_cm": 1.8, + "petal_length_in_cm": 5.8 + } + }, + { + "_index": "iris-index", + "_id": "7", + "_score": 4.5, + "_source": { + "sepal_width_in_cm": 2.5, + "species": "virginica", + "previous_score": 1.0, + "sepal_length_in_cm": 4.9, + "petal_width_in_cm": 1.7, + "petal_length_in_cm": 4.5 + } + } + ] + }, + "ext": { + "dsl_query": "{\"query\":{\"term\":{\"species.keyword\":\"virginica\"}}}" + } +} +``` + +### Step 5(b): Test the ml_opensearch rerank processor + +To test the `ml_opensearch` rerank processor, send the following request: + +```json +POST /iris-index/_search?search_pipeline=agentic-pipeline +{ + "query": { + "agentic": { + "query_text": "Show me virginica flowers" + } + }, + "ext": { + "rerank": { + "query_context": { + "query_text": "Show me virginica flowers" + } + } + } +} +``` +{% include copy-curl.html %} + +The model receives both the query text and the document text and generates a new relevance score based on both: + +```json +{ + "took": 2667, + "timed_out": false, + "_shards": { + "total": 5, + "successful": 5, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 5, + "relation": "eq" + }, + "max_score": 0.65176475, + "hits": [ + { + "_index": "iris-index", + "_id": "9", + "_score": 0.65176475, + "_source": { + "petal_length_in_cm": 5.8, + "petal_width_in_cm": 1.8, + "sepal_length_in_cm": 6.7, + "sepal_width_in_cm": 2.5, + "species": "virginica" + } + }, + { + "_index": "iris-index", + "_id": "7", + "_score": 0.65176475, + "_source": { + "petal_length_in_cm": 4.5, + "petal_width_in_cm": 1.7, + "sepal_length_in_cm": 4.9, + "sepal_width_in_cm": 2.5, + "species": "virginica" + } + }, + { + "_index": "iris-index", + "_id": "8", + "_score": 0.65176475, + "_source": { + "petal_length_in_cm": 6.3, + "petal_width_in_cm": 1.8, + "sepal_length_in_cm": 7.3, + "sepal_width_in_cm": 2.9, + "species": "virginica" + } + }, + { + "_index": "iris-index", + "_id": "10", + "_score": 0.65176475, + "_source": { + "petal_length_in_cm": 6.1, + "petal_width_in_cm": 2.5, + "sepal_length_in_cm": 7.2, + "sepal_width_in_cm": 3.6, + "species": "virginica" + } + }, + { + "_index": "iris-index", + "_id": "6", + "_score": 0.65176475, + "_source": { + "petal_length_in_cm": 6.6, + "petal_width_in_cm": 2.1, + "sepal_length_in_cm": 7.6, + "sepal_width_in_cm": 3.0, + "species": "virginica" + } + } + ] + }, + "ext": { + "dsl_query": "{\"query\":{\"term\":{\"species.keyword\":\"virginica\"}}}" + } +} +``` + +## Related documentation + +- [Reranking search results]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/reranking-search-results/) +- [Rerank processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/) From 73d01910535be3c35dfdfac2b548159e6624b75c Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 3 Apr 2026 21:07:48 -0400 Subject: [PATCH 16/28] fix up links Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 7bde4209aff..39c1d73390c 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -26,20 +26,20 @@ LLM-as-a-Judge is a technique that leverages large language models to automatica In this tutorial, you'll learn how to: - **Set up external LLM integration**: Connect OpenSearch to external LLM providers like OpenAI, AWS Bedrock, or others. -- **Generate automated judgments**: Use LLMs to evaluate search result relevance without manual annotation. -- **Compare search configurations**: Run experiments to determine which search approach performs better using LLM-generated judgments. +- **Generate automated judgments**: Use an LLM to evaluate search result relevance without manual annotation. +- **Evaluate a search configuration**: Run an experiment to evaluate search quality using LLM-generated judgments. ## OpenSearch components for LLM-as-a-Judge In this tutorial, you'll use the following OpenSearch components: - [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) for LLM integration -- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/index/) for evaluation workflows +- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) for evaluation workflows - [Remote model connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) for external LLM APIs -- [Search configurations]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/search-configuration/) for defining search strategies -- [Query sets]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/query-set/) for organizing test queries -- [Judgments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgment/) for storing relevance assessments -- [Experiments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/experiment/) for evaluating search quality +- [Search configuration]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/search-configurations/) for defining search strategies +- [Query set]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/query-sets/) for organizing test queries +- [Judgments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/) for storing relevance assessments +- [Experiments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/experiments/) for evaluating search quality ## Prerequisites From 223dfefe6e046c9ace885e52e2398420922c4198 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 3 Apr 2026 21:11:11 -0400 Subject: [PATCH 17/28] prune back extra verbiage and ideas that arent going to land Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 67 +-------------------------- 1 file changed, 1 insertion(+), 66 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 39c1d73390c..3dd10cf5aaf 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -671,72 +671,7 @@ POST /_plugins/_ml/connectors/_create ``` {% include copy-curl.html %} -## Clean up - -After you're done, delete the components you've created in this tutorial: - -```json -DELETE /products -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_search_relevance/experiments/pointwise_experiment_id -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_search_relevance/judgments/llm_judgment_id -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_search_relevance/query_sets/electronics_queries_id -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_search_relevance/search_configurations/baseline_config_id -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_search_relevance/search_configurations/title_boosted_config_id -``` -{% include copy-curl.html %} - -```json -POST /_plugins/_ml/models/MODEL_ID_HERE/_undeploy -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_ml/models/MODEL_ID_HERE -``` -{% include copy-curl.html %} - -```json -DELETE /_plugins/_ml/connectors/abc123def456 -``` -{% include copy-curl.html %} - -## Benefits of LLM-as-a-Judge - -- **Scalability**: Generate judgments for thousands of query-document pairs without manual annotation -- **Consistency**: LLMs provide consistent evaluation criteria across all judgments -- **Cost-effective**: Reduce the need for expensive human annotation while maintaining quality -- **Rapid iteration**: Quickly evaluate new search configurations and features -- **Semantic understanding**: LLMs can assess semantic relevance beyond keyword matching - ## Further reading -- Learn more about [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/index/) +- Learn more about [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) - Explore [ML Commons remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) -- Read about [search evaluation metrics]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluation-metrics/) - -## Next steps - -- Experiment with different LLM models and prompt templates -- Create more sophisticated query sets for comprehensive evaluation -- Integrate LLM judgments into your search development workflow -- Compare LLM judgments with human annotations to validate quality From 09576a792640d174e0e2726cadbcac49a65556f5 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Thu, 21 May 2026 21:12:54 -0400 Subject: [PATCH 18/28] Backout local only change Signed-off-by: Eric Pugh --- _config.yml | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/_config.yml b/_config.yml index 6516dd4f285..67d04738d90 100644 --- a/_config.yml +++ b/_config.yml @@ -13,9 +13,7 @@ lucene_version: '10_4_0' # Build settings markdown: kramdown -#remote_theme: benbalter/retlab -#remote_theme: pmarsceill/just-the-docs@v0.3.3 -theme: just-the-docs +remote_theme: pmarsceill/just-the-docs@v0.3.3 # Kramdown settings kramdown: From a15a8925578fc9db707c63d778a86178e34b4e99 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Thu, 21 May 2026 22:15:15 -0400 Subject: [PATCH 19/28] Refinig text and links Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 137 ++++---------------------- 1 file changed, 17 insertions(+), 120 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 3dd10cf5aaf..da895745cc9 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -6,17 +6,17 @@ parent: Search Relevance Workbench nav_order: 4 steps: - heading: "Set up ML Commons and create an external LLM connector" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-1-set-up-ml-commons-and-create-an-external-llm-connector" + link: "/tutorials/llm-as-a-judge-tutorial/#step-1-set-up-ml-commons-and-create-an-external-llm-connector" - heading: "Create a simple search index with sample data" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-2-create-a-simple-search-index-with-sample-data" + link: "/tutorials/llm-as-a-judge-tutorial/#step-2-create-a-simple-search-index-with-sample-data" - heading: "Create search configurations" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-3-create-search-configurations" + link: "/tutorials/llm-as-a-judge-tutorial/#step-3-create-search-configuration-baseline" - heading: "Create a query set" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-4-create-a-query-set" + link: "/tutorials/llm-as-a-judge-tutorial/#step-4-create-a-query-set" - heading: "Generate LLM judgments" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-5-generate-llm-judgments" + link: "/tutorials/llm-as-a-judge-tutorial/#step-5-generate-llm-judgments" - heading: "Run experiments with LLM judgments" - link: "/tutorials/search-relevance/llm-as-judge-tutorial/#step-6-run-experiments-with-llm-judgments" + link: "/tutorials/llm-as-a-judge-tutorial/#step-6-run-experiments-with-llm-judgments" --- # Getting started with LLM-as-a-Judge for search relevance evaluation @@ -27,7 +27,8 @@ In this tutorial, you'll learn how to: - **Set up external LLM integration**: Connect OpenSearch to external LLM providers like OpenAI, AWS Bedrock, or others. - **Generate automated judgments**: Use an LLM to evaluate search result relevance without manual annotation. -- **Evaluate a search configuration**: Run an experiment to evaluate search quality using LLM-generated judgments. + +You will then be ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments. ## OpenSearch components for LLM-as-a-Judge @@ -74,8 +75,6 @@ You can follow this tutorial by using your command line or the OpenSearch Dashbo Some steps in the tutorial contain optional Test it{: .text-delta} sections. You can confirm that the step completed successfully by running the requests in these sections. -After you're done, follow the steps in the [Clean up](#clean-up) section to delete all created components. - ### Step 1: Set up ML Commons and create an external LLM connector First, you'll create a connector to an external LLM service. This tutorial uses OpenAI's GPT models, but you can adapt it for other providers like AWS Bedrock. @@ -370,14 +369,14 @@ PUT /_plugins/_search_relevance/judgments "description": "Uses GPT-3.5-turbo to evaluate product search results", "type": "LLM_JUDGMENT", "modelId": "MODEL_ID_HERE", - "querySetId": "electronics_queries_id", - "searchConfigurationList": ["baseline_config_id"], + "querySetId": "QUERY_SET_ID_HERE", + "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], "size": 10, "tokenLimit": 4000, "contextFields": ["title", "description", "category"], "ignoreFailure": false, "llmJudgmentRatingType": "SCORE0_1", - "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", + "promptTemplate": "Rate the relevance of these search results {% raw %}{{hits}}{% endraw %} for the query '{% raw %}{{queryText}}{% endraw %}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", "overwriteCache": false } ``` @@ -429,7 +428,7 @@ You can see the judgments and how they were arrived at: "0fa1fedb-4bcb-469d-9fcb-2a5cd6709e1d" ], "tokenLimit": 4000, - "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", + "promptTemplate": "Rate the relevance of these search results {% raw %}{{hits}}{% endraw %} for the query '{% raw %}{{queryText}}{% endraw %}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", "querySetId": "4c6bf6f4-c2e4-4c76-a668-82de11d14846" }, "judgmentRatings": [ @@ -494,110 +493,8 @@ You should see documents with ratings between 0 and 1 generated by the LLM. ### Step 6: Run experiments with LLM judgments -Finally, you'll create an experiment to evaluate your `baseline` search configuration using the LLM-generated judgments. - -**SHOULD THIS JUST BE SCREENSHOTS OF SRW????** - -#### Step 6(a): Create a pointwise experiment - -```json -PUT /_plugins/_search_relevance/experiments -{ - "querySetId": "electronics_queries_id", - "searchConfigurationList": ["baseline_config_id"], - "judgmentList": ["llm_judgment_id"], - "size": 10, - "type": "POINTWISE_EVALUATION" -} -``` -{% include copy-curl.html %} - -The response contains the experiment ID: +You are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the existing search configuration and query set to run the evaluation. -```json -{ - "experiment_id": "pointwise_experiment_id" -} -``` - -#### Step 6(b): View experiment results - -```json -GET /_plugins/_search_relevance/experiments/pointwise_experiment_id -``` -{% include copy-curl.html %} - -The response shows evaluation metrics comparing your search configurations: - -
- - Results - - {: .text-delta} - -```json -{ - "experiment_id": "pointwise_experiment_id", - "query_set_id": "electronics_queries_id", - "search_configuration_list": ["baseline_config_id"], - "judgment_list": ["llm_judgment_id"], - "type": "POINTWISE_EVALUATION", - "results": { - "baseline_config_id": { - "precision_at_1": 0.67, - "precision_at_3": 0.56, - "precision_at_5": 0.48, - "precision_at_10": 0.42, - "recall_at_1": 0.67, - "recall_at_3": 0.78, - "recall_at_5": 0.85, - "recall_at_10": 0.92, - "ndcg_at_1": 0.67, - "ndcg_at_3": 0.71, - "ndcg_at_5": 0.73, - "ndcg_at_10": 0.75 - }, - "title_boosted_config_id": { - "precision_at_1": 0.78, - "precision_at_3": 0.63, - "precision_at_5": 0.52, - "precision_at_10": 0.45, - "recall_at_1": 0.78, - "recall_at_3": 0.84, - "recall_at_5": 0.89, - "recall_at_10": 0.94, - "ndcg_at_1": 0.78, - "ndcg_at_3": 0.79, - "ndcg_at_5": 0.81, - "ndcg_at_10": 0.83 - } - } -} -``` - -In this example, the title-boosted configuration shows better performance across most metrics, indicating that boosting the title field improves search relevance for these queries. -
- -
- - Test it - - {: .text-delta} - -You can also view detailed evaluation results: - -```json -GET /search-relevance-evaluation-result/_search -{ - "query": { - "match": { - "experiment_id": "pointwise_experiment_id" - } - } -} -``` -{% include copy-curl.html %} -
## Advanced features @@ -611,9 +508,9 @@ PUT /_plugins/_search_relevance/judgments "name": "Custom Prompt Judgment", "type": "LLM_JUDGMENT", "modelId": "MODEL_ID_HERE", - "querySetId": "electronics_queries_id", - "searchConfigurationList": ["baseline_config_id"], - "promptTemplate": "As an e-commerce search expert, evaluate how well these products {{hits}} match the user's search for '{{queryText}}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.", + "querySetId": "QUERY_SET_ID_HERE", + "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], + "promptTemplate": "As an e-commerce search expert, evaluate how well these products {% raw %}{{hits}}{% endraw %} match the user's search for '{% raw %}{{queryText}}{% endraw %}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.", "llmJudgmentRatingType": "SCORE0_1" } ``` @@ -632,7 +529,7 @@ PUT /_plugins/_search_relevance/judgments "querySetId": "electronics_queries_id", "searchConfigurationList": ["baseline_config_id"], "llmJudgmentRatingType": "RELEVANT_IRRELEVANT", - "promptTemplate": "Determine if these search results {{hits}} are relevant or irrelevant for the query '{{queryText}}'. Consider exact matches and semantic relevance." + "promptTemplate": "Determine if these search results {% raw %}{{hits}}{% endraw %} are relevant or irrelevant for the query '{% raw %}{{queryText}}{% endraw %}'. Consider exact matches and semantic relevance." } ``` {% include copy-curl.html %} From 08940beb112757cbab208836857e1140ae3cc418 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Thu, 21 May 2026 22:17:15 -0400 Subject: [PATCH 20/28] Refining text Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index da895745cc9..3c3d47dfc7e 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -493,7 +493,7 @@ You should see documents with ratings between 0 and 1 generated by the LLM. ### Step 6: Run experiments with LLM judgments -You are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the existing search configuration and query set to run the evaluation. +Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the existing search configuration and query set from this tutorial when you run the evaluation. ## Advanced features From 10dcf88531b6712bca26a9efb7681e34dda024f7 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Fri, 22 May 2026 07:55:43 -0400 Subject: [PATCH 21/28] Try to be clearer in Step 6 Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 3c3d47dfc7e..6e06b339931 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -493,7 +493,7 @@ You should see documents with ratings between 0 and 1 generated by the LLM. ### Step 6: Run experiments with LLM judgments -Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the existing search configuration and query set from this tutorial when you run the evaluation. +Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the search configuration and query set's that you have already created in this tutorial when you run your first evaluation. ## Advanced features From ab83d431c7476ee20511860da43d6febcc7a49ca Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 27 May 2026 09:06:18 -0400 Subject: [PATCH 22/28] Responding to vale feedback Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 6e06b339931..5a07e25ad51 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -493,11 +493,13 @@ You should see documents with ratings between 0 and 1 generated by the LLM. ### Step 6: Run experiments with LLM judgments -Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have just created. You can reuse the search configuration and query set's that you have already created in this tutorial when you run your first evaluation. +Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have have created. The search configuration and query set that you created during this tutorial can be used as part of running your first evaluation using the new judgement set. ## Advanced features +Once you are comfortable with the basics, you will want to start adopting some of the advanced features. + ### Custom prompt templates You can customize the prompt template to focus on specific aspects of relevance: From 934d115471dc927453ac9195b3bd237d0011476d Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 27 May 2026 09:21:52 -0400 Subject: [PATCH 23/28] Ensure standard naming patterns followed Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 5a07e25ad51..691f6987378 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -386,14 +386,14 @@ The response contains the judgment ID: ```json { - "judgment_id": "LLM_JUDGEMENT_ID" + "judgment_id": "LLM_JUDGMENT_ID" } ``` The LLM judgment process runs asynchronously. Wait a few moments for the judgments to be generated, then check the status: ```json -GET /search-relevance-judgment/_doc/LLM_JUDGEMENT_ID +GET /search-relevance-judgment/_doc/LLM_JUDGMENT_ID ``` {% include copy-curl.html %} @@ -493,7 +493,7 @@ You should see documents with ratings between 0 and 1 generated by the LLM. ### Step 6: Run experiments with LLM judgments -Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have have created. The search configuration and query set that you created during this tutorial can be used as part of running your first evaluation using the new judgement set. +Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have created. The search configuration and query set that you created during this tutorial can be used as part of running your first evaluation using the new judgment list. ## Advanced features From ea7363cbb52e49179ab7e942c66dc49ff336f2ce Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 27 May 2026 09:25:40 -0400 Subject: [PATCH 24/28] use ALL CAPS format for variables Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 691f6987378..470836ad65c 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -528,8 +528,8 @@ PUT /_plugins/_search_relevance/judgments "name": "Binary LLM Judgment", "type": "LLM_JUDGMENT", "modelId": "MODEL_ID_HERE", - "querySetId": "electronics_queries_id", - "searchConfigurationList": ["baseline_config_id"], + "querySetId": "QUERY_SET_ID_HERE", + "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], "llmJudgmentRatingType": "RELEVANT_IRRELEVANT", "promptTemplate": "Determine if these search results {% raw %}{{hits}}{% endraw %} are relevant or irrelevant for the query '{% raw %}{{queryText}}{% endraw %}'. Consider exact matches and semantic relevance." } From 3dd01b440ec2960a54aa6deff6369b490fff4454 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 27 May 2026 10:55:20 -0400 Subject: [PATCH 25/28] Rework the order of the descriptions of the types of judgments to be ordered with the top level descirpiton, and rewrite a lot of passive language to make ti punchier Signed-off-by: Eric Pugh --- _search-plugins/search-relevance/judgments.md | 279 +++++++++++------- 1 file changed, 173 insertions(+), 106 deletions(-) diff --git a/_search-plugins/search-relevance/judgments.md b/_search-plugins/search-relevance/judgments.md index c878e036e4d..60bec96e940 100644 --- a/_search-plugins/search-relevance/judgments.md +++ b/_search-plugins/search-relevance/judgments.md @@ -10,107 +10,26 @@ has_children: false # Judgments A judgment is a relevance rating assigned to a specific document in the context of a particular query. Multiple judgments are grouped together into judgment lists. -Typically, judgments are categorized into two types---implicit and explicit: +Typically, judgments fall into two types---implicit and explicit: -* Implicit judgments are ratings that were derived from user behavior (for example, what did the user see and select after searching?) -* Explicit judgments were traditionally made by humans, but large language models (LLMs) are increasingly being used to perform this task. +* Implicit judgments are ratings derived from user behavior (for example, what did the user see and select after searching?) +* Humans have traditionally produced explicit judgments, but large language models (LLMs) are increasingly taking on this task. Search Relevance Workbench supports all types of judgments: +* Using LLMs, typically called LLM-as-a-Judge, to generate judgments by evaluating search results using a prompt. * Generating implicit judgments based on data that adheres to the User Behavior Insights (UBI) schema specification. -* Using LLMs to generate judgments by connecting OpenSearch to an API or an internally or externally hosted model. -* Importing externally created judgments. +* Importing judgments that were collected using a process outside of SRW. -## Explicit judgments +## Using LLM-as-a-Judge -Search Relevance Workbench offers two ways to integrate explicit judgments: -* Importing judgments that were collected using a process outside of OpenSearch. -* Generating judgments using LLM-as-a-Judge. +Generate explicit judgments with an LLM in Search Relevance Workbench when you don't have human annotators available, or you need to scale up the number of judgments beyond what humans can provide. -### Importing judgments +The [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) walks you through the process step-by-step. -You may already have external processes for generating judgments. Regardless of the judgment type or the way it was generated, you can import it into Search Relevance Workbench. +### Prerequisites -#### Example request - -```json -PUT _plugins/_search_relevance/judgments -{ - "name": "Imported Judgments", - "description": "Judgments generated outside SRW", - "type": "IMPORT_JUDGMENT", - "judgmentRatings": [ - { - "query": "red dress", - "ratings": [ - { - "docId": "B077ZJXCTS", - "rating": "3.000" - }, - { - "docId": "B071S6LTJJ", - "rating": "2.000" - }, - { - "docId": "B01IDSPDJI", - "rating": "2.000" - }, - { - "docId": "B07QRCGL3G", - "rating": "0.000" - }, - { - "docId": "B074V6Q1DR", - "rating": "1.000" - } - ] - }, - { - "query": "blue jeans", - "ratings": [ - { - "docId": "B07L9V4Y98", - "rating": "0.000" - }, - { - "docId": "B01N0DSRJC", - "rating": "1.000" - }, - { - "docId": "B001CRAWCQ", - "rating": "1.000" - }, - { - "docId": "B075DGJZRM", - "rating": "2.000" - }, - { - "docId": "B009ZD297U", - "rating": "2.000" - } - ] - } - ] -} -``` -{% include copy-curl.html %} - -#### Request body fields - -The process of importing judgments supports the following parameters. - -Parameter | Data type | Description -`name` | String | The name of the judgment list. -`description` | String | An optional description of the judgment list. -`type` | String | Set to `IMPORT_JUDGMENT`. -`judgmentRatings` | Array | A list of JSON objects containing the judgments. Judgments are grouped by query, each containing a nested map in which document IDs (`docId`) serve as keys and their floating-point ratings serve as values. - -### Using LLM-as-a-Judge - -If you want to use judgments in your experimentation process but do not have a team of humans or the user behavior data to calculate judgments based on interactions, you can use an LLM in Search Relevance Workbench to generate judgments. See the [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) for a step by step guide. -#### Prerequisites - -To use LLM-as-a-Judge, ensure that you have configured the following components: +To use LLM-as-a-Judge, configure the following components: * A connector to an LLM to use for generating the judgments. For more information, see [Creating connectors for third-party ML platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). * A query set: Together with the `size` parameter, the query set defines the scope for generating judgments. For each query, the top k documents are retrieved from the specified index, where k is defined in the `size` parameter. @@ -118,13 +37,12 @@ To use LLM-as-a-Judge, ensure that you have configured the following components: The AI-assisted judgment process works as follows: - For each query, the top k documents are retrieved using the defined search configuration, which includes the index information. The query and each document from the result list create a query/document pair. -- Each query and document pair forms a query/document pair. -- The LLM is then called with a predefined prompt (stored as a static variable in the backend) to generate a judgment for each query/document pair. +- The LLM is then called with a predefined prompt to generate a judgment for each query/document pair. - All generated judgments are stored in the judgments cache index for reuse in future experiments. To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration. -The following example uses a fairly generic prompt template with a scale of 0.0 to 1.0 for judgments. To winnow down the volume of data to be evaluated by the LLM, and therefore reduce the cost, you can specify which fields from the results to be sent using the `contextFields` parameter. +The following example uses a generic prompt template with a scale of 0.0 to 1.0. To reduce the volume of data sent to the LLM (and therefore the cost), use the `contextFields` parameter to specify which fields from each result to include. ```json PUT _plugins/_search_relevance/judgments @@ -143,9 +61,9 @@ PUT _plugins/_search_relevance/judgments ``` {% include copy-curl.html %} -#### Request body fields +### Request body fields -The process of creating LLM based judgments supports the following parameters. +Use the following parameters to create LLM-based judgments: Parameter | Data type | Description :--- | :--- | :--- @@ -163,10 +81,81 @@ Parameter | Data type | Description `promptTemplate` | String | Optional. Custom prompt template for the LLM. Supports placeholders: `{{queryText}}`, `{{hits}}`. If not provided, a default template is used. `overwriteCache` | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is false (reuse cached judgments). +### Custom prompt templates + +You can customize the prompt template to focus on specific aspects of relevance: + +```json +PUT /_plugins/_search_relevance/judgments +{ + "name": "Custom Prompt Judgment", + "type": "LLM_JUDGMENT", + "modelId": "MODEL_ID_HERE", + "querySetId": "QUERY_SET_ID_HERE", + "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], + "promptTemplate": "As an e-commerce search expert, evaluate how well these products {% raw %}{{hits}}{% endraw %} match the user's search for '{% raw %}{{queryText}}{% endraw %}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.", + "llmJudgmentRatingType": "SCORE0_1" +} +``` +{% include copy-curl.html %} + +### Binary relevance judgments + +For simpler relevance assessment, you can use binary (relevant/irrelevant) judgments: + +```json +PUT /_plugins/_search_relevance/judgments +{ + "name": "Binary LLM Judgment", + "type": "LLM_JUDGMENT", + "modelId": "MODEL_ID_HERE", + "querySetId": "QUERY_SET_ID_HERE", + "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], + "llmJudgmentRatingType": "RELEVANT_IRRELEVANT", + "promptTemplate": "Determine if these search results {% raw %}{{hits}}{% endraw %} are relevant or irrelevant for the query '{% raw %}{{queryText}}{% endraw %}'. Consider exact matches and semantic relevance." +} +``` +{% include copy-curl.html %} + +### Using different LLM providers + +You can adapt the connector configuration for other providers: + +#### AWS Bedrock example + +```json +POST /_plugins/_ml/connectors/_create +{ + "name": "AWS Bedrock Connector", + "description": "Connector to AWS Bedrock", + "version": "1", + "protocol": "aws_sigv4", + "parameters": { + "region": "us-east-1", + "service_name": "bedrock", + "model": "anthropic.claude-v2" + }, + "credential": { + "access_key": "YOUR_ACCESS_KEY", + "secret_key": "YOUR_SECRET_KEY" + }, + "actions": [ + { + "action_type": "predict", + "method": "POST", + "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke", + "request_body": "{ \"prompt\": \"${parameters.messages}\", \"max_tokens_to_sample\": 300 }" + } + ] +} +``` +{% include copy-curl.html %} + ## Implicit judgments -Implicit judgments are derived from user interactions. Several models use signals from user behavior to calculate these judgments. One such model is Clicks Over Expected Clicks (COEC), a click model implemented in Search Relevance Workbench. -The data used to derive relevance labels is based on past user behavior. The data follows the [User Behavior Insights schema specification]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). The two key interaction types for implicit judgments are *impressions* and *clicks* that occur after a user query. In practice, this means that all events in the `ubi_events` index with an `impression` or `click` recorded in the `action_name` field are used to model implicit judgments. +Implicit judgments are derived from past user interactions. Search Relevance Workbench supports the Clicks Over Expected Clicks (COEC) click model, which uses *impression* and *click* signals to calculate judgments. + +Input data must follow the [User Behavior Insights schema specification]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). COEC uses every event in the `ubi_events` index whose `action_name` is `impression` or `click`. COEC calculates an expected click-through rate (CTR) for each rank. It does this by dividing the total number of clicks by the total number of impressions observed at that rank, based on all events in `ubi_events`. This ratio represents the expected CTR for that position. For each document displayed in a hit list after a query, the average CTR at that rank serves as the expected value for the query/document pair. COEC calculates the actual CTR for the query/document pair and divides it by this expected rank-based CTR. This means that query/document pairs with a higher CTR than the average for that rank will have a judgment value greater than 1. Conversely, if the CTR is lower than average, the judgment value will be lower than 1. @@ -175,12 +164,12 @@ Note that depending on the tracking implementation, multiple clicks for a single For query-document observations that occur at different positions, all impressions and clicks are assumed to have occurred at the lowest (best) position. This approach biases the final judgment toward lower values, reflecting the common trend that higher-ranked results typically receive higher CTRs. {: .note} -#### Example request +### Example request ```json PUT _plugins/_search_relevance/judgments { - "name": "Implicit Judgements", + "name": "Implicit Judgments", "clickModel": "coec", "type": "UBI_JUDGMENT", "maxRank": 20 @@ -188,9 +177,9 @@ PUT _plugins/_search_relevance/judgments ``` {% include copy-curl.html %} -#### Request body fields +### Request body fields -The process of creating implicit judgments supports the following parameters. +Use the following parameters to create implicit judgments: Parameter | Data type | Description `name` | String | The name of the judgment list. @@ -200,13 +189,91 @@ Parameter | Data type | Description `startDate` | Date | The optional starting date from which behavioral data events are considered for implicit judgment generation. The format is`yyyy-MM-dd`. `endDate` | Date | The optional end date until which behavioral data events are considered for implicit judgment generation. The format is`yyyy-MM-dd`. +## Importing judgments + +You may already have external processes for generating judgments. Regardless of the judgment type or the way they were generated, you can import them into Search Relevance Workbench. + +#### Example request + +```json +PUT _plugins/_search_relevance/judgments +{ + "name": "Imported Judgments", + "description": "Judgments generated outside SRW", + "type": "IMPORT_JUDGMENT", + "judgmentRatings": [ + { + "query": "red dress", + "ratings": [ + { + "docId": "B077ZJXCTS", + "rating": "3.000" + }, + { + "docId": "B071S6LTJJ", + "rating": "2.000" + }, + { + "docId": "B01IDSPDJI", + "rating": "2.000" + }, + { + "docId": "B07QRCGL3G", + "rating": "0.000" + }, + { + "docId": "B074V6Q1DR", + "rating": "1.000" + } + ] + }, + { + "query": "blue jeans", + "ratings": [ + { + "docId": "B07L9V4Y98", + "rating": "0.000" + }, + { + "docId": "B01N0DSRJC", + "rating": "1.000" + }, + { + "docId": "B001CRAWCQ", + "rating": "1.000" + }, + { + "docId": "B075DGJZRM", + "rating": "2.000" + }, + { + "docId": "B009ZD297U", + "rating": "2.000" + } + ] + } + ] +} +``` +{% include copy-curl.html %} + +#### Request body fields + +Use the following parameters to import judgments: + +Parameter | Data type | Description +`name` | String | The name of the judgment list. +`description` | String | An optional description of the judgment list. +`type` | String | Set to `IMPORT_JUDGMENT`. +`judgmentRatings` | Array | A list of JSON objects containing the judgments. Judgments are grouped by query, each containing a nested map in which document IDs (`docId`) serve as keys and their floating-point ratings serve as values. + ## Managing judgment lists You can retrieve or delete judgment lists using the following APIs. ### View a judgment list -You can retrieve a judgment list using the judgment list ID. +Retrieve a judgment list by its ID. #### Endpoint @@ -327,7 +394,7 @@ GET _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade ### Delete a judgment list -You can delete a judgment list using the judgment list ID. +Delete a judgment list by its ID. #### Endpoint @@ -363,7 +430,7 @@ DELETE _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade ### Search for a judgment list -You can search for available judgment lists using query DSL. By default, the `judgmentRatings.ratings` data is not returned. To include the `judgmentRatings.ratings` data, specify the `_source` field in the query. +Search for judgment lists using query DSL. The response excludes `judgmentRatings.ratings` by default; to include it, specify the `_source` field in the query. #### Endpoints @@ -372,7 +439,7 @@ GET _plugins/_search_relevance/judgments/_search POST _plugins/_search_relevance/judgments/_search ``` -#### Example request: +#### Example request Search for judgment lists that include the exact query `red dress`: From 6ab78ed1d61cb570f4600d66217eb42662783fc6 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Wed, 27 May 2026 10:55:51 -0400 Subject: [PATCH 26/28] Move the advanced feature text to the ref page, they arent part of the tutorial Signed-off-by: Eric Pugh --- _tutorials/llm-as-a-judge-tutorial.md | 76 +-------------------------- 1 file changed, 1 insertion(+), 75 deletions(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 470836ad65c..0644547a9b5 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -495,82 +495,8 @@ You should see documents with ratings between 0 and 1 generated by the LLM. Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have created. The search configuration and query set that you created during this tutorial can be used as part of running your first evaluation using the new judgment list. - -## Advanced features - -Once you are comfortable with the basics, you will want to start adopting some of the advanced features. - -### Custom prompt templates - -You can customize the prompt template to focus on specific aspects of relevance: - -```json -PUT /_plugins/_search_relevance/judgments -{ - "name": "Custom Prompt Judgment", - "type": "LLM_JUDGMENT", - "modelId": "MODEL_ID_HERE", - "querySetId": "QUERY_SET_ID_HERE", - "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], - "promptTemplate": "As an e-commerce search expert, evaluate how well these products {% raw %}{{hits}}{% endraw %} match the user's search for '{% raw %}{{queryText}}{% endraw %}'. Consider product relevance, brand reputation, and price competitiveness. Rate each result from 0-1.", - "llmJudgmentRatingType": "SCORE0_1" -} -``` -{% include copy-curl.html %} - -### Binary relevance judgments - -For simpler relevance assessment, you can use binary (relevant/irrelevant) judgments: - -```json -PUT /_plugins/_search_relevance/judgments -{ - "name": "Binary LLM Judgment", - "type": "LLM_JUDGMENT", - "modelId": "MODEL_ID_HERE", - "querySetId": "QUERY_SET_ID_HERE", - "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], - "llmJudgmentRatingType": "RELEVANT_IRRELEVANT", - "promptTemplate": "Determine if these search results {% raw %}{{hits}}{% endraw %} are relevant or irrelevant for the query '{% raw %}{{queryText}}{% endraw %}'. Consider exact matches and semantic relevance." -} -``` -{% include copy-curl.html %} - -### Using different LLM providers - -You can adapt the connector configuration for other providers: - -#### AWS Bedrock example: - -```json -POST /_plugins/_ml/connectors/_create -{ - "name": "AWS Bedrock Connector", - "description": "Connector to AWS Bedrock", - "version": "1", - "protocol": "aws_sigv4", - "parameters": { - "region": "us-east-1", - "service_name": "bedrock", - "model": "anthropic.claude-v2" - }, - "credential": { - "access_key": "YOUR_ACCESS_KEY", - "secret_key": "YOUR_SECRET_KEY" - }, - "actions": [ - { - "action_type": "predict", - "method": "POST", - "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke", - "request_body": "{ \"prompt\": \"${parameters.messages}\", \"max_tokens_to_sample\": 300 }" - } - ] -} -``` -{% include copy-curl.html %} - ## Further reading - Learn more about [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) +- Learn about customizing your prompt and other [advanced features]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/#using-llm-as-a-judge) - Explore [ML Commons remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) From c4e03a806dfb2a5bebfa99a2bb6f980435c5ccf5 Mon Sep 17 00:00:00 2001 From: Fanit Kolchina Date: Thu, 28 May 2026 13:54:03 -0400 Subject: [PATCH 27/28] Doc review Signed-off-by: Fanit Kolchina --- .../OpenSearch/SubstitutionsBritish.yml | 1 + _search-plugins/search-relevance/judgments.md | 150 ++++--- _tutorials/index.md | 19 +- _tutorials/llm-as-a-judge-tutorial.md | 382 ++---------------- 4 files changed, 138 insertions(+), 414 deletions(-) diff --git a/.github/vale/styles/OpenSearch/SubstitutionsBritish.yml b/.github/vale/styles/OpenSearch/SubstitutionsBritish.yml index fbe0a178de5..b3e7e7b37b0 100644 --- a/.github/vale/styles/OpenSearch/SubstitutionsBritish.yml +++ b/.github/vale/styles/OpenSearch/SubstitutionsBritish.yml @@ -5,6 +5,7 @@ level: error action: name: replace swap: + 'acknowledgement': acknowledgment 'analyse': analyze 'authorise': authorize 'behaviour': behavior diff --git a/_search-plugins/search-relevance/judgments.md b/_search-plugins/search-relevance/judgments.md index 60bec96e940..4dae8030a40 100644 --- a/_search-plugins/search-relevance/judgments.md +++ b/_search-plugins/search-relevance/judgments.md @@ -10,39 +10,40 @@ has_children: false # Judgments A judgment is a relevance rating assigned to a specific document in the context of a particular query. Multiple judgments are grouped together into judgment lists. -Typically, judgments fall into two types---implicit and explicit: +Typically, judgments are categorized as two types---implicit and explicit: -* Implicit judgments are ratings derived from user behavior (for example, what did the user see and select after searching?) -* Humans have traditionally produced explicit judgments, but large language models (LLMs) are increasingly taking on this task. +- Implicit judgments are ratings derived from user behavior (for example, what did the user see and select after searching?). +- Humans have traditionally produced explicit judgments, but large language models (LLMs) are increasingly used for this task. -Search Relevance Workbench supports all types of judgments: +Search Relevance Workbench (SRW) supports all types of judgments: -* Using LLMs, typically called LLM-as-a-Judge, to generate judgments by evaluating search results using a prompt. -* Generating implicit judgments based on data that adheres to the User Behavior Insights (UBI) schema specification. -* Importing judgments that were collected using a process outside of SRW. +- Using LLMs as automated judges (an approach known as LLM-as-a-Judge) to generate judgments by evaluating search results using a prompt. +- Generating implicit judgments based on data that adheres to the User Behavior Insights (UBI) schema specification. +- Importing judgments that were collected using a process outside of SRW. ## Using LLM-as-a-Judge -Generate explicit judgments with an LLM in Search Relevance Workbench when you don't have human annotators available, or you need to scale up the number of judgments beyond what humans can provide. +Generate explicit judgments with an LLM in SRW when you don't have human annotators available, or you need to scale up the number of judgments beyond what humans can provide. -The [LLM-as-a-Judge tutorial]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) walks you through the process step-by-step. +For step-by-step instructions, see [Using LLM-as-a-Judge for search relevance]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/). ### Prerequisites To use LLM-as-a-Judge, configure the following components: -* A connector to an LLM to use for generating the judgments. For more information, see [Creating connectors for third-party ML platforms]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). -* A query set: Together with the `size` parameter, the query set defines the scope for generating judgments. For each query, the top k documents are retrieved from the specified index, where k is defined in the `size` parameter. -* A search configuration: A search configuration defines how documents are retrieved for use in query/document pairs. +- A connector to an LLM to use for generating the judgments. For more information, see [Connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/connectors/). +- A query set: Together with the `size` parameter, the query set defines the scope for generating judgments. For each query, the top k documents are retrieved from the specified index, in which k is defined by the `size` parameter. +- A search configuration: A search configuration defines how documents are retrieved for use in query-document pairs. -The AI-assisted judgment process works as follows: -- For each query, the top k documents are retrieved using the defined search configuration, which includes the index information. The query and each document from the result list create a query/document pair. -- The LLM is then called with a predefined prompt to generate a judgment for each query/document pair. +The AI-assisted judgment process consists of the following steps: + +- For each query, the top k documents are retrieved using the defined search configuration, which includes the index information. The query and each document from the result list create a query-document pair. +- The LLM is then called with a predefined prompt to generate a judgment for each query-document pair. - All generated judgments are stored in the judgments cache index for reuse in future experiments. To create a judgment list, provide the model ID of the LLM, an available query set, and a created search configuration. -The following example uses a generic prompt template with a scale of 0.0 to 1.0. To reduce the volume of data sent to the LLM (and therefore the cost), use the `contextFields` parameter to specify which fields from each result to include. +The following example uses a generic prompt template with a scale of 0.0 to 1.0. To reduce the volume of data sent to the LLM (and therefore the cost), use the `contextFields` parameter to specify which fields from each result to include: ```json PUT _plugins/_search_relevance/judgments @@ -56,30 +57,30 @@ PUT _plugins/_search_relevance/judgments "size":5, "contextFields": ["title", "description", "category"], "llmJudgmentRatingType": "SCORE0_1", - "promptTemplate": "Rate the relevance of these search results {{hits}} for the query '{{queryText}}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category." + "promptTemplate": "Rate the relevance of these search results {% raw %}{{hits}}{% endraw %} for the query '{% raw %}{{queryText}}{% endraw %}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category." } ``` {% include copy-curl.html %} ### Request body fields -Use the following parameters to create LLM-based judgments: - -Parameter | Data type | Description -:--- | :--- | :--- -`name` | String | The name of the judgment list. -`description` | String | Optional. A description of the judgment list. -`type` | String | Set to `LLM_JUDGMENT`. -`modelId` | String | The ID of the deployed ML model to use for generating judgments. Must be a remote model connected to an external LLM service. -`querySetId` | String | The ID of the query set containing the queries to evaluate. -`searchConfigurationList` | Array of strings | List of search configuration IDs to use for retrieving documents to evaluate. -`size` | Integer | The number of top documents to retrieve and evaluate for each query. Default is 10. -`tokenLimit` | Integer | The maximum number of tokens to send to the LLM in a single request. Used to batch documents when the total content exceeds this limit. Default is 4000. -`contextFields` | Array of strings | Optional. Specifies which document fields to include when sending content to the LLM. If not specified, the entire document source is sent. Use this to reduce costs and focus the LLM on relevant fields. -`ignoreFailure` | Boolean | Whether to continue processing other documents if the LLM fails to generate a judgment for some documents. Default is false. -`llmJudgmentRatingType` | String | The type of rating scale to use. Options: `SCORE0_1` (numeric scale 0--1) or `RELEVANT_IRRELEVANT` (binary relevant/irrelevant). -`promptTemplate` | String | Optional. Custom prompt template for the LLM. Supports placeholders: `{{queryText}}`, `{{hits}}`. If not provided, a default template is used. -`overwriteCache` | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is false (reuse cached judgments). +The following table lists the parameters for creating LLM-based judgments. + +| Parameter | Data type | Description | +| :--- | :--- | :--- | +| `name` | String | The name of the judgment list. | +| `description` | String | Optional. A description of the judgment list. | +| `type` | String | Set to `LLM_JUDGMENT`. | +| `modelId` | String | The ID of the deployed machine learning (ML) model to use for generating judgments. Must be a remote model connected to an external LLM service. | +| `querySetId` | String | The ID of the query set containing the queries to evaluate. | +| `searchConfigurationList` | Array of strings | The list of search configuration IDs to use for retrieving documents to evaluate. | +| `size` | Integer | The number of top documents to retrieve and evaluate for each query. Default is `10`. | +| `tokenLimit` | Integer | The maximum number of tokens to send to the LLM in a single request. Used to batch documents when the total content exceeds this limit. Default is `4,000`. | +| `contextFields` | Array of strings | Optional. Specifies which document fields to include when sending content to the LLM. If not specified, the entire document source is sent. Use this parameter to reduce costs and focus the LLM on relevant fields. | +| `ignoreFailure` | Boolean | Whether to continue processing other documents if the LLM fails to generate a judgment for some documents. Default is `false`. | +| `llmJudgmentRatingType` | String | The type of rating scale to use. Valid values are `SCORE0_1` (numeric scale 0--1) and `RELEVANT_IRRELEVANT` (binary relevant/irrelevant). Use `SCORE0_1` for graded relevance metrics such as NDCG. Use `RELEVANT_IRRELEVANT` for binary metrics such as precision and recall. | +| `promptTemplate` | String | Optional. A custom prompt template for the LLM. Supports {% raw %}`{{queryText}}`{% endraw %} and {% raw %}`{{hits}}`{% endraw %} placeholders. If not provided, the default template is used. | +| `overwriteCache` | Boolean | Whether to overwrite existing cached judgments for the same query-document pairs. Default is `false` (reuse cached judgments). | ### Custom prompt templates @@ -119,15 +120,17 @@ PUT /_plugins/_search_relevance/judgments ### Using different LLM providers -You can adapt the connector configuration for other providers: +You can adapt the connector configuration for other providers. + +#### Amazon Bedrock example -#### AWS Bedrock example +The following example creates a connector for Amazon Bedrock: ```json POST /_plugins/_ml/connectors/_create { - "name": "AWS Bedrock Connector", - "description": "Connector to AWS Bedrock", + "name": "Amazon Bedrock Connector", + "description": "Connector to Amazon Bedrock", "version": "1", "protocol": "aws_sigv4", "parameters": { @@ -153,19 +156,24 @@ POST /_plugins/_ml/connectors/_create ## Implicit judgments -Implicit judgments are derived from past user interactions. Search Relevance Workbench supports the Clicks Over Expected Clicks (COEC) click model, which uses *impression* and *click* signals to calculate judgments. +Implicit judgments are derived from past user interactions. SRW supports the Clicks Over Expected Clicks (COEC) click model, which uses *impression* and *click* signals to calculate judgments. -Input data must follow the [User Behavior Insights schema specification]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). COEC uses every event in the `ubi_events` index whose `action_name` is `impression` or `click`. -COEC calculates an expected click-through rate (CTR) for each rank. It does this by dividing the total number of clicks by the total number of impressions observed at that rank, based on all events in `ubi_events`. This ratio represents the expected CTR for that position. +Input data must follow the [UBI index schemas]({{site.url}}{{site.baseurl}}/search-plugins/ubi/schemas/). COEC uses every event in the `ubi_events` index with an `action_name` of `impression` or `click`. -For each document displayed in a hit list after a query, the average CTR at that rank serves as the expected value for the query/document pair. COEC calculates the actual CTR for the query/document pair and divides it by this expected rank-based CTR. This means that query/document pairs with a higher CTR than the average for that rank will have a judgment value greater than 1. Conversely, if the CTR is lower than average, the judgment value will be lower than 1. +COEC calculates an expected click-through rate (CTR) for each rank by dividing the total number of clicks by the total number of impressions observed at that rank, based on all events in `ubi_events`. This ratio represents the expected CTR for that position. -Note that depending on the tracking implementation, multiple clicks for a single query can be recorded in the `ubi_events` index. As a result, the average CTR can sometimes exceed 1 (or 100%). -For query-document observations that occur at different positions, all impressions and clicks are assumed to have occurred at the lowest (best) position. This approach biases the final judgment toward lower values, reflecting the common trend that higher-ranked results typically receive higher CTRs. +For each document displayed in a hit list after a query, the average CTR at that rank serves as the expected value for the query-document pair. COEC calculates the actual CTR for the query-document pair and divides it by this expected rank-based CTR. Consequently, query-document pairs with a higher CTR than the average for that rank have a judgment value greater than 1. Conversely, if the CTR is lower than average, the judgment value is lower than 1. + +Depending on the tracking implementation, multiple clicks for a single query can be recorded in the `ubi_events` index. Consequently, the average CTR can sometimes exceed 1 (or 100%). +{: .note} + +For query-document observations that occur at different positions, all impressions and clicks are assumed to have occurred at the lowest (best) position. This aggregation approach biases the final judgment toward lower values, reflecting the common trend that higher-ranked results typically receive higher CTRs. {: .note} ### Example request +The following example creates an implicit judgment list using the COEC click model: + ```json PUT _plugins/_search_relevance/judgments { @@ -179,21 +187,24 @@ PUT _plugins/_search_relevance/judgments ### Request body fields -Use the following parameters to create implicit judgments: +The following table lists the parameters for creating implicit judgments. -Parameter | Data type | Description -`name` | String | The name of the judgment list. -`clickModel` | String | The model used to calculate implicit judgments. Only `coec` (Clicks Over Expected Clicks) is supported. -`type` | String | Set to `UBI_JUDGMENT`. -`maxRank` | Integer | The maximum rank to consider when including events in the judgment calculation. -`startDate` | Date | The optional starting date from which behavioral data events are considered for implicit judgment generation. The format is`yyyy-MM-dd`. -`endDate` | Date | The optional end date until which behavioral data events are considered for implicit judgment generation. The format is`yyyy-MM-dd`. +| Parameter | Data type | Description | +| :--- | :--- | :--- | +| `name` | String | The name of the judgment list. | +| `clickModel` | String | The model used to calculate implicit judgments. Only `coec` (Clicks Over Expected Clicks) is supported. | +| `type` | String | Set to `UBI_JUDGMENT`. | +| `maxRank` | Integer | The maximum rank to consider when including events in the judgment calculation. | +| `startDate` | Date | An optional starting date from which behavioral data events are considered for implicit judgment generation. The format is `yyyy-MM-dd`. | +| `endDate` | Date | An optional end date until which behavioral data events are considered for implicit judgment generation. The format is `yyyy-MM-dd`. | ## Importing judgments -You may already have external processes for generating judgments. Regardless of the judgment type or the way they were generated, you can import them into Search Relevance Workbench. +You may already have external processes for generating judgments. Regardless of the judgment type or the way they were generated, you can import them into SRW. -#### Example request +### Example request + +The following example imports a set of judgments for two queries: ```json PUT _plugins/_search_relevance/judgments @@ -257,21 +268,22 @@ PUT _plugins/_search_relevance/judgments ``` {% include copy-curl.html %} -#### Request body fields +### Request body fields -Use the following parameters to import judgments: +The following table lists the parameters for importing judgments. -Parameter | Data type | Description -`name` | String | The name of the judgment list. -`description` | String | An optional description of the judgment list. -`type` | String | Set to `IMPORT_JUDGMENT`. -`judgmentRatings` | Array | A list of JSON objects containing the judgments. Judgments are grouped by query, each containing a nested map in which document IDs (`docId`) serve as keys and their floating-point ratings serve as values. +| Parameter | Data type | Description | +| :--- | :--- | :--- | +| `name` | String | The name of the judgment list. | +| `description` | String | An optional description of the judgment list. | +| `type` | String | Set to `IMPORT_JUDGMENT`. | +| `judgmentRatings` | Array | A list of JSON objects containing the judgments. Judgments are grouped by query, each containing a nested map in which document IDs (`docId`) serve as keys and their floating-point ratings serve as values. | ## Managing judgment lists You can retrieve or delete judgment lists using the following APIs. -### View a judgment list +### Viewing a judgment list Retrieve a judgment list by its ID. @@ -281,7 +293,7 @@ Retrieve a judgment list by its ID. GET _plugins/_search_relevance/judgments/{judgment_list_id} ``` -### Path parameters +#### Path parameters The following table lists the available path parameters. @@ -392,7 +404,7 @@ GET _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade -### Delete a judgment list +### Deleting a judgment list Delete a judgment list by its ID. @@ -428,9 +440,9 @@ DELETE _plugins/_search_relevance/judgments/b54f791a-3b02-49cb-a06c-46ab650b2ade } ``` -### Search for a judgment list +### Searching for a judgment list -Search for judgment lists using query DSL. The response excludes `judgmentRatings.ratings` by default; to include it, specify the `_source` field in the query. +Search for judgment lists using query domain-specific language (DSL). The response excludes `judgmentRatings.ratings` by default; to include it, specify the `_source` field in the query. #### Endpoints @@ -441,7 +453,7 @@ POST _plugins/_search_relevance/judgments/_search #### Example request -Search for judgment lists that include the exact query `red dress`: +The following example searches for judgment lists that include the exact query `red dress`: ```json GET _plugins/_search_relevance/judgments/_search @@ -504,3 +516,7 @@ GET _plugins/_search_relevance/judgments/_search } } ``` + +## Related documentation + +- [Automate search relevance evaluation using LLMs]({{site.url}}{{site.baseurl}}/tutorials/llm-as-a-judge-tutorial/) \ No newline at end of file diff --git a/_tutorials/index.md b/_tutorials/index.md index f686a8efce5..f031f7af8ba 100644 --- a/_tutorials/index.md +++ b/_tutorials/index.md @@ -9,13 +9,14 @@ permalink: /tutorials/ redirect_from: - /ml-commons-plugin/tutorials/ - /ml-commons-plugin/tutorials/index/ -cards: +getting_started_cards: - heading: "Searching data 101" description: "Learn the fundamentals of search and explore OpenSearch query languages and types" link: "/getting-started/search-data/" - heading: "OpenSearch Dashboards" description: "Start visualizing your data with interactive dashboards and powerful analytics tools" link: "/dashboards/quickstart/" +tutorial_cards: - heading: "Vector search" description: "Implement similarity search using vectors and enhance results with AI capabilities" link: "/tutorials/vector-search/" @@ -29,12 +30,22 @@ cards: description: "Build filterable search experiences for applications like e-commerce or location search" link: "/tutorials/faceted-search/" - heading: "LLM-as-a-Judge" - description: "Getting started with LLM-as-a-Judge for search relevance evaluation" + description: "Automate search relevance evaluation using LLMs" link: "/tutorials/llm-as-a-judge-tutorial/" --- # Tutorials -Follow our step-by-step tutorials to learn how to use OpenSearch features. +Follow step-by-step tutorials to learn how to use OpenSearch features. -{% include cards.html cards=page.cards %} +## Getting started + +Learn the basics of searching and visualizing data in OpenSearch. + +{% include cards.html cards=page.getting_started_cards %} + +## Building search features using OpenSearch + +Implement specific search features end to end. + +{% include cards.html cards=page.tutorial_cards %} diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index 0644547a9b5..a321192b830 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -1,56 +1,24 @@ --- layout: default -title: Getting started with LLM-as-a-Judge for search relevance evaluation +title: LLM-as-a-Judge has_children: false -parent: Search Relevance Workbench -nav_order: 4 -steps: - - heading: "Set up ML Commons and create an external LLM connector" - link: "/tutorials/llm-as-a-judge-tutorial/#step-1-set-up-ml-commons-and-create-an-external-llm-connector" - - heading: "Create a simple search index with sample data" - link: "/tutorials/llm-as-a-judge-tutorial/#step-2-create-a-simple-search-index-with-sample-data" - - heading: "Create search configurations" - link: "/tutorials/llm-as-a-judge-tutorial/#step-3-create-search-configuration-baseline" - - heading: "Create a query set" - link: "/tutorials/llm-as-a-judge-tutorial/#step-4-create-a-query-set" - - heading: "Generate LLM judgments" - link: "/tutorials/llm-as-a-judge-tutorial/#step-5-generate-llm-judgments" - - heading: "Run experiments with LLM judgments" - link: "/tutorials/llm-as-a-judge-tutorial/#step-6-run-experiments-with-llm-judgments" +nav_order: 70 --- -# Getting started with LLM-as-a-Judge for search relevance evaluation +# Using LLM-as-a-Judge for search relevance -LLM-as-a-Judge is a technique that leverages large language models to automatically evaluate search result relevance, providing a scalable and consistent approach to search quality assessment. +LLM-as-a-Judge is a technique that uses large language models (LLMs) to automatically evaluate search result relevance. Manually annotating search results is time-consuming and inconsistent across annotators. LLM-as-a-Judge automates this process, enabling frequent and repeatable evaluation of search quality. -In this tutorial, you'll learn how to: - -- **Set up external LLM integration**: Connect OpenSearch to external LLM providers like OpenAI, AWS Bedrock, or others. -- **Generate automated judgments**: Use an LLM to evaluate search result relevance without manual annotation. - -You will then be ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments. - -## OpenSearch components for LLM-as-a-Judge - -In this tutorial, you'll use the following OpenSearch components: - -- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) for LLM integration -- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) for evaluation workflows -- [Remote model connectors]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) for external LLM APIs -- [Search configuration]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/search-configurations/) for defining search strategies -- [Query set]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/query-sets/) for organizing test queries -- [Judgments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/) for storing relevance assessments -- [Experiments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/experiments/) for evaluating search quality +After completing this tutorial, you can [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments. ## Prerequisites -For this tutorial, you'll need: +For this tutorial, you need an API key for an external LLM provider (OpenAI, Amazon Bedrock). -- OpenSearch 3.5 or newer with the Search Relevance Workbench plugin installed -- ML Commons plugin installed and configured -- An API key for an external LLM provider (OpenAI, AWS Bedrock) +Using an external LLM incurs API costs based on the number of queries and results evaluated. +{: .note} -First, enable the Search Relevance Workbench and configure ML Commons: +First, enable the Search Relevance Workbench and configure the following settings: ```json PUT /_cluster/settings @@ -65,23 +33,9 @@ PUT /_cluster/settings ``` {% include copy-curl.html %} -## Tutorial - -This tutorial consists of the following steps: - -{% include list.html list_items=page.steps%} - -You can follow this tutorial by using your command line or the OpenSearch Dashboards [Dev Tools console]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/run-queries/). +### Step 1: Configure a model -Some steps in the tutorial contain optional Test it{: .text-delta} sections. You can confirm that the step completed successfully by running the requests in these sections. - -### Step 1: Set up ML Commons and create an external LLM connector - -First, you'll create a connector to an external LLM service. This tutorial uses OpenAI's GPT models, but you can adapt it for other providers like AWS Bedrock. - -#### Step 1(a): Create an ML connector - -Create a connector to OpenAI's chat completion API. Replace `YOUR_API_KEY` with your actual OpenAI API key: +First, create a connector to an externally hosted LLM. This tutorial uses OpenAI, but you can adapt it for other providers such as Amazon Bedrock. Replace `YOUR_API_KEY` with your OpenAI API key: ```json POST /_plugins/_ml/connectors/_create @@ -113,19 +67,7 @@ POST /_plugins/_ml/connectors/_create ``` {% include copy-curl.html %} -The response contains the connector ID: - -```json -{ - "connector_id": "abc123def456" -} -``` - -You will use the returned `connector_id` in the next step. - -#### Step 1(b): Register and deploy the model - -Register and deploy the connector as a remote model: +Then register and deploy the model. Replace `{connector_id}` with the ID returned in the previous response: ```json POST /_plugins/_ml/models/_register?deploy=true @@ -133,105 +75,34 @@ POST /_plugins/_ml/models/_register?deploy=true "name": "openai_gpt-3.5-turbo", "function_name": "remote", "description": "External LLM model via OpenAI", - "connector_id": "abc123def456" + "connector_id": "{connector_id}" } ``` {% include copy-curl.html %} -Registering a model is an asynchronous task. OpenSearch sends back a task ID for this task: +This is an asynchronous operation. To verify the task status, use the [Get ML task]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/) API. Once the state is `COMPLETED`, OpenSearch returns the `model_id` you'll use in the following steps. -```json -{ - "task_id": "aFeif4oB5Vm0Tdw8yoN7", - "status": "CREATED" -} -``` +### Step 2: Create a search index -You can check the status of the task by using the [Get ML Task API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/): - -```json -GET /_plugins/_ml/tasks/aFeif4oB5Vm0Tdw8yoN7 -``` - -OpenSearch saves the registered model in the model index. Deploying a model creates a model instance and caches the model in memory. - -Once the task is complete, the task state changes to `COMPLETED` and the [Get ML Task API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/tasks-apis/get-task/) response contains the `model_id` for the deployed model (which is different from the initial `task_id`): - -```json -{ - "model_id": "DQmk2ZwBqLOthQZKMqU-", - "task_type": "REGISTER_MODEL", - "function_name": "REMOTE", - "state": "COMPLETED", - "worker_node": [ - "rbmK-mMDQfecqr41sOjEsA" - ], - "create_time": 1773177942532, - "last_update_time": 1773177942677, - "is_async": false -} -``` - -You'll need the `model_id` in order to use the deployed model for several of the following steps. - -{% include copy-curl.html %} - -
- - Test it - - {: .text-delta} - -Test the model connection: - -```json -POST /_plugins/_ml/models/MODEL_ID_HERE/_predict -{ - "parameters": { - "messages": "[{\"role\": \"user\", \"content\": \"Say hello in one word\"}]" - } -} -``` -{% include copy-curl.html %} - -You should receive a response from the LLM. -
- -### Step 2: Create a simple search index with sample data - -Now you'll create a simple index with product data for testing search relevance. - -#### Step 2(a): Create the index +Create a `products` index: ```json PUT /products { "mappings": { "properties": { - "title": { - "type": "text" - }, - "description": { - "type": "text" - }, - "category": { - "type": "keyword" - }, - "brand": { - "type": "keyword" - }, - "price": { - "type": "float" - } + "title": { "type": "text" }, + "description": { "type": "text" }, + "category": { "type": "keyword" }, + "brand": { "type": "keyword" }, + "price": { "type": "float" } } } } ``` {% include copy-curl.html %} -#### Step 2(b): Index sample documents - -Add sample product documents using the bulk API: +Index example documents into the index: ```json POST /products/_bulk @@ -246,31 +117,11 @@ POST /products/_bulk {"index":{"_id":"5"}} {"title":"Dell Gaming Monitor 27-inch","description":"High refresh rate gaming monitor with G-Sync support","category":"Computers","brand":"Dell","price":399.99} ``` - {% include copy-curl.html %} -
- - Test it - - {: .text-delta} +### Step 3: Create a search configuration -Verify the documents were indexed: - -```json -GET /products/_search -{ - "query": { - "match_all": {} - } -} -``` -{% include copy-curl.html %} -
- -### Step 3: Create search configuration "baseline" - -Search configuration defines a search strategy to evaluate using the LLM-as-a-Judge judgments. +A _search configuration_ defines a search strategy to evaluate. The `%SearchText%` placeholder is replaced with each query from the query set during evaluation: ```json PUT /_plugins/_search_relevance/search_configurations @@ -282,37 +133,9 @@ PUT /_plugins/_search_relevance/search_configurations ``` {% include copy-curl.html %} -The response contains the search configuration ID: - -```json -{ - "search_configuration_id": "baseline_config_id" -} -``` - -
- - Test it - - {: .text-delta} - -List all search configurations: - -```json -GET _plugins/_search_relevance/search_configurations/_search -{ - "query": - { - "match_all": {} - } -} -``` -{% include copy-curl.html %} -
- ### Step 4: Create a query set -Query sets contain the test queries you'll use for evaluation. +Create a query set containing test queries for evaluation: ```json PUT /_plugins/_search_relevance/query_sets @@ -329,38 +152,9 @@ PUT /_plugins/_search_relevance/query_sets ``` {% include copy-curl.html %} -The response contains the query set ID: - -```json -{ - "query_set_id": "electronics_queries_id" -} -``` - -
- - Test it - - {: .text-delta} - -Verify the query set was created: - -```json -GET /_plugins/_search_relevance/query_sets/_search -{ - "query": { - "match": { - "name": "Electronics Queries" - } - } -} -``` -{% include copy-curl.html %} -
- ### Step 5: Generate LLM judgments -Now you'll create an LLM judgment that uses your deployed model to evaluate search results. +Create an LLM judgment that uses your deployed model to evaluate search results. Replace `{model_id}`, `{query_set_id}`, and `{search_configuration_id}` with the IDs returned in previous steps: ```json PUT /_plugins/_search_relevance/judgments @@ -368,9 +162,9 @@ PUT /_plugins/_search_relevance/judgments "name": "LLM Judgment via OpenAI", "description": "Uses GPT-3.5-turbo to evaluate product search results", "type": "LLM_JUDGMENT", - "modelId": "MODEL_ID_HERE", - "querySetId": "QUERY_SET_ID_HERE", - "searchConfigurationList": ["SEARCH_CONFIGURATION_ID_HERE"], + "modelId": "{model_id}", + "querySetId": "{query_set_id}", + "searchConfigurationList": ["{search_configuration_id}"], "size": 10, "tokenLimit": 4000, "contextFields": ["title", "description", "category"], @@ -382,121 +176,23 @@ PUT /_plugins/_search_relevance/judgments ``` {% include copy-curl.html %} -The response contains the judgment ID: - -```json -{ - "judgment_id": "LLM_JUDGMENT_ID" -} -``` - -The LLM judgment process runs asynchronously. Wait a few moments for the judgments to be generated, then check the status: - -```json -GET /search-relevance-judgment/_doc/LLM_JUDGMENT_ID -``` -{% include copy-curl.html %} - -You can see the judgments and how they were arrived at: +For a description of all request body parameters, see [Judgments]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/#request-body-fields). -```json -{ - "_index": "search-relevance-judgment", - "_id": "d6f73218-ad6b-408f-b6ab-186a47b27e87", - "_version": 2, - "_seq_no": 10, - "_primary_term": 1, - "found": true, - "_source": { - "id": "d6f73218-ad6b-408f-b6ab-186a47b27e87", - "timestamp": "2026-03-10T21:37:13.837Z", - "name": "LLM Judgment via OpenAI", - "status": "COMPLETED", - "type": "LLM_JUDGMENT", - "metadata": { - "contextFields": [ - "title", - "description", - "category" - ], - "ignoreFailure": false, - "llmJudgmentRatingType": "SCORE0_1", - "size": 10, - "modelId": "DQmk2ZwBqLOthQZKMqU-", - "overwriteCache": false, - "searchConfigurationList": [ - "0fa1fedb-4bcb-469d-9fcb-2a5cd6709e1d" - ], - "tokenLimit": 4000, - "promptTemplate": "Rate the relevance of these search results {% raw %}{{hits}}{% endraw %} for the query '{% raw %}{{queryText}}{% endraw %}' on a scale of 0-1, where 0 is completely irrelevant and 1 is perfectly relevant. Consider the product title, description, and category.", - "querySetId": "4c6bf6f4-c2e4-4c76-a668-82de11d14846" - }, - "judgmentRatings": [ - { - "query": "smart tv", - "ratings": [ - { - "docId": "1", - "rating": "0.9" - }, - { - "docId": "2", - "rating": "0.8" - } - ] - }, - { - "query": "laptop computer", - "ratings": [ - { - "docId": "4", - "rating": "0.9" - } - ] - }, - { - "query": "wireless headphones", - "ratings": [ - { - "docId": "3", - "rating": "1.0" - } - ] - } - ] - } -} -``` - - -
- - Test it - - {: .text-delta} - -Check the judgment cache to see the individual generated ratings: +The judgment process runs asynchronously. To verify the status, retrieve the judgment by its ID: ```json -GET /.plugins-search-relevance-judgment-cache/_search -{ - "size": 5, - "query": { - "match_all": {} - } -} +GET /search-relevance-judgment/_doc/{judgment_id} ``` {% include copy-curl.html %} -You should see documents with ratings between 0 and 1 generated by the LLM. -
+When the `status` field is `COMPLETED`, the `judgmentRatings` array contains the generated relevance scores for each query-document pair. -### Step 6: Run experiments with LLM judgments +## Next steps -Congratulations, you are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) using the LLM-generated judgments that you have created. The search configuration and query set that you created during this tutorial can be used as part of running your first evaluation using the new judgment list. +You are now ready to [run an experiment to evaluate search quality]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/evaluate-search-quality/#creating-a-pointwise-experiment) with the LLM-generated judgments. The search configuration and query set that you created during this tutorial can serve as inputs for your first evaluation. -## Further reading +## Related documentation -- Learn more about [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) -- Learn about customizing your prompt and other [advanced features]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/#using-llm-as-a-judge) -- Explore [ML Commons remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) +- [Search Relevance Workbench]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/using-search-relevance-workbench/) +- [Using LLM-as-a-Judge]({{site.url}}{{site.baseurl}}/search-plugins/search-relevance/judgments/#using-llm-as-a-judge) +- [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/) From e9603f274f6f412d2075b3b25d4a290d8bd0b009 Mon Sep 17 00:00:00 2001 From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Date: Thu, 28 May 2026 13:55:13 -0400 Subject: [PATCH 28/28] Apply suggestion from @kolchfa-aws Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --- _tutorials/llm-as-a-judge-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_tutorials/llm-as-a-judge-tutorial.md b/_tutorials/llm-as-a-judge-tutorial.md index a321192b830..84640c942d0 100644 --- a/_tutorials/llm-as-a-judge-tutorial.md +++ b/_tutorials/llm-as-a-judge-tutorial.md @@ -18,7 +18,7 @@ For this tutorial, you need an API key for an external LLM provider (OpenAI, Ama Using an external LLM incurs API costs based on the number of queries and results evaluated. {: .note} -First, enable the Search Relevance Workbench and configure the following settings: +Enable the Search Relevance Workbench and configure the following settings: ```json PUT /_cluster/settings