diff --git a/_query-dsl/geo-and-xy/xy.md b/_query-dsl/geo-and-xy/xy.md index d0ed61c050..c62e4a94eb 100644 --- a/_query-dsl/geo-and-xy/xy.md +++ b/_query-dsl/geo-and-xy/xy.md @@ -14,7 +14,7 @@ To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/ope ## Spatial relations -When you provide an xy shape to the xy query, the xy fields are matched using the following spatial relations to the provided shape. +When you provide an xy shape to the xy query, the xy fields in the documents are matched using the following spatial relations to the provided shape. Relation | Description | Supporting xy field type :--- | :--- | :--- @@ -33,7 +33,7 @@ You can define the shape in an xy query either by providing a new shape definiti To provide a new shape to an xy query, define it in the `xy_shape` field. -The following example illustrates searching for documents with xy shapes that match an xy shape defined at query time. +The following example illustrates how to search for documents containing xy shapes that match an xy shape defined at query time. First, create an index and map the `geometry` field as an `xy_shape`: diff --git a/_search-plugins/search-pipelines/collapse-processor.md b/_search-plugins/search-pipelines/collapse-processor.md index cea0a15396..8a2723efa7 100644 --- a/_search-plugins/search-pipelines/collapse-processor.md +++ b/_search-plugins/search-pipelines/collapse-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Collapse -nav_order: 7 +nav_order: 10 has_children: false parent: Search processors grand_parent: Search pipelines --- # Collapse processor +Introduced 2.12 +{: .label .label-purple } The `collapse` response processor discards hits that have the same value for a particular field as a previous document in the result set. This is similar to passing the `collapse` parameter in a search request, but the response processor is applied to the diff --git a/_search-plugins/search-pipelines/filter-query-processor.md b/_search-plugins/search-pipelines/filter-query-processor.md index 6c68821a27..799d393e42 100644 --- a/_search-plugins/search-pipelines/filter-query-processor.md +++ b/_search-plugins/search-pipelines/filter-query-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Filter query -nav_order: 10 +nav_order: 20 has_children: false parent: Search processors grand_parent: Search pipelines --- # Filter query processor +Introduced 2.8 +{: .label .label-purple } The `filter_query` search request processor intercepts a search request and applies an additional query to the request, filtering the results. This is useful when you don't want to rewrite existing queries in your application but need additional filtering of the results. diff --git a/_search-plugins/search-pipelines/ml-inference-search-request.md b/_search-plugins/search-pipelines/ml-inference-search-request.md new file mode 100644 index 0000000000..a072458a41 --- /dev/null +++ b/_search-plugins/search-pipelines/ml-inference-search-request.md @@ -0,0 +1,531 @@ +--- +layout: default +title: ML inference (request) +nav_order: 30 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# ML inference search request processor +Introduced 2.16 +{: .label .label-purple } + +The `ml_inference` search request processor is used to invoke registered machine learning (ML) models in order to rewrite queries using the model output. + +**PREREQUISITE**
+Before using the `ml_inference` search request processor, you must have either a local ML model hosted on your OpenSearch cluster or an externally hosted model connected to your OpenSearch cluster through the ML Commons plugin. For more information about local models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/).
+For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
+{: .note}
+
+## Syntax
+
+The following is the syntax for the `ml_inference` search request processor:
+
+```json
+{
+  "ml_inference": {
+    "model_id": "<model_id>",
+    "function_name": "<function_name>",
+    "full_response_path": "<full_response_path>",
+    "query_template": "<query_template>",
+    "model_config": {
+      "<model_config_field>": "<config_value>"
+    },
+    "model_input": "<model_input>",
+    "input_map": [
+      {
+        "<model_input_field>": "<query_input_field>"
+      }
+    ],
+    "output_map": [
+      {
+        "<query_output_field>": "<model_output_field>"
+      }
+    ]
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `ml_inference` search request processor.
+
+| Parameter | Data type | Required/Optional | Description |
+|:--| :--- |:---|:---|
+| `model_id` | String | Required | The ID of the ML model used by the processor. |
+| `query_template` | String | Optional | A query string template used to construct a new query containing a `<query_output_field>`. Often used when rewriting a search query to a new query type. |
+| `function_name` | String | Optional for externally hosted models<br>

Required for local models | The function name of the ML model configured in the processor. For local models, valid values are `sparse_encoding`, `sparse_tokenize`, `text_embedding`, and `text_similarity`. For externally hosted models, the only valid value is `remote`. Default is `remote`. |
+| `model_config` | Object | Optional | Custom configuration options for the ML model. For more information, see [The `model_config` object]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/#the-model_config-object). |
+| `model_input` | String | Optional for externally hosted models<br>

Required for local models | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is `"{ \"parameters\": ${ml_inference.parameters} }"`. |
+| `input_map` | Array | Required | An array specifying how to map query string fields to the model input fields. Each element of the array is a map in the `"<model_input_field>": "<query_input_field>"` format and corresponds to one model invocation for a query field. If no input mapping is specified for an externally hosted model, then all document fields are passed to the model directly as input. The `input_map` size indicates the number of times the model is invoked (the number of Predict API requests). |
+| `<model_input_field>` | String | Required | The model input field name. |
+| `<query_input_field>` | String | Required | The name or JSON path of the query field used as the model input. |
+| `output_map` | Array | Required | An array specifying how to map the model output fields to new fields in the query string. Each element of the array is a map in the `"<query_output_field>": "<model_output_field>"` format. |
+| `<query_output_field>` | String | Required | The name of the query field in which the model's output (specified by `<model_output_field>`) is stored. |
+| `<model_output_field>` | String | Required | The name or JSON path of the field in the model output to be stored in the `<query_output_field>`. |
+| `full_response_path` | Boolean | Optional | Set this parameter to `true` if the `<model_output_field>` contains a full JSON path to the field instead of the field name. The model output will then be fully parsed to get the value of the field. Default is `true` for local models and `false` for externally hosted models. |
+| `ignore_missing` | Boolean | Optional | If `true` and any of the input fields defined in the `input_map` or `output_map` are missing, then the missing fields are ignored. Otherwise, a missing field causes a failure. Default is `false`. |
+| `ignore_failure` | Boolean | Optional | Specifies whether the processor continues execution even if it encounters an error. If `true`, then any failure is ignored and the search continues. If `false`, then any failure causes the search to be canceled. Default is `false`. |
+| `max_prediction_tasks` | Integer | Optional | The maximum number of concurrent model invocations that can run during query search. Default is `10`. |
+| `description` | String | Optional | A brief description of the processor. |
+| `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+
+The `input_map` and `output_map` mappings support standard [JSON path](https://github.com/json-path/JsonPath) notation for specifying complex data structures.
+{: .note}
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline. You must provide a model ID, `input_map`, and `output_map` when creating the processor. Before testing a pipeline using the processor, make sure that the model is successfully deployed. You can check the model state using the [Get Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/).
+
+For local models, you must provide a `model_input` field that specifies the model input format. Add any input fields in `model_config` to `model_input`.
+
+For externally hosted models, the `model_input` field is optional, and its default value is `"{ \"parameters\": ${ml_inference.parameters} }"`.
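+
+For example, with an `input_map` entry such as `"inputs": "query.term.label.value"` (the `inputs` parameter name is model specific), the default template wraps the mapped value in a `parameters` object, so the processor would send the externally hosted model a Predict API request body similar to the following sketch:
+
+```json
+{
+  "parameters": {
+    "inputs": "happy moments"
+  }
+}
+```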
+
+### Setup
+
+Create an index named `my_index` and index two documents:
+
+```json
+POST /my_index/_doc/1
+{
+  "passage_text": "I am excited",
+  "passage_language": "en",
+  "label": "POSITIVE",
+  "passage_embedding": [
+    2.3886719,
+    0.032714844,
+    -0.22229004
+  ...]
+}
+```
+{% include copy-curl.html %}
+
+```json
+POST /my_index/_doc/2
+{
+  "passage_text": "I am sad",
+  "passage_language": "en",
+  "label": "NEGATIVE",
+  "passage_embedding": [
+    1.7773438,
+    0.4309082,
+    1.8857422,
+    0.95996094,
+  ...]
+}
+```
+{% include copy-curl.html %}
+
+When you run a term query on the created index without a search pipeline, the query searches for documents that contain the exact term specified in the query. The following query does not return any results because the query text does not match any of the documents in the index:
+
+```json
+GET /my_index/_search
+{
+  "query": {
+    "term": {
+      "passage_text": {
+        "value": "happy moments",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+
+By using a model, the search pipeline can dynamically rewrite the term value to enhance or alter the search results based on the model inference. This means that the model takes an initial input from the search query, processes it, and then updates the query term to reflect the model inference, potentially improving the relevance of the search results.
+
+### Example: Externally hosted model
+
+The following example configures an `ml_inference` processor with an externally hosted model.
+
+**Step 1: Create a pipeline**
+
+This example demonstrates how to create a search pipeline for an externally hosted sentiment analysis model that rewrites the term query value. The model requires an `inputs` field and produces results in a `label` field. Because the `function_name` is not specified, it defaults to `remote`, indicating an externally hosted model.
+
+The term query value is rewritten based on the model's output. The `ml_inference` processor in the search request needs an `input_map` to retrieve the query field value for the model input and an `output_map` to assign the model output to the query string.
+
+In this example, an `ml_inference` search request processor is used for the following term query:
+
+```json
+{
+  "query": {
+    "term": {
+      "label": {
+        "value": "happy moments",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+
+The following request creates a search pipeline that rewrites the preceding term query:
+
+```json
+PUT /_search/pipeline/ml_inference_pipeline
+{
+  "description": "Rewrite the term query value using the model's sentiment prediction",
+  "processors": [
+    {
+      "ml_inference": {
+        "model_id": "<model_id>",
+        "input_map": [
+          {
+            "inputs": "query.term.label.value"
+          }
+        ],
+        "output_map": [
+          {
+            "query.term.label.value": "label"
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a `parameters` object:
+
+```json
+POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
+{
+  "parameters": {
+    "inputs": [
+      {
+        ...
+      }
+    ]
+  }
+}
+```
+
+Thus, to use an externally hosted sentiment analysis model, send a Predict API request in the following format:
+
+```json
+POST /_plugins/_ml/models/cywgD5EB6KAJXDLxyDp1/_predict
+{
+  "parameters": {
+    "inputs": "happy moments"
+  }
+}
+```
+{% include copy-curl.html %}
+
+The model processes the input and generates a prediction based on the sentiment of the input text.
+In this case, the sentiment is positive:
+
+```json
+{
+  "inference_results": [
+    {
+      "output": [
+        {
+          "name": "response",
+          "dataAsMap": {
+            "label": "POSITIVE",
+            "score": "0.948"
+          }
+        }
+      ],
+      "status_code": 200
+    }
+  ]
+}
+```
+
+When specifying the `input_map` for an externally hosted model, you can directly reference the `inputs` field instead of providing its dot path `parameters.inputs`:
+
+```json
+"input_map": [
+  {
+    "inputs": "query.term.label.value"
+  }
+]
+```
+
+**Step 2: Run the pipeline**
+
+Once you have created a search pipeline, you can run the same term query with the search pipeline:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline
+{
+  "query": {
+    "term": {
+      "label": {
+        "value": "happy moments",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The query term value is rewritten based on the model's output. The model determines that the sentiment of the query term is positive, so the rewritten query appears as follows:
+
+```json
+{
+  "query": {
+    "term": {
+      "label": {
+        "value": "POSITIVE",
+        "boost": 1
+      }
+    }
+  }
+}
+```
+
+The response includes the document whose `label` field has the value `POSITIVE`:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "I am excited",
+          "passage_language": "en",
+          "label": "POSITIVE"
+        }
+      }
+    ]
+  }
+}
+```
+
+### Example: Local model
+
+The following example shows you how to configure an `ml_inference` processor with a local model to rewrite a term query into a k-NN query.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for the `huggingface/sentence-transformers/all-distilroberta-v1` local model. The model is a [pretrained sentence transformer model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) hosted in your OpenSearch cluster.
+
+If you invoke the model using the Predict API, then the request appears as follows:
+
+```json
+POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
+{
+  "text_docs": [
+    "today is sunny"
+  ],
+  "return_number": true,
+  "target_response": [
+    "sentence_embedding"
+  ]
+}
+```
+
+Using this schema, specify the `model_input` as follows:
+
+```json
+ "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"
+```
+
+In the `input_map`, map the `query.term.passage_embedding.value` query field to the `text_docs` field expected by the model:
+
+```json
+"input_map": [
+  {
+    "text_docs": "query.term.passage_embedding.value"
+  }
+]
+```
+
+Because the model output field will be specified as a full JSON path rather than a field name, you need to set `full_response_path` to `true`.
+The full model output will then be parsed in order to obtain the value of the field:
+
+```json
+"full_response_path": true
+```
+
+The text in the `query.term.passage_embedding.value` field will be used to generate embeddings:
+
+```json
+{
+  "text_docs": "happy passage"
+}
+```
+
+The Predict API request returns the following response:
+
+```json
+{
+  "inference_results": [
+    {
+      "output": [
+        {
+          "name": "sentence_embedding",
+          "data_type": "FLOAT32",
+          "shape": [
+            768
+          ],
+          "data": [
+            0.25517133,
+            -0.28009856,
+            0.48519906,
+            ...
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+The model generates embeddings in the `$.inference_results.*.output.*.data` field. The `output_map` maps this field to the query field in the query template:
+
+```json
+"output_map": [
+  {
+    "modelPredictionOutcome": "$.inference_results.*.output.*.data"
+  }
+]
+```
+
+To configure an `ml_inference` search request processor with a local model, specify the `function_name` explicitly. In this example, the `function_name` is `text_embedding`. For information about valid `function_name` values, see [Configuration parameters](#configuration-parameters).
+
+The following is the final configuration of the `ml_inference` processor with the local model:
+
+```json
+PUT /_search/pipeline/ml_inference_pipeline_local
+{
+  "description": "Generates embeddings from the query text and rewrites the term query into a k-NN query",
+  "processors": [
+    {
+      "ml_inference": {
+        "function_name": "text_embedding",
+        "full_response_path": true,
+        "model_id": "<model_id>",
+        "model_config": {
+          "return_number": true,
+          "target_response": [
+            "sentence_embedding"
+          ]
+        },
+        "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }",
+        "query_template": """{
+          "size": 2,
+          "query": {
+            "knn": {
+              "passage_embedding": {
+                "vector": ${modelPredictionOutcome},
+                "k": 5
+              }
+            }
+          }
+        }""",
+        "input_map": [
+          {
+            "text_docs": "query.term.passage_embedding.value"
+          }
+        ],
+        "output_map": [
+          {
+            "modelPredictionOutcome": "$.inference_results.*.output.*.data"
+          }
+        ],
+        "ignore_missing": true,
+        "ignore_failure": true
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+**Step 2: Run the pipeline**
+
+Run the following query, providing the pipeline name in the request:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline_local
+{
+  "query": {
+    "term": {
+      "passage_embedding": {
+        "value": "happy passage"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response confirms that the processor ran a k-NN query, which returned document 1 with a higher score:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "I am excited",
+          "passage_language": "en",
+          "label": "POSITIVE",
+          "passage_embedding": [
+            2.3886719,
+            0.032714844,
+            -0.22229004
+          ...]
+        }
+      },
+      {
+        "_index": "my_index",
+        "_id": "2",
+        "_score": 0.00001405052,
+        "_source": {
+          "passage_text": "I am sad",
+          "passage_language": "en",
+          "label": "NEGATIVE",
+          "passage_embedding": [
+            1.7773438,
+            0.4309082,
+            1.8857422,
+            0.95996094,
+            ...
+ ] + } + } + ] + } +} +``` diff --git a/_search-plugins/search-pipelines/ml-inference-search-response.md b/_search-plugins/search-pipelines/ml-inference-search-response.md new file mode 100644 index 0000000000..e2ed7889c7 --- /dev/null +++ b/_search-plugins/search-pipelines/ml-inference-search-response.md @@ -0,0 +1,391 @@ +--- +layout: default +title: ML inference (response) +nav_order: 40 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# ML inference search response processor +Introduced 2.16 +{: .label .label-purple } + +The `ml_inference` search response processor is used to invoke registered machine learning (ML) models in order to incorporate their outputs as new fields in documents within search results. + +**PREREQUISITE**
+Before using the `ml_inference` search response processor, you must have either a local ML model hosted on your OpenSearch cluster or an externally hosted model connected to your OpenSearch cluster through the ML Commons plugin. For more information about local models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/). For more information about externally hosted models, see [Connecting to externally hosted models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/).
+{: .note}
+
+## Syntax
+
+The following is the syntax for the `ml_inference` search response processor:
+
+```json
+{
+  "ml_inference": {
+    "model_id": "<model_id>",
+    "function_name": "<function_name>",
+    "full_response_path": "<full_response_path>",
+    "model_config": {
+      "<model_config_field>": "<config_value>"
+    },
+    "model_input": "<model_input>",
+    "input_map": [
+      {
+        "<model_input_field>": "<document_field>"
+      }
+    ],
+    "output_map": [
+      {
+        "<new_document_field>": "<model_output_field>"
+      }
+    ],
+    "override": "<override>",
+    "one_to_one": false
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `ml_inference` search response processor.
+
+| Parameter | Data type | Required/Optional | Description |
+|:--| :--- | :--- |:---|
+| `model_id` | String | Required | The ID of the ML model used by the processor. |
+| `function_name` | String | Optional for externally hosted models<br>

Required for local models | The function name of the ML model configured in the processor. For local models, valid values are `sparse_encoding`, `sparse_tokenize`, `text_embedding`, and `text_similarity`. For externally hosted models, the only valid value is `remote`. Default is `remote`. |
+| `model_config` | Object | Optional | Custom configuration options for the ML model. For more information, see [The `model_config` object]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/#the-model_config-object). |
+| `model_input` | String | Optional for externally hosted models<br>

Required for local models | A template that defines the input field format expected by the model. Each local model type might use a different set of inputs. For externally hosted models, the default is `"{ \"parameters\": ${ml_inference.parameters} }"`. |
+| `input_map` | Array | Optional for externally hosted models<br>

Required for local models | An array specifying how to map document fields in the search response to the model input fields. Each element of the array is a map in the `"<model_input_field>": "<document_field>"` format and corresponds to one model invocation for a document field. If no input mapping is specified for an externally hosted model, then all document fields are passed to the model directly as input. The `input_map` size indicates the number of times the model is invoked (the number of Predict API requests). |
+| `<model_input_field>` | String | Optional for externally hosted models<br>

Required for local models | The model input field name. |
+| `<document_field>` | String | Optional for externally hosted models<br>

Required for local models | The name or JSON path of the document field in the search response used as the model input. | +| `output_map` | Array | Optional for externally hosted models

Required for local models | An array specifying how to map the model output fields to new fields in the search response document. Each element of the array is a map in the `"<new_document_field>": "<model_output_field>"` format. |
+| `<new_document_field>` | String | Optional for externally hosted models<br>

Required for local models | The name of the new field in the document in which the model's output (specified by `<model_output_field>`) is stored. If no output mapping is specified for externally hosted models, then all fields from the model output are added to the new document field. |
+| `<model_output_field>` | String | Optional for externally hosted models<br>

Required for local models | The name or JSON path of the field in the model output to be stored in the `<new_document_field>`. |
+| `full_response_path` | Boolean | Optional | Set this parameter to `true` if the `<model_output_field>` contains a full JSON path to the field instead of the field name. The model output will then be fully parsed to get the value of the field. Default is `true` for local models and `false` for externally hosted models. |
+| `ignore_missing` | Boolean | Optional | If `true` and any of the input fields defined in the `input_map` or `output_map` are missing, then the missing fields are ignored. Otherwise, a missing field causes a failure. Default is `false`. |
+| `ignore_failure` | Boolean | Optional | Specifies whether the processor continues execution even if it encounters an error. If `true`, then any failure is ignored and the search continues. If `false`, then any failure causes the search to be canceled. Default is `false`. |
+| `override` | Boolean | Optional | Relevant if a document in the response already contains a field with the name specified in `<new_document_field>`. If `override` is `false`, then the field is skipped and its existing value is kept. If `true`, then the existing field value is overwritten by the new model output. Default is `false`. |
+| `max_prediction_tasks` | Integer | Optional | The maximum number of concurrent model invocations that can run during document search. Default is `10`. |
+| `one_to_one` | Boolean | Optional | Set this parameter to `true` to invoke the model once (make one Predict API request) for each document. The default value (`false`) invokes the model with all documents from the search response, making a single Predict API request. |
+| `description` | String | Optional | A brief description of the processor. |
+| `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+
+The `input_map` and `output_map` mappings support standard [JSON path](https://github.com/json-path/JsonPath) notation for specifying complex data structures.
+{: .note}
+
+### Setup
+
+Create an index named `my_index` and index one document to illustrate the field mappings:
+
+```json
+POST /my_index/_doc/1
+{
+  "passage_text": "hello world"
+}
+```
+{% include copy-curl.html %}
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline. You must provide a model ID when creating the processor. Before testing a pipeline using the processor, make sure that the model is successfully deployed. You can check the model state using the [Get Model API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/get-model/).
+
+For local models, you must provide a `model_input` field that specifies the model input format. Add any input fields in `model_config` to `model_input`.
+
+For externally hosted models, the `model_input` field is optional, and its default value is `"{ \"parameters\": ${ml_inference.parameters} }"`.
+
+### Example: Externally hosted model
+
+The following example shows you how to configure an `ml_inference` search response processor with an externally hosted model.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for an externally hosted text embedding model. The model requires an `input` field and generates results in a `data` field. It converts the text in the `passage_text` field into text embeddings and stores the embeddings in the `passage_embedding` field.
+The `function_name` is not explicitly specified in the processor configuration, so it defaults to `remote`, signifying an externally hosted model:
+
+```json
+PUT /_search/pipeline/ml_inference_pipeline
+{
+  "description": "Generate passage_embedding for searched documents",
+  "processors": [
+    {
+      "ml_inference": {
+        "model_id": "<model_id>",
+        "input_map": [
+          {
+            "input": "passage_text"
+          }
+        ],
+        "output_map": [
+          {
+            "passage_embedding": "data"
+          }
+        ]
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+When making a Predict API request to an externally hosted model, all necessary fields and parameters are usually contained within a `parameters` object:
+
+```json
+POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
+{
+  "parameters": {
+    "input": [
+      {
+        ...
+      }
+    ]
+  }
+}
+```
+
+When specifying the `input_map` for an externally hosted model, you can directly reference the `input` field instead of providing its dot path `parameters.input`:
+
+```json
+"input_map": [
+  {
+    "input": "passage_text"
+  }
+]
+```
+
+**Step 2: Run the pipeline**
+
+Run the following query, providing the pipeline name in the request:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline
+{
+  "query": {
+    "match_all": {}
+  }
+}
+```
+{% include copy-curl.html %}
+
+The response confirms that the processor has generated text embeddings in the `passage_embedding` field. The document within `_source` now contains both the `passage_text` and `passage_embedding` fields:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "hello world",
+          "passage_embedding": [
+            0.017304314,
+            -0.021530833,
+            0.050184276,
+            0.08962978,
+          ...]
+        }
+      }
+    ]
+  }
+}
+```
+
+### Example: Local model
+
+The following example shows you how to configure an `ml_inference` search response processor with a local model.
+
+**Step 1: Create a pipeline**
+
+The following example shows you how to create a search pipeline for the `huggingface/sentence-transformers/all-distilroberta-v1` local model. The model is a [pretrained sentence transformer model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/pretrained-models/#sentence-transformers) hosted in your OpenSearch cluster.
+
+If you invoke the model using the Predict API, then the request appears as follows:
+
+```json
+POST /_plugins/_ml/_predict/text_embedding/cleMb4kBJ1eYAeTMFFg4
+{
+  "text_docs": ["today is sunny"],
+  "return_number": true,
+  "target_response": ["sentence_embedding"]
+}
+```
+
+Using this schema, specify the `model_input` as follows:
+
+```json
+ "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }"
+```
+
+In the `input_map`, map the `passage_text` document field to the `text_docs` field expected by the model:
+
+```json
+"input_map": [
+  {
+    "text_docs": "passage_text"
+  }
+]
+```
+
+Because the model output field will be specified as a full JSON path rather than a field name, you need to set `full_response_path` to `true`.
+The full model output will then be parsed in order to obtain the value of the field:
+
+```json
+"full_response_path": true
+```
+
+The text in the `passage_text` field will be used to generate embeddings:
+
+```json
+{
+  "passage_text": "hello world"
+}
+```
+
+The Predict API request returns the following response:
+
+```json
+{
+  "inference_results": [
+    {
+      "output": [
+        {
+          "name": "sentence_embedding",
+          "data_type": "FLOAT32",
+          "shape": [
+            768
+          ],
+          "data": [
+            0.25517133,
+            -0.28009856,
+            0.48519906,
+            ...
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+The model generates embeddings in the `$.inference_results.*.output.*.data` field. The `output_map` maps this field to the newly created `passage_embedding` field in the search response document:
+
+```json
+"output_map": [
+  {
+    "passage_embedding": "$.inference_results.*.output.*.data"
+  }
+]
+```
+
+To configure an `ml_inference` search response processor with a local model, specify the `function_name` explicitly. In this example, the `function_name` is `text_embedding`. For information about valid `function_name` values, see [Configuration parameters](#configuration-parameters).
+
+The following is the final configuration of the `ml_inference` search response processor with the local model:
+
+```json
+PUT /_search/pipeline/ml_inference_pipeline_local
+{
+  "description": "Searches passages and generates embeddings",
+  "processors": [
+    {
+      "ml_inference": {
+        "function_name": "text_embedding",
+        "full_response_path": true,
+        "model_id": "<model_id>",
+        "model_config": {
+          "return_number": true,
+          "target_response": ["sentence_embedding"]
+        },
+        "model_input": "{ \"text_docs\": ${input_map.text_docs}, \"return_number\": ${model_config.return_number}, \"target_response\": ${model_config.target_response} }",
+        "input_map": [
+          {
+            "text_docs": "passage_text"
+          }
+        ],
+        "output_map": [
+          {
+            "passage_embedding": "$.inference_results.*.output.*.data"
+          }
+        ],
+        "ignore_missing": true,
+        "ignore_failure": true
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+**Step 2: Run the pipeline**
+
+Run the following query, providing the pipeline name in the request:
+
+```json
+GET /my_index/_search?search_pipeline=ml_inference_pipeline_local
+{
+  "query": {
+    "term": {
+      "passage_text": {
+        "value": "hello"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The response confirms that the processor has generated text embeddings in the `passage_embedding` field:
+
+```json
+{
+  "took": 288,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.00009405752,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 0.00009405752,
+        "_source": {
+          "passage_text": "hello world",
+          "passage_embedding": [
+            0.017304314,
+            -0.021530833,
+            0.050184276,
+            0.08962978,
+            ...]
+ } + } + ] + } +} +``` \ No newline at end of file diff --git a/_search-plugins/search-pipelines/neural-query-enricher.md b/_search-plugins/search-pipelines/neural-query-enricher.md index e187ea17a9..683eaa7b85 100644 --- a/_search-plugins/search-pipelines/neural-query-enricher.md +++ b/_search-plugins/search-pipelines/neural-query-enricher.md @@ -1,13 +1,15 @@ --- layout: default title: Neural query enricher -nav_order: 12 +nav_order: 50 has_children: false parent: Search processors grand_parent: Search pipelines --- # Neural query enricher processor +Introduced 2.11 +{: .label .label-purple } The `neural_query_enricher` search request processor is designed to set a default machine learning (ML) model ID at the index or field level for [neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/) queries. To learn more about ML models, see [Using ML models within OpenSearch]({{site.url}}{{site.baseurl}}/ml-commons-plugin/using-ml-models/) and [Connecting to remote models]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index/). diff --git a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md index de36225a99..3ba1e21405 100644 --- a/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md +++ b/_search-plugins/search-pipelines/neural-sparse-query-two-phase-processor.md @@ -1,7 +1,7 @@ --- layout: default title: Neural sparse query two-phase -nav_order: 13 +nav_order: 60 parent: Search processors grand_parent: Search pipelines --- diff --git a/_search-plugins/search-pipelines/normalization-processor.md b/_search-plugins/search-pipelines/normalization-processor.md index a8fad2e40d..ac29b079f1 100644 --- a/_search-plugins/search-pipelines/normalization-processor.md +++ b/_search-plugins/search-pipelines/normalization-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Normalization -nav_order: 15 +nav_order: 70 has_children: false parent: Search processors grand_parent: Search pipelines --- # Normalization processor +Introduced 2.10 +{: .label .label-purple } The `normalization-processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase. diff --git a/_search-plugins/search-pipelines/oversample-processor.md b/_search-plugins/search-pipelines/oversample-processor.md index 698d9572cf..81f4252f3d 100644 --- a/_search-plugins/search-pipelines/oversample-processor.md +++ b/_search-plugins/search-pipelines/oversample-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Oversample -nav_order: 17 +nav_order: 80 has_children: false parent: Search processors grand_parent: Search pipelines --- # Oversample processor +Introduced 2.12 +{: .label .label-purple } The `oversample` request processor multiplies the `size` parameter of the search request by a specified `sample_factor` (>= 1.0), saving the original value in the `original_size` pipeline variable. The `oversample` processor is designed to work with the [`truncate_hits` response processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/) but may be used on its own. 
diff --git a/_search-plugins/search-pipelines/personalize-search-ranking.md b/_search-plugins/search-pipelines/personalize-search-ranking.md index c7a7dd8dde..b63ba4b966 100644 --- a/_search-plugins/search-pipelines/personalize-search-ranking.md +++ b/_search-plugins/search-pipelines/personalize-search-ranking.md @@ -8,6 +8,8 @@ grand_parent: Search pipelines --- # Personalize search ranking processor +Introduced 2.9 +{: .label .label-purple } The `personalize_search_ranking` search response processor intercepts a search response and uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results according to their Amazon Personalize ranking. This ranking is based on the user's past behavior and metadata about the search items and the user. diff --git a/_search-plugins/search-pipelines/rag-processor.md b/_search-plugins/search-pipelines/rag-processor.md index 7137134aff..60257ebd05 100644 --- a/_search-plugins/search-pipelines/rag-processor.md +++ b/_search-plugins/search-pipelines/rag-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Retrieval-augmented generation -nav_order: 18 +nav_order: 90 has_children: false parent: Search processors grand_parent: Search pipelines --- # Retrieval-augmented generation processor +Introduced 2.12 +{: .label .label-purple } The `retrieval_augmented_generation` processor is a search results processor that you can use in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/) for retrieval-augmented generation (RAG). The processor intercepts query results, retrieves previous messages from the conversation from the conversational memory, and sends a prompt to a large language model (LLM). After the processor receives a response from the LLM, it saves the response in conversational memory and returns both the original OpenSearch query results and the LLM response. diff --git a/_search-plugins/search-pipelines/rename-field-processor.md b/_search-plugins/search-pipelines/rename-field-processor.md index cb01125df5..9c734af656 100644 --- a/_search-plugins/search-pipelines/rename-field-processor.md +++ b/_search-plugins/search-pipelines/rename-field-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Rename field -nav_order: 20 +nav_order: 100 has_children: false parent: Search processors grand_parent: Search pipelines --- # Rename field processor +Introduced 2.8 +{: .label .label-purple } The `rename_field` search response processor intercepts a search response and renames the specified field. This is useful when your index and your application use different names for the same field. For example, if you rename a field in your index, the `rename_field` processor can change the new name to the old one before sending the response to your application. diff --git a/_search-plugins/search-pipelines/rerank-processor.md b/_search-plugins/search-pipelines/rerank-processor.md index 73bacd35c9..313ae5f74d 100644 --- a/_search-plugins/search-pipelines/rerank-processor.md +++ b/_search-plugins/search-pipelines/rerank-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Rerank -nav_order: 25 +nav_order: 110 has_children: false parent: Search processors grand_parent: Search pipelines --- # Rerank processor +Introduced 2.12 +{: .label .label-purple } The `rerank` search request processor intercepts search results and passes them to a cross-encoder model to be reranked. The model reranks the results, taking into account the scoring context. 
Then the processor orders documents in the search results based on their new scores. diff --git a/_search-plugins/search-pipelines/script-processor.md b/_search-plugins/search-pipelines/script-processor.md index e1e629e398..1fd1d08e57 100644 --- a/_search-plugins/search-pipelines/script-processor.md +++ b/_search-plugins/search-pipelines/script-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Script -nav_order: 30 +nav_order: 120 has_children: false parent: Search processors grand_parent: Search pipelines --- # Script processor +Introduced 2.8 +{: .label .label-purple } The `script` search request processor intercepts a search request and adds an inline Painless script that is run on incoming requests. The script can only run on the following request fields: diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index ad515cc541..d696859a78 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -24,10 +24,11 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- [`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8 -[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search and neural sparse search at the index or field level. | 2.11(neural), 2.13(neural sparse) -[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 +[`ml_inference`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/ml-inference-search-request/) | Invokes registered machine learning (ML) models in order to rewrite queries. | 2.16 +[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search and neural sparse search at the index or field level. | 2.11 (neural), 2.13 (neural sparse) +[`neural_sparse_two_phase_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/) | Accelerates the neural sparse query. | 2.15 [`oversample`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) | Increases the search request `size` parameter, storing the original value in the pipeline state. | 2.12 - +[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8 ## Search response processors @@ -38,6 +39,7 @@ The following table lists all supported search response processors. Processor | Description | Earliest available version :--- | :--- | :--- [`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 +[`ml_inference`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/ml-inference-search-response/) | Invokes registered machine learning (ML) models in order to incorporate model output as additional search response fields. 
| 2.16 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12 diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md index dde05c1b3a..6df2352c1e 100644 --- a/_search-plugins/search-pipelines/sort-processor.md +++ b/_search-plugins/search-pipelines/sort-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Sort -nav_order: 32 +nav_order: 130 has_children: false parent: Search processors grand_parent: Search pipelines --- # Sort processor +Introduced 2.16 +{: .label .label-purple } The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array. diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md index 6830f81ec3..4afe49e6d2 100644 --- a/_search-plugins/search-pipelines/split-processor.md +++ b/_search-plugins/search-pipelines/split-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Split -nav_order: 33 +nav_order: 140 has_children: false parent: Search processors grand_parent: Search pipelines --- # Split processor +Introduced 2.16 +{: .label .label-purple } The `split` processor splits a string field into an array of substrings based on a specified delimiter. diff --git a/_search-plugins/search-pipelines/truncate-hits-processor.md b/_search-plugins/search-pipelines/truncate-hits-processor.md index 871879efe3..7bba627734 100644 --- a/_search-plugins/search-pipelines/truncate-hits-processor.md +++ b/_search-plugins/search-pipelines/truncate-hits-processor.md @@ -1,13 +1,15 @@ --- layout: default title: Truncate hits -nav_order: 35 +nav_order: 150 has_children: false parent: Search processors grand_parent: Search pipelines --- # Truncate hits processor +Introduced 2.12 +{: .label .label-purple } The `truncate_hits` response processor discards returned search hits after a given hit count is reached. The `truncate_hits` processor is designed to work with the [`oversample` request processor]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) but may be used on its own.
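+
+For example, the following request creates a search pipeline that caps the number of returned hits (a minimal sketch; the pipeline name is arbitrary, and `target_size` is assumed here to be set explicitly rather than falling back to the `original_size` value saved by the `oversample` processor):
+
+```json
+PUT /_search/pipeline/truncating_pipeline
+{
+  "response_processors": [
+    {
+      "truncate_hits": {
+        "target_size": 10
+      }
+    }
+  ]
+}
+```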