Incorporated Jordan's and Gabe's comments

pabel-rh · pabel-rh · commit 6576038ed889 · 2026-05-21T20:11:14.000+05:30
diff --git a/assemblies/shared/assembly-appendix-llm-requirements.adoc b/assemblies/shared/assembly-appendix-llm-requirements.adoc
@@ -13,11 +13,11 @@ include::../modules/shared/con-large-language-model-llm-requirements.adoc[levelo
 
 include::../modules/shared/con-openai-model-integration-for-your-deployment.adoc[leveloffset=+1]
 
-include::../modules/shared/con-ollama-model-integration-for-local-development-environments.adoc[leveloffset=+1]
+include::../modules/shared/con-ollama-model-integration-requirements.adoc[leveloffset=+1]
 
 include::../modules/shared/con-vllm-model-integration-for-high-throughput-inference.adoc[leveloffset=+1]
 
-include::../modules/shared/con-vertex-ai-integration-for-scalable-model-deployment.adoc[leveloffset=+1]
+include::../modules/shared/con-vertex-ai-integration-for-gemini-models.adoc[leveloffset=+1]
 
 ifdef::parent-context[:context: {parent-context}]
 ifndef::parent-context[:!context:]
diff --git a/modules/shared/con-large-language-model-llm-requirements.adoc b/modules/shared/con-large-language-model-llm-requirements.adoc
@@ -8,7 +8,9 @@ To plan your {ls-short} deployment, you must determine which compatible large la
 
 {ls-short} operates on a _Bring Your Own Model (BYOM)_ architecture. Because the service does not include a native model, you must connect a compatible inference provider during installation. 
 
-The underlying {lcs-short} service integrates with several platforms that support the OpenAI API specification or utilize the vLLM inference engine. Because there is no explicit {rhoai-brand-name} provider option in the configuration, you must route those deployments through the vLLM or OpenAI-compatible provider settings.
+The underlying {lcs-short} service integrates with platforms that support the OpenAI API specification or utilize the vLLM inference engine. Because there is no explicit {rhoai-brand-name} provider option in the configuration, you must route those deployments through the vLLM or OpenAI-compatible provider settings.
+
+The `vllm` provider type communicates with endpoints that conform to the OpenAI API schema by automatically appending `/v1` to the configured provider URL. Because of this behavior, you can also use the `vllm` configuration for other hosted, OpenAI-compatible inference providers.
 
 {ls-short} supports the following inference provider configurations:
 
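Reviewer note: the `/v1` behavior described in the new paragraph can be pictured with a small Python sketch. The function name `openai_base_url` and the example host are hypothetical, and stripping a trailing slash before appending is an assumption. The actual {lcs-short} implementation is not shown in this commit.

```python
def openai_base_url(provider_url: str) -> str:
    """Return the URL a `vllm`-style provider would call.

    Assumption: any trailing slash is dropped before `/v1` is appended.
    """
    base = provider_url.rstrip("/")
    return f"{base}/v1"


# Hypothetical provider URLs, for illustration only.
plain = openai_base_url("https://inference.example.com")
trailing = openai_base_url("https://inference.example.com/")
already_suffixed = openai_base_url("https://inference.example.com/v1")
```

If the service behaves as sketched, a provider URL that already ends in `/v1` would produce a doubled `/v1/v1` path, so it may be worth stating that the URL should be configured without that suffix.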
diff --git a/modules/shared/con-ollama-model-integration-for-local-development-environments.adoc b/modules/shared/con-ollama-model-integration-for-local-development-environments.adoc
diff --git a/modules/shared/con-ollama-model-integration-requirements.adoc b/modules/shared/con-ollama-model-integration-requirements.adoc
@@ -0,0 +1,22 @@
+:_mod-docs-content-type: CONCEPT
+
+[id="ollama-model-integration-requirements_{context}"]
+= Ollama model integration requirements
+
+[role="_abstract"]
+To integrate the open-source Ollama framework with {ls-short}, you must ensure that your network topology allows the {ls-short} service to route traffic to the Ollama server endpoint. 
+
+The Ollama server runs as a containerized service and provides a command-line interface (CLI) to download, manage, and run open-source models such as Llama 3 and Mistral. You can deploy Ollama on both local workstations and cluster environments.
+
+However, a cluster-deployed {ls-short} instance cannot access an Ollama server that listens only on the `localhost` interface of a workstation. For cluster deployments, the Ollama server must be reachable at a network address that the cluster can access, or it must run directly inside the cluster.
+
+{ls-short} supports the following integration configurations:
+* Both {ls-short} and Ollama run on a local workstation.
+* {ls-short} runs locally and connects to an externally accessible Ollama server that runs in a cluster.
+* Both {ls-short} and Ollama run inside the cluster.
+
+
+.Additional resources
+* link:https://ollama.com[Ollama project website]
+* link:https://hub.docker.com/r/ollama/ollama[Ollama server container image]
+
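Reviewer note: the `localhost` restriction in the new Ollama module could be checked before deployment with something like the following Python sketch. The helper `is_loopback_url` and the service host name are hypothetical. Port 11434 is the Ollama default. The check only inspects the URL text and does not resolve DNS names, so it is a rough pre-flight test rather than proof of reachability.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL host is `localhost` or a loopback IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {url!r}")
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat as a DNS name that is not resolved here.
        return False


# A workstation-only endpoint versus a hypothetical in-cluster service.
workstation = is_loopback_url("http://localhost:11434")
ipv6_loopback = is_loopback_url("http://[::1]:11434")
in_cluster = is_loopback_url("http://ollama.example.svc.cluster.local:11434")
```

A cluster-deployed instance configured with a URL for which this returns `True` would be pointing at its own pod rather than at the workstation.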
diff --git a/modules/shared/con-vertex-ai-integration-for-gemini-models.adoc b/modules/shared/con-vertex-ai-integration-for-gemini-models.adoc
@@ -0,0 +1,13 @@
+:_mod-docs-content-type: CONCEPT
+
+[id="vertex-ai-integration-for-gemini-models_{context}"]
+= Vertex AI integration for Gemini models
+
+[role="_abstract"]
+To use Gemini models with {ls-short}, you can configure Google Cloud Vertex AI to act as your managed large language model (LLM) inference provider. 
+
+The underlying {lcs-short} service connects to Vertex AI to access hosted Gemini models. This integration provides {ls-short} with enterprise-grade language processing and chat assistance capabilities without requiring you to maintain a local inference server.
+
+.Additional resources
+* link:https://cloud.google.com/vertex-ai/docs[Vertex AI documentation]
+
diff --git a/modules/shared/con-vertex-ai-integration-for-scalable-model-deployment.adoc b/modules/shared/con-vertex-ai-integration-for-scalable-model-deployment.adoc
diff --git a/modules/shared/proc-configure-by-using-the-operator.adoc b/modules/shared/proc-configure-by-using-the-operator.adoc
@@ -40,7 +40,7 @@ stringData:
   VLLM_API_KEY: "<api_key>"
   ENABLE_VALIDATION: "true"
   VALIDATION_PROVIDER: "vllm"
-  VALIDATION_MODEL_NAME: "gpt-4o-mini"
+  VALIDATION_MODEL_NAME: "llama3.1"
 ----
 
 . Map your secret inside the `extraEnvs` section of the {backstage} CR to complete container provisioning:
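Reviewer note on the `ENABLE_VALIDATION: "true"` key in the secret above: the IMPORTANT admonition that this commit adds to `snip-lightspeed-secret-keys.adoc` states that only the presence of an `ENABLE_*` variable is checked. A minimal Python sketch of that semantics, assuming a plain presence test (the helper `is_enabled` is hypothetical and not taken from the service code):

```python
from typing import Mapping


def is_enabled(env: Mapping[str, str], name: str) -> bool:
    """Presence-only check: the value of the variable is never inspected."""
    return name in env


# Three illustrative environments for the same key.
set_true = is_enabled({"ENABLE_VALIDATION": "true"}, "ENABLE_VALIDATION")
set_false = is_enabled({"ENABLE_VALIDATION": "false"}, "ENABLE_VALIDATION")
unset = is_enabled({}, "ENABLE_VALIDATION")
```

Under this reading, readers who want validation off would need to remove the `ENABLE_VALIDATION` key from the secret rather than change its value to `"false"`.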
diff --git a/modules/shared/proc-customize-chat-history-storage.adoc b/modules/shared/proc-customize-chat-history-storage.adoc
@@ -25,7 +25,7 @@ Storing chat history records user prompts and responses. You must assess data pr
 [source,yaml]
 ----
 conversation_cache:
-  type: postgres
+  type: "postgres"
   postgres:
     host: _<your_database_host>_
     port: _<your_database_port>_
diff --git a/modules/shared/proc-mirror-images-for-air-gapped-environments.adoc b/modules/shared/proc-mirror-images-for-air-gapped-environments.adoc
@@ -14,6 +14,7 @@ You must mirror the following {ls-short} images:
 .Prerequisites
 * You have a target mirror registry accessible to your disconnected cluster.
 * You authenticated to the {rhcr} and your target mirror registry.
+* You updated the cluster install secret (the `pull-secret` secret in the `openshift-config` namespace) to include the authentication credentials for your mirror registry. The `kubelet` requires these credentials to pull the sidecar images when starting the {product-very-short} pod.
 
 .Procedure
 . Extract or identify the image digests for the {ls-short} sidecar and initialization container images.
diff --git a/modules/shared/snip-lightspeed-secret-keys.adoc b/modules/shared/snip-lightspeed-secret-keys.adoc
@@ -1,5 +1,10 @@
 :_mod-docs-content-type: SNIPPET
 
+[IMPORTANT]
+====
+To disable an inference provider or configuration feature, you must leave the corresponding `ENABLE_*` variable completely unset. Setting an `ENABLE_*` variable to `false` does not disable the component because the underlying system checks only whether the variable is defined.
+====
+
 |===
 | Key | Description