Red Hat Developer Lightspeed (Developer Lightspeed) is a virtual assistant powered by generative AI that offers in-depth insights into Red Hat Developer Hub (RHDH) and its wide range of capabilities. You can interact with the assistant to explore and learn about RHDH in greater detail.
Developer Lightspeed provides a natural language interface within the RHDH console, helping you easily find information about the product, understand its features, and get answers to your questions as they come up.
Developer Lightspeed for Red Hat Developer Hub is available as a plug-in on all platforms that host RHDH, and it requires the use of Lightspeed-Core Service (LCS) and Llama Stack as sidecar containers.
Important
Developer Lightspeed can be resource intensive on the model side because it uses RAG and, optionally, a safety guard. If you use the provided local Ollama server, you may receive unexpected responses when the model cannot handle the context.
For best results, it is recommended to provide your own external LLM provider.
Follow these steps to configure and launch Developer Lightspeed.
1. **Load the Developer Lightspeed dynamic plugins**

Add the `developer-lightspeed/configs/dynamic-plugins/dynamic-plugins.lightspeed.yaml` file to the list of `includes` in your `configs/dynamic-plugins/dynamic-plugins.override.yaml` to enable the Developer Lightspeed plugins within RHDH. Example:

```yaml
includes:
  - dynamic-plugins.default.yaml
  - developer-lightspeed/configs/dynamic-plugins/dynamic-plugins.lightspeed.yaml # <-- add to enable the Developer Lightspeed plugins

# Below you can add your own custom dynamic plugins, including local ones.
plugins: []
```
2. **Copy the Lightspeed App Config example**

Start by creating a new local app config file for Lightspeed:

```bash
cp developer-lightspeed/configs/app-config/app-config.lightspeed.local.example.yaml developer-lightspeed/configs/app-config/app-config.lightspeed.local.yaml
```
3. **Understanding the Setup Process**

The setup script (`start-lightspeed.sh`) uses a simple 2-step process to configure Developer Lightspeed:

**Step 1: Choose Your LLM Provider**
- Ollama (recommended for beginners): Uses the built-in Ollama container that runs locally. No additional configuration needed - the script handles everything automatically.
- Bring Your Own Model: Use your own external LLM provider (any OpenAI API compatible service). You must configure the environment variables for your chosen provider before starting.
Step 2: Choose Safety Guard
- No safety guard: Allows any type of questions - Developer Lightspeed acts as a general-purpose assistant with no safety content filtering.
- With safety guard: Enables Llama Guard for content safety filtering. A local Ollama container is automatically provisioned to run the safety model (`llama-guard3:1b` by default). No additional configuration is needed; the safety guard only supports llama-guard model variants.
[!NOTE] What does "Bring Your Own Model" mean?
This option allows you to use an external LLM service instead of the local Ollama container. Developer Lightspeed supports any service that is OpenAI API compatible, including but not limited to:
- vLLM: A high-performance inference server (self-hosted or cloud)
- OpenAI: OpenAI's API (GPT-3.5, GPT-4, etc.)
- Vertex AI: Google Cloud's Vertex AI service (experimental)
- Other compatible services: Azure OpenAI, WatsonX, Amazon Bedrock, Mistral, Nvidia NIM, LM Studio, and other OpenAI API compatible services
Tested Providers: Red Hat has tested and verified the following providers: `vllm`, `ollama`, `openai`, and `vertex_ai`. For other services, compatibility testing is the responsibility of the user. Red Hat has not performed testing on all OpenAI API compatible services.
Using Other OpenAI API Compatible Services: If you have an OpenAI API compatible endpoint that doesn't have its own provider configuration (like Azure OpenAI, LM Studio, Mistral, etc.), you can use the vLLM provider configuration (see Option A in step 4 below). Simply set `ENABLE_VLLM=true` and configure `VLLM_URL` to point to your service's endpoint.

If you select "Bring Your Own Model" in step 1, you must configure at least one provider in step 4 below.
4. **Set Environment Variables**

[!NOTE] If you intend to use any environment variables in the Lightspeed Core configuration file, `lightspeed-stack.yaml`, note that Lightspeed Core parses environment variables differently than is typical. Environment variables for this file must take one of the following forms (a short illustrative sketch follows the copy step below):

- `${env.VAR}`
- `${env.VAR:=default-value}`
- `${env.VAR:+value}`

In the root of this repository there is a `default.env` file. Copy its contents to `.env` and fill in the required values:

```bash
cp default.env .env
```
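For illustration, here is a minimal sketch of how those `${env.*}` forms might appear inside `lightspeed-stack.yaml`. The key names below are hypothetical, for illustration only; consult the actual file's schema for real keys.

```yaml
# Hypothetical keys for illustration only -- not the real lightspeed-stack.yaml schema.
service:
  port: ${env.LCS_PORT:=8080}        # falls back to 8080 if LCS_PORT is unset
  api_key: ${env.LCS_API_KEY}        # substituted directly, no default
  tls_note: ${env.LCS_TLS:+enabled}  # becomes "enabled" only when LCS_TLS is set
```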
[!IMPORTANT] Configuration Requirements:
- If you selected Ollama in step 1: You don't need to modify any environment variables. The defaults work out of the box.
- If you selected Ollama + "With safety guard": No configuration needed! The setup automatically pulls `llama-guard3:1b` and configures the safety URL.
- If you selected "Bring Your Own Model" in step 1: You must configure at least one external LLM provider below before starting the application.
- If you selected "Bring Your Own Model" + "With safety guard": No safety configuration needed! A dedicated local Ollama container (`safety-ollama`) is automatically provisioned to run `llama-guard3:1b`. You only need to configure your external LLM provider.
| Provider | Safety Guard | Required Env Vars | Compose Files Used |
|---|---|---|---|
| Ollama | No safety guard | None (defaults work) | `compose.yaml` + `developer-lightspeed/compose-with-ollama.yaml` |
| Ollama | With safety guard | None (auto-configured) | `compose.yaml` + `developer-lightspeed/compose-with-ollama.yaml` + `developer-lightspeed/compose-with-safety-guard-ollama.yaml` |
| Bring your own model | No safety guard | At least one: vLLM, OpenAI, or Vertex AI | `compose.yaml` + `developer-lightspeed/compose.yaml` |
| Bring your own model | With safety guard | At least one: vLLM, OpenAI, or Vertex AI | `compose.yaml` + `developer-lightspeed/compose.yaml` + `developer-lightspeed/compose-with-safety-guard.yaml` |
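Each row of this table corresponds to a `compose up` invocation over the listed files. As a hedged sketch, the manual equivalent of the Ollama + safety guard row would look roughly like this (the setup script may pass additional flags):

```bash
# Manual equivalent of the "Ollama + With safety guard" row above
podman compose \
  -f compose.yaml \
  -f developer-lightspeed/compose-with-ollama.yaml \
  -f developer-lightspeed/compose-with-safety-guard-ollama.yaml \
  up -d
```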
If you selected "Bring Your Own Model" in step 1, configure at least one of the following providers in your `.env` file.

**Option A: vLLM (or any OpenAI API compatible endpoint)**

Use vLLM for high-performance inference with self-hosted or cloud-based vLLM servers. This provider configuration also works with any OpenAI API compatible service (Azure OpenAI, LM Studio, Mistral, Nvidia NIM, etc.) that provides an OpenAI-compatible endpoint.
```bash
# Enable vLLM provider (or generic OpenAI API compatible endpoint)
ENABLE_VLLM=true

# REQUIRED: URL to your server (must end with /v1)
# Examples:
# - vLLM server: https://your-vllm-server.com/v1
# - Azure OpenAI: https://your-resource.openai.azure.com/v1
# - LM Studio: http://localhost:1234/v1
# - Any OpenAI-compatible endpoint
VLLM_URL=https://your-server.com/v1

# REQUIRED: API key for authentication (if your server requires it)
# For Azure OpenAI, use your Azure API key
# For LM Studio or local servers, you can use any value or leave as default
VLLM_API_KEY=your-api-key-here

# OPTIONAL: Maximum tokens per request (default: 4096)
# VLLM_MAX_TOKENS=4096

# OPTIONAL: TLS verification (default: true)
# Set to false for local servers with self-signed certificates
# VLLM_TLS_VERIFY=true
```
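Before starting the stack, you may want to confirm the endpoint is reachable and OpenAI-compatible. A quick sketch using the standard `/v1/models` listing (this assumes your `.env` uses plain `KEY=value` lines that the shell can source; adjust authentication to your service):

```bash
# Load your .env values, then query the models endpoint.
# VLLM_URL already ends with /v1, so only /models is appended.
source .env
curl -sf -H "Authorization: Bearer ${VLLM_API_KEY}" "${VLLM_URL}/models" | head -c 500
```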
[!TIP] Using Other OpenAI API Compatible Services:
If you have an OpenAI API compatible endpoint that doesn't have its own provider configuration (like Azure OpenAI, LM Studio, Mistral, Nvidia NIM, etc.), you can use the vLLM provider configuration above. Simply:

- Set `ENABLE_VLLM=true`
- Set `VLLM_URL` to your service's endpoint (must end with `/v1`)
- Set `VLLM_API_KEY` to your service's API key (if required)

The `remote::vllm` provider type accepts any OpenAI API compatible endpoint, not just vLLM servers.

**Option B: OpenAI**

Use OpenAI's API to access GPT models (GPT-3.5, GPT-4, etc.).
```bash
# Enable OpenAI provider
ENABLE_OPENAI=true

# REQUIRED: Your OpenAI API key
OPENAI_API_KEY=sk-your-openai-api-key-here
```
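To sanity-check the key before launch, you can query OpenAI's models endpoint directly (assuming your `.env` uses plain `KEY=value` lines that the shell can source):

```bash
# A valid key returns a JSON list of models; an invalid key returns a 401 error.
source .env
curl -sf https://api.openai.com/v1/models \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" | head -c 300
```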
**Option C: Vertex AI**

Use Google Cloud's Vertex AI service to run Gemini models.
[!WARNING] Experimental Feature: Using Vertex AI to run Google models is experimental. Vertex AI provides an OpenAI-compatible API for Gemini models, which is why it can work with Developer Lightspeed (which supports OpenAI API implementations). This is provided as an alternative way to access Google models because `remote::gemini` is not yet fully supported.

```bash
# Enable Vertex AI provider
ENABLE_VERTEX_AI=true

# REQUIRED: Absolute path to your Google Cloud credentials JSON file
VERTEX_AI_CREDENTIALS_PATH=/absolute/path/to/your/google-cloud-credentials.json

# REQUIRED: Your GCP project ID
VERTEX_AI_PROJECT=your-gcp-project-id

# OPTIONAL: GCP location/region (default: us-central1)
# VERTEX_AI_LOCATION=us-central1
```
[!NOTE] To use Vertex AI, you need:

- A Google Cloud Platform (GCP) project with the Vertex AI API enabled
- A service account with appropriate permissions
- A service account key file (JSON) downloaded from GCP
- `VERTEX_AI_PROJECT` set to your project ID
- `VERTEX_AI_CREDENTIALS_PATH` set to the absolute path of your credentials JSON file

A command-line sketch for these prerequisites follows.
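If you still need to create the service account and key, here is a rough sketch with the `gcloud` CLI. The account name and role shown are illustrative; confirm the minimal role your setup actually requires.

```bash
# Enable the Vertex AI API on your project
gcloud services enable aiplatform.googleapis.com --project=your-gcp-project-id

# Create a service account (name is illustrative)
gcloud iam service-accounts create lightspeed-vertex --project=your-gcp-project-id

# Grant it access to Vertex AI (a common role choice, not a verified minimum)
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member="serviceAccount:lightspeed-vertex@your-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Download the JSON key referenced by VERTEX_AI_CREDENTIALS_PATH
gcloud iam service-accounts keys create /absolute/path/to/your/google-cloud-credentials.json \
  --iam-account=lightspeed-vertex@your-gcp-project-id.iam.gserviceaccount.com
```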
Read more: Vertex AI Provider Documentation
[!TIP] Safety guard is auto-configured for all provider combinations. No manual safety configuration is needed regardless of whether you use Ollama or an external provider.
When you select "With safety guard", the setup automatically provisions a local Ollama container to run the safety model:
- Ollama provider: The safety model (`llama-guard3:1b`) is pulled into the same Ollama container used for inference.
- Bring Your Own Model: A dedicated `safety-ollama` container is spun up solely for the safety model. Your external provider handles inference while the local Ollama handles safety filtering.
[!IMPORTANT] The safety provider uses `inline::llama-guard`, which means `SAFETY_MODEL` must be a llama-guard variant (e.g., `llama-guard3:1b`). Other model families will not work for safety filtering.

[!NOTE] Safety model latency: The default safety model is `llama-guard3:1b`. Depending on your local CPU/GPU setup and container runtime, safety checks may still add noticeable latency, and in some environments may be slower than expected.

```bash
# OPTIONAL: Override the default safety guard model (default: llama-guard3:1b)
# Must be a llama-guard variant
SAFETY_MODEL=llama-guard3:1b
```
[!WARNING] Performance impact: Running the llama-guard model locally can increase response latency, especially on machines with limited CPU, memory, or GPU resources. Every user message is evaluated by the safety model before being forwarded to the inference model, so constrained environments may experience noticeably slower responses. If you encounter this, consider increasing the memory allocated to your container runtime (see Running Larger Models with Ollama) or running without the guard.
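If you suspect the safety model is the bottleneck or missing, you can check which models its container has pulled. The container names follow the setup described above; use `ollama` instead of `safety-ollama` when running the all-Ollama configuration:

```bash
# List models available inside the safety guard's Ollama container
podman exec safety-ollama ollama list
# OR, with Docker:
docker exec safety-ollama ollama list
```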
5. **Start the application**

To start the Developer Lightspeed interactive setup script, run the following from the root of the repository:

```bash
bash ./developer-lightspeed/scripts/start-lightspeed.sh
```
The script will guide you through a simple 3-step process:
- Choose your LLM provider (Ollama or Bring your own model)
- Choose safety guard (No safety guard or With safety guard)
- Container runtime detection (auto-detects podman or docker, prompts only if detection fails or to override)
[!IMPORTANT] Before starting:
- Ollama provider: No configuration needed - works out of the box (including with safety guard)
- Bring Your Own Model: Ensure you've configured at least one external LLM provider in your `.env` file (see step 4 above)
- Bring Your Own Model + Safety Guard: Only your external LLM provider needs configuration. The safety guard model is automatically provisioned locally via a dedicated `safety-ollama` container
6. **Verify that all services are running**

After starting the application, make sure all services are running:

```bash
podman compose ps
# OR
docker compose ps
```

Look for all services to show `running` or `Up (starting)` in the STATUS column. With the Ollama provider, you should see output similar to:

```
CONTAINER ID  IMAGE                                           COMMAND               CREATED         STATUS                    PORTS                                                                  NAMES
31c3c681b742  quay.io/rhdh-community/rhdh:next                                      16 seconds ago  Exited (0) 5 seconds ago  8080/tcp                                                               rhdh-plugins-installer
f7b74b9f241e  quay.io/rhdh-community/rhdh:next                                      4 seconds ago   Up 5 seconds (starting)   0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp             rhdh
818ddf7fd045  docker.io/ollama/ollama:latest                  ollama serve & ...    16 seconds ago  Up 16 seconds (healthy)   0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 11434/tcp            ollama
2860fc13b036  quay.io/lightspeed-core/lightspeed-stack:0.4.0  python3.11 runner...  15 seconds ago  Up 5 seconds (starting)   0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp, 8443/tcp  lightspeed-core-service
1572ghe259c0  quay.io/redhat-ai-dev/llama-stack:0.1.4                               3 minutes ago   Up 3 minutes (healthy)    7007/tcp, 127.0.0.1:9229->9229/tcp                                     llama-stack
```
With an external provider ("Bring Your Own Model"), you should see output similar to:

```
CONTAINER ID  IMAGE                                           COMMAND               CREATED         STATUS                    PORTS                                                                  NAMES
31c3c681b742  quay.io/rhdh-community/rhdh:next                                      16 seconds ago  Exited (0) 5 seconds ago  8080/tcp                                                               rhdh-plugins-installer
f7b74b9f241e  quay.io/rhdh-community/rhdh:next                                      4 seconds ago   Up 5 seconds (starting)   0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp             rhdh
2860fc13b036  quay.io/lightspeed-core/lightspeed-stack:0.4.0  python3.11 runner...  15 seconds ago  Up 5 seconds (starting)   0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp, 8443/tcp  lightspeed-core-service
1572ghe259c0  quay.io/redhat-ai-dev/llama-stack:0.1.4                               3 minutes ago   Up 3 minutes (healthy)    7007/tcp, 127.0.0.1:9229->9229/tcp                                     llama-stack
```
Note: If any service is not running, you can inspect the logs:

```bash
podman logs <container-name>
```
7. Open http://localhost:7007/lightspeed in your browser to access Developer Lightspeed.
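As a quick command-line check that RHDH is serving before you open the browser (a simple reachability probe, not an official health endpoint):

```bash
# An HTTP 200 indicates the UI is being served on port 7007
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7007/lightspeed
```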
The easiest way to stop Developer Lightspeed is using the interactive stop script:

```bash
bash ./developer-lightspeed/scripts/stop-lightspeed.sh
```

With volume removal (non-interactive):

```bash
bash ./developer-lightspeed/scripts/stop-lightspeed.sh -v
# or
bash ./developer-lightspeed/scripts/stop-lightspeed.sh --volumes
```

The script will:
- Auto-detect your container runtime (podman or docker; see the sketch after this list)
- Detect which configuration is running
- Without the `-v` flag: Stops containers only (preserves volumes for faster restart)
- With the `-v` flag: Stops containers and removes volumes (complete cleanup)
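For reference, runtime auto-detection can be as simple as the following sketch (the actual script's logic may differ):

```bash
# Prefer podman when both runtimes are installed; fall back to docker.
if command -v podman >/dev/null 2>&1; then
  RUNTIME=podman
elif command -v docker >/dev/null 2>&1; then
  RUNTIME=docker
else
  echo "No container runtime found" >&2
  exit 1
fi
"$RUNTIME" compose ps
```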
If you prefer to stop containers manually, use the commands below based on your setup.

Ollama, no safety guard:

```bash
podman compose -f compose.yaml -f developer-lightspeed/compose-with-ollama.yaml down -v
```

Ollama with safety guard:

```bash
podman compose -f compose.yaml -f developer-lightspeed/compose-with-ollama.yaml -f developer-lightspeed/compose-with-safety-guard-ollama.yaml down -v
```

Bring your own model, no safety guard:

```bash
podman compose -f compose.yaml -f developer-lightspeed/compose.yaml down -v
```

Bring your own model with safety guard:

```bash
podman compose -f compose.yaml -f developer-lightspeed/compose.yaml -f developer-lightspeed/compose-with-safety-guard.yaml down -v
```

Note: All instructions in this guide apply to both Podman and Docker. Replace `podman compose` with `docker compose` if you are using Docker.
If you encounter issues while setting up or running Developer Lightspeed, try the following solutions:
- **Check container logs**

  Use the following command to view logs for a specific container:

  ```bash
  podman logs <container-name>
  # OR
  docker logs <container-name>
  ```

  Look for error messages that can help diagnose the problem.
- **Common causes:**

  - Port conflicts (another service is using the same port)
  - Insufficient memory or CPU resources
  - Incorrect environment variables

- Increase the memory allocated to your Podman or Docker virtual machine. See the Running Larger Models with Ollama section for instructions.
- Ensure you have the necessary permissions to access files and directories, especially when mounting volumes. On Linux/macOS, you may need to adjust permissions with `chmod` or run commands with `sudo`.
- **Web UI not accessible at http://localhost:7007/lightspeed**
- Make sure all containers are running:

  ```bash
  podman compose ps
  # OR
  docker compose ps
  ```

- Check for firewall or VPN issues that may block access to localhost ports.
- Double-check that your `.env` or `default.env` files are present and correctly configured.
- Restart the containers after making changes to environment files.
If you selected "Bring Your Own Model" in step 1 but the external LLM provider isn't working:
- Verify the provider is enabled: Check that `ENABLE_VLLM=true`, `ENABLE_OPENAI=true`, or `ENABLE_VERTEX_AI=true` is set in your `.env` file
- Check required variables: Ensure all required variables for your chosen provider are set (see step 4 above)
- Verify connectivity: For vLLM, ensure the `VLLM_URL` is accessible from the container
- Check logs: Review `llama-stack` container logs for provider connection errors (a quick error filter sketch follows this list):

  ```bash
  podman logs llama-stack
  # OR
  docker logs llama-stack
  ```

- Validate API keys: Ensure API keys are correct and have proper permissions
- Try stopping and removing all containers, then starting again; see cleanup.
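To narrow down provider errors quickly, a simple filter over the `llama-stack` logs can help (the pattern is a rough heuristic, not an exhaustive list of error strings):

```bash
# Show only lines mentioning errors or failed provider connections (case-insensitive)
podman logs llama-stack 2>&1 | grep -iE "error|refused|unauthorized|timeout"
```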
If your issue persists, please open an issue on GitHub with details about your problem so we can help you troubleshoot.
Available configuration options:

```yaml
lightspeed:
  # OPTIONAL: Custom user prompts displayed to users
  # If not provided, the plugin uses built-in default prompts
  prompts:
    - title: <prompt_title> # REQUIRED: Display title for the prompt
      message: <prompt_message> # REQUIRED: The actual prompt text/question

  # OPTIONAL: Backend-only configurations
  servicePort: 8080 # OPTIONAL: Port for lightspeed service (default: 8080)
  systemPrompt: <custom_system_prompt> # OPTIONAL: Override default RHDH system prompt
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `prompts` | Array | ❌ No | Built-in prompts | Custom welcome prompts for users |
| `prompts[].title` | String | ✅ Yes* | - | Display title for the prompt (*required if `prompts` array is provided) |
| `prompts[].message` | String | ✅ Yes* | - | The actual prompt text/question (*required if `prompts` array is provided) |
| `servicePort` | Number | ❌ No | `8080` | Port for lightspeed backend service |
| `systemPrompt` | String | ❌ No | RHDH default | Custom system prompt to override default behavior |
```yaml
lightspeed:
  prompts:
    - title: "Quick Start"
      message: "How do I enable a dynamic plugin?"
  servicePort: 8080
  systemPrompt: "You are a helpful assistant focused on Red Hat Developer Hub development."
```

Some AI models require more memory than the default Podman machine allocation. If you encounter errors such as "model requires more system memory than is available," you can increase the memory available to your Podman virtual machine:

```bash
podman machine stop
podman machine set --memory=8192
podman machine start
```

- The example above sets the memory to 8 GiB (`8192` MB).
- Adjust the value as needed (e.g., `--memory=16384` for 16 GiB).
- Ensure your host system has enough free RAM.
After increasing the memory, restart your containers to use the new limits.
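You can confirm the new allocation took effect before restarting containers (the template field below matches common Podman versions, but the inspect output format may vary):

```bash
# Inspect the machine's configured resources; memory is typically reported in MB
podman machine inspect --format '{{ .Resources.Memory }}'
```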
By default, the Ollama service pulls and loads the `llama3.2:1b` model.
To use a larger or different model, you can specify the model name using the `OLLAMA_MODEL` environment variable. The Compose file supports a default value using the `${OLLAMA_MODEL:-llama3.2:1b}` syntax.
Example in your Compose file:

```yaml
command: >
  "ollama serve &
  sleep 5 &&
  ollama pull ${OLLAMA_MODEL:-llama3.2:1b} &&
  touch /tmp/ready &&
  wait"
```

- If you set `OLLAMA_MODEL` in your `.env` file or environment, that model will be used.
- If not set, it will default to `llama3.2:1b`.
- Example `.env` entry: `OLLAMA_MODEL=llama2:13b`
- Make sure the model you choose fits within your available memory.
Tip: You can find available models and their memory requirements in the Ollama model library.
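For example, to switch to a larger model you might append the variable to `.env` and restart via the provided scripts (the model name below is illustrative; pick one that fits your memory):

```bash
# Select a different model for the Ollama service (name is an example)
echo 'OLLAMA_MODEL=llama3.1:8b' >> .env

# Restart so compose picks up the new value
bash ./developer-lightspeed/scripts/stop-lightspeed.sh
bash ./developer-lightspeed/scripts/start-lightspeed.sh
```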
If you have custom or pre-downloaded Ollama models on your local system, you can make them available to the Ollama container by mounting your host’s model directory into the container.
By default, Ollama stores models in:
- Linux/macOS: `~/.ollama`
- Windows: `%USERPROFILE%\.ollama`
You can set the path to your local `.ollama` directory using an environment variable in your `.env` file.

In your `.env` file:

```bash
OLLAMA_MODELS_PATH=/absolute/path/to/your/.ollama
```

In your Compose file (already set up):

```yaml
volumes:
  - ${OLLAMA_MODELS_PATH:-ollama_data}:/root/.ollama
```

- If you set `OLLAMA_MODELS_PATH` in your `.env` file, that directory will be mounted.
- If not set, it will default to using the `ollama_data` volume.
Once mounted, you can reference your model in the `ollama pull` command in `developer-lightspeed/compose-with-ollama.yaml`.
Ollama will use the models from the mounted directory, so you don’t need to re-download them inside the container.
Tip: If you add new models to your local `.ollama` directory, they will automatically be available in the container after a restart.
This approach saves bandwidth, speeds up startup, and lets you use custom or fine-tuned models you've created locally.
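After starting with the mount in place, you can confirm the container sees your local models:

```bash
# Models from the mounted ~/.ollama directory should appear in this listing
podman exec ollama ollama list
# OR, with Docker:
docker exec ollama ollama list
```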