Developer Lightspeed in RHDH local

Red Hat Developer Lightspeed (Developer Lightspeed) is a virtual assistant powered by generative AI that offers in-depth insights into Red Hat Developer Hub (RHDH), including its wide range of capabilities. You can interact with this assistant to explore and learn more about RHDH in greater detail.

Developer Lightspeed provides a natural language interface within the RHDH console, helping you easily find information about the product, understand its features, and get answers to your questions as they come up.

Supported Architecture

Developer Lightspeed for Red Hat Developer Hub is available as a plugin on all platforms that host RHDH. It requires Lightspeed-Core Service (LCS) and Llama Stack running as sidecar containers.

Table of Contents

  1. Getting Started
  2. Cleanup
  3. Troubleshooting
  4. Advanced Configuration Guides

Getting Started

Important

Developer Lightspeed can be resource intensive on the model side because it uses RAG and, in some cases, a safety guard. If you use the provided local Ollama server, you may get unexpected responses when the model cannot handle the context.

For best results, use your own external LLM provider.

Follow these steps to configure and launch Developer Lightspeed.


  1. Load the Developer Lightspeed dynamic plugins

    Add the developer-lightspeed/configs/dynamic-plugins/dynamic-plugins.lightspeed.yaml file to the list of includes in your configs/dynamic-plugins/dynamic-plugins.override.yaml to enable Developer Lightspeed plugins within RHDH.

    Example:

    includes:
    - dynamic-plugins.default.yaml
    - developer-lightspeed/configs/dynamic-plugins/dynamic-plugins.lightspeed.yaml # <-- to add to enable the developer lightspeed plugins
    
    # Below you can add your own custom dynamic plugins, including local ones.
    plugins: []
  2. Copy the Lightspeed App Config example

    Start by creating a new local app config file for Lightspeed:

    cp developer-lightspeed/configs/app-config/app-config.lightspeed.local.example.yaml developer-lightspeed/configs/app-config/app-config.lightspeed.local.yaml
  3. Understanding the Setup Process

    The setup script (start-lightspeed.sh) asks you to make two configuration choices for Developer Lightspeed:

    Step 1: Choose Your LLM Provider

    • Ollama (recommended for beginners): Uses the built-in Ollama container that runs locally. No additional configuration needed - the script handles everything automatically.
    • Bring Your Own Model: Use your own external LLM provider (any OpenAI API compatible service). You must configure the environment variables for your chosen provider before starting.

    Step 2: Choose Safety Guard

    • No safety guard: Allows any type of questions - Developer Lightspeed acts as a general-purpose assistant with no safety content filtering.
    • With safety guard: Enables Llama Guard for content safety filtering. A local Ollama container is automatically provisioned to run the safety model (llama-guard3:1b by default). No additional configuration is needed. Note that the safety guard supports only llama-guard model variants.

    [!NOTE] What does "Bring Your Own Model" mean?

    This option allows you to use an external LLM service instead of the local Ollama container. Developer Lightspeed supports any service that is OpenAI API compatible, including but not limited to:

    • vLLM: A high-performance inference server (self-hosted or cloud)
    • OpenAI: OpenAI's API (GPT-3.5, GPT-4, etc.)
    • Vertex AI: Google Cloud's Vertex AI service (experimental)
    • Other compatible services: Azure OpenAI, WatsonX, Amazon Bedrock, Mistral, Nvidia NIM, LM Studio, and other OpenAI API compatible services

    Tested Providers: Red Hat has tested and verified the following providers: vllm, ollama, openai, and vertex_ai. For other services, compatibility testing is the responsibility of the user. Red Hat has not performed testing on all OpenAI API compatible services.

    Using Other OpenAI API Compatible Services: If you have an OpenAI API compatible endpoint that doesn't have its own provider configuration (like Azure OpenAI, LM Studio, Mistral, etc.), you can use the vLLM provider configuration (see Option A in step 4 below). Simply set ENABLE_VLLM=true and configure VLLM_URL to point to your service's endpoint.

    If you select "Bring Your Own Model" in step 1, you must configure at least one provider in step 4 below.

  4. Set Environment Variables

    [!NOTE] If you intend to use environment variables in the Lightspeed Core configuration file, lightspeed-stack.yaml, note that Lightspeed Core does not parse the usual ${VAR} syntax. Environment variables in this file must take one of the following forms:

    ${env.VAR}

    ${env.VAR:=default-value}

    ${env.VAR:+value}
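
These forms resemble POSIX shell parameter expansion. As a rough comparison only, and assuming Lightspeed Core follows the semantics commonly used by Llama Stack configuration files (`:=` supplies a default when the variable is unset, `:+` yields the given value only when the variable is set), the closest shell analogues behave as sketched below. The names `DEMO_URL` and `DEMO_FLAG` and the value `fallback-value` are hypothetical.

```shell
#!/bin/sh
# Shell analogues of the Lightspeed Core forms (an illustration, not the parser itself).
# DEMO_URL and DEMO_FLAG are hypothetical variable names.

unset DEMO_URL DEMO_FLAG

# ${env.DEMO_URL:=fallback-value} is assumed to act like shell ${DEMO_URL:-fallback-value}:
# the default is used because DEMO_URL is unset.
default_when_unset="${DEMO_URL:-fallback-value}"

# ${env.DEMO_FLAG:+enabled} is assumed to act like shell ${DEMO_FLAG:+enabled}:
# the result is empty while DEMO_FLAG is unset.
empty_when_unset="${DEMO_FLAG:+enabled}"

# Once DEMO_FLAG has a value, the same form yields "enabled".
DEMO_FLAG=1
value_when_set="${DEMO_FLAG:+enabled}"

printf 'default_when_unset=%s\n' "$default_when_unset"
printf 'empty_when_unset=%s\n' "$empty_when_unset"
printf 'value_when_set=%s\n' "$value_when_set"
```

If the behavior you observe in lightspeed-stack.yaml differs from this sketch, trust the Lightspeed Core documentation over the shell analogy.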

    In the root of this repository there is a default.env file. Copy its contents to .env and fill in the required values:

    cp default.env .env

    [!IMPORTANT] Configuration Requirements:

    • If you selected Ollama in step 1: You don't need to modify any environment variables. The defaults work out of the box.
    • If you selected Ollama + "With safety guard": No configuration needed! The setup automatically pulls llama-guard3:1b and configures the safety URL.
    • If you selected "Bring Your Own Model" in step 1: You must configure at least one external LLM provider below before starting the application.
    • If you selected "Bring Your Own Model" + "With safety guard": No safety configuration needed! A dedicated local Ollama container (safety-ollama) is automatically provisioned to run llama-guard3:1b. You only need to configure your external LLM provider.

    Quick Reference: Setup Combinations → Required Configuration

    | Provider | Safety Guard | Required Env Vars | Compose Files Used |
    |---|---|---|---|
    | Ollama | No safety guard | None (defaults work) | compose.yaml + developer-lightspeed/compose-with-ollama.yaml |
    | Ollama | With safety guard | None (auto-configured) | compose.yaml + developer-lightspeed/compose-with-ollama.yaml + developer-lightspeed/compose-with-safety-guard-ollama.yaml |
    | Bring your own model | No safety guard | At least one: vLLM, OpenAI, or Vertex AI | compose.yaml + developer-lightspeed/compose.yaml |
    | Bring your own model | With safety guard | At least one: vLLM, OpenAI, or Vertex AI | compose.yaml + developer-lightspeed/compose.yaml + developer-lightspeed/compose-with-safety-guard.yaml |
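
The mapping in this quick reference can be expressed as a small helper, which may be handy when you run compose commands by hand. This is a minimal sketch: `compose_file_args` and its argument values (`ollama`, `byom`, `none`, `guard`) are hypothetical names chosen for illustration and are not part of start-lightspeed.sh.

```shell
#!/bin/sh
# Print the "-f" arguments for a provider/safety combination, following the
# quick reference. compose_file_args is a hypothetical helper.
compose_file_args() {
  provider="$1"   # "ollama" or "byom" (bring your own model)
  safety="$2"     # "none" or "guard"
  args="-f compose.yaml"
  case "$provider" in
    ollama)
      args="$args -f developer-lightspeed/compose-with-ollama.yaml"
      if [ "$safety" = "guard" ]; then
        args="$args -f developer-lightspeed/compose-with-safety-guard-ollama.yaml"
      fi
      ;;
    byom)
      args="$args -f developer-lightspeed/compose.yaml"
      if [ "$safety" = "guard" ]; then
        args="$args -f developer-lightspeed/compose-with-safety-guard.yaml"
      fi
      ;;
    *)
      echo "unknown provider: $provider" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$args"
}
```

For example, `podman compose $(compose_file_args byom guard) ps` would list the services for the "Bring your own model" setup with the safety guard. The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.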

    Configure External LLM Providers (Required if you selected "Bring Your Own Model")

    If you selected "Bring Your Own Model" in step 1, configure at least one of the following providers in your .env file:

    Option A: vLLM Provider (or Any OpenAI API Compatible Endpoint)

    Use vLLM for high-performance inference with self-hosted or cloud-based vLLM servers. This provider configuration also works with any OpenAI API compatible service (Azure OpenAI, LM Studio, Mistral, Nvidia NIM, etc.) that provides an OpenAI-compatible endpoint.

    # Enable vLLM provider (or generic OpenAI API compatible endpoint)
    ENABLE_VLLM=true
    
    # REQUIRED: URL to your server (must end with /v1)
    # Examples:
    #   - vLLM server: https://your-vllm-server.com/v1
    #   - Azure OpenAI: https://your-resource.openai.azure.com/v1
    #   - LM Studio: http://localhost:1234/v1
    #   - Any OpenAI-compatible endpoint
    VLLM_URL=https://your-server.com/v1
    
    # REQUIRED: API key for authentication (if your server requires it)
    # For Azure OpenAI, use your Azure API key
    # For LM Studio or local servers, you can use any value or leave as default
    VLLM_API_KEY=your-api-key-here
    
    # OPTIONAL: Maximum tokens per request (default: 4096)
    # VLLM_MAX_TOKENS=4096
    
    # OPTIONAL: TLS verification (default: true)
    # Set to false for local servers with self-signed certificates
    # VLLM_TLS_VERIFY=true

    [!TIP] Using Other OpenAI API Compatible Services:

    If you have an OpenAI API compatible endpoint that doesn't have its own provider configuration (like Azure OpenAI, LM Studio, Mistral, Nvidia NIM, etc.), you can use the vLLM provider configuration above. Simply:

    1. Set ENABLE_VLLM=true
    2. Set VLLM_URL to your service's endpoint (must end with /v1)
    3. Set VLLM_API_KEY to your service's API key (if required)

    The remote::vllm provider type accepts any OpenAI API compatible endpoint, not just vLLM servers.
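
Before starting the stack, it can help to sanity-check the endpoint from your shell. The sketch below assumes your service exposes the model listing route (GET <base URL>/models) that OpenAI API compatible servers generally provide. `url_ends_with_v1` and `list_models` are hypothetical helper names.

```shell
#!/bin/sh
# url_ends_with_v1 and list_models are hypothetical helper names.

# Succeed only when the URL ends with /v1 (a trailing slash is rejected).
url_ends_with_v1() {
  case "$1" in
    */v1) return 0 ;;
    *)    return 1 ;;
  esac
}

# Query the model list using VLLM_URL and VLLM_API_KEY from the environment.
# --fail makes curl return a non-zero status on HTTP errors such as 401 or 404.
list_models() {
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${VLLM_API_KEY:-}" \
    "${VLLM_URL}/models"
}
```

A typical check is `url_ends_with_v1 "$VLLM_URL" && list_models`. A successful call from the host does not prove the endpoint is reachable from inside the containers; in particular, localhost inside a container does not refer to your host machine.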

    Option B: OpenAI Provider

    Use OpenAI's API to access GPT models (GPT-3.5, GPT-4, etc.).

    # Enable OpenAI provider
    ENABLE_OPENAI=true
    
    # REQUIRED: Your OpenAI API key
    OPENAI_API_KEY=sk-your-openai-api-key-here

    Option C: Vertex AI Provider (Experimental)

    Use Google Cloud's Vertex AI service to run Gemini models.

    [!WARNING] Experimental Feature: Using Vertex AI to run Google models is experimental. Vertex AI provides an OpenAI-compatible API for Gemini models, which is why it can work with Developer Lightspeed (which supports OpenAI API implementations). It is offered as an alternative way to access Google models because remote::gemini is not yet fully supported.

    # Enable Vertex AI provider
    ENABLE_VERTEX_AI=true
    
    # REQUIRED: Absolute path to your Google Cloud credentials JSON file
    VERTEX_AI_CREDENTIALS_PATH=/absolute/path/to/your/google-cloud-credentials.json
    
    # REQUIRED: Your GCP project ID
    VERTEX_AI_PROJECT=your-gcp-project-id
    
    # OPTIONAL: GCP location/region (default: us-central1)
    # VERTEX_AI_LOCATION=us-central1

    [!NOTE] To use Vertex AI, you need:

    1. A Google Cloud Platform (GCP) project with Vertex AI API enabled
    2. A service account with appropriate permissions
    3. A service account key file (JSON) downloaded from GCP
    4. Set VERTEX_AI_PROJECT to your project ID
    5. Set VERTEX_AI_CREDENTIALS_PATH to the absolute path of your credentials JSON file

    Read more: Vertex AI Provider Documentation
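
Because VERTEX_AI_CREDENTIALS_PATH must be an absolute path to an existing key file, a quick pre-flight check can catch a common mistake early. This is a minimal sketch; `check_vertex_credentials` is a hypothetical helper name, and it does not validate the JSON content or the service account permissions.

```shell
#!/bin/sh
# check_vertex_credentials is a hypothetical helper name.
# Returns 0 when the path is absolute and points to a readable, non-empty file.
check_vertex_credentials() {
  path="$1"
  case "$path" in
    /*) ;;  # absolute path, continue with the file checks
    *)  echo "not an absolute path: $path" >&2; return 1 ;;
  esac
  if [ ! -r "$path" ] || [ ! -s "$path" ]; then
    echo "missing, unreadable, or empty file: $path" >&2
    return 1
  fi
  return 0
}
```

For example, run `check_vertex_credentials "$VERTEX_AI_CREDENTIALS_PATH"` after loading your .env values into the shell.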


    Safety Guard Configuration

    [!TIP] Safety guard is auto-configured for all provider combinations. No manual safety configuration is needed regardless of whether you use Ollama or an external provider.

    When you select "With safety guard", the setup automatically provisions a local Ollama container to run the safety model:

    • Ollama provider: The safety model (llama-guard3:1b) is pulled into the same Ollama container used for inference.
    • Bring Your Own Model: A dedicated safety-ollama container is spun up solely for the safety model. Your external provider handles inference while the local Ollama handles safety filtering.

    [!IMPORTANT] The safety provider uses inline::llama-guard, which means SAFETY_MODEL must be a llama-guard variant (e.g. llama-guard3:1b). Other model families will not work for safety filtering.

    [!NOTE] Safety model latency: The default safety model is now llama-guard3:1b. Depending on your local CPU/GPU setup and container runtime, safety checks may still add noticeable latency, and in some environments may be slower than expected.

    # OPTIONAL: Override the default safety guard model (default: llama-guard3:1b)
    # Must be a llama-guard variant
    SAFETY_MODEL=llama-guard3:1b

    [!WARNING] Performance impact: Running the llama-guard model locally can increase response latency, especially on machines with limited CPU, memory, or GPU resources. Every user message is evaluated by the safety model before being forwarded to the inference model, so constrained environments may experience noticeably slower responses. If you encounter this, consider increasing the memory allocated to your container runtime (see Running Larger Models with Ollama) or running without the guard.

  5. Start the application

    To start the Developer Lightspeed interactive setup script, run the following from the root of the repository:

    bash ./developer-lightspeed/scripts/start-lightspeed.sh

    The script will guide you through a simple 3-step process:

    1. Choose your LLM provider (Ollama or Bring your own model)
    2. Choose safety guard (No safety guard or With safety guard)
    3. Container runtime detection (auto-detects podman or docker, prompts only if detection fails or to override)

    [!IMPORTANT] Before starting:

    • Ollama provider: No configuration needed - works out of the box (including with safety guard)
    • Bring Your Own Model: Ensure you've configured at least one external LLM provider in your .env file (see step 4 above)
    • Bring Your Own Model + Safety Guard: Only your external LLM provider needs configuration. The safety guard model is automatically provisioned locally via a dedicated safety-ollama container

  6. Verify that all services are running

    After starting the application, make sure all services are running:

    podman compose ps
    # OR
    docker compose ps

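    All services should show Up in the STATUS column, possibly followed by (starting) or (healthy). The exception is rhdh-plugins-installer, which is expected to show Exited (0), as in the sample output below: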

    A. Default setup (with Ollama):

    You should see output similar to:

    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    31c3c681b742 quay.io/rhdh-community/rhdh:next 16 seconds ago Exited (0) 5 seconds ago 8080/tcp rhdh-plugins-installer
    f7b74b9f241e quay.io/rhdh-community/rhdh:next 4 seconds ago Up 5 seconds (starting) 0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp rhdh
    818ddf7fd045 docker.io/ollama/ollama:latest ollama serve & ... 16 seconds ago Up 16 seconds (healthy) 0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 11434/tcp ollama
    2860fc13b036 quay.io/lightspeed-core/lightspeed-stack:0.4.0 python3.11 runner... 15 seconds ago Up 5 seconds (starting) 0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp, 8443/tcp lightspeed-core-service
    1572ghe259c0 quay.io/redhat-ai-dev/llama-stack:0.1.4 3 minutes ago Up 3 minutes (healthy) 7007/tcp, 127.0.0.1:9229->9229/tcp llama-stack

    B. Minimal setup (own model server, no ollama):

    You should see output similar to:

    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    31c3c681b742 quay.io/rhdh-community/rhdh:next 16 seconds ago Exited (0) 5 seconds ago 8080/tcp rhdh-plugins-installer
    f7b74b9f241e quay.io/rhdh-community/rhdh:next 4 seconds ago Up 5 seconds (starting) 0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp rhdh
    2860fc13b036 quay.io/lightspeed-core/lightspeed-stack:0.4.0 python3.11 runner... 15 seconds ago Up 5 seconds (starting) 0.0.0.0:7007->7007/tcp, 127.0.0.1:9229->9229/tcp, 8080/tcp, 8443/tcp lightspeed-core-service
    1572ghe259c0 quay.io/redhat-ai-dev/llama-stack:0.1.4 3 minutes ago Up 3 minutes (healthy) 7007/tcp, 127.0.0.1:9229->9229/tcp llama-stack

    Note: If any service is not running, you can inspect the logs:

    podman logs <container-name>
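
To spot problem containers without reading the whole table, you can filter the status output. The sketch below assumes status strings shaped like the sample output ("Up ..." or "Exited (N) ...") and treats rhdh-plugins-installer with Exited (0) as healthy. `unhealthy_containers` is a hypothetical helper name.

```shell
#!/bin/sh
# unhealthy_containers is a hypothetical helper name.
# Reads lines of "<name> <status...>" on stdin and prints the names of
# containers that are not Up, ignoring the installer's expected Exited (0).
unhealthy_containers() {
  awk '
    NF == 0 { next }                        # skip blank lines
    $2 == "Up" { next }                     # running (starting or healthy)
    $1 == "rhdh-plugins-installer" && $2 == "Exited" && $3 == "(0)" { next }
    { print $1 }                            # anything else needs a look
  '
}
```

For example, `podman ps -a --format '{{.Names}} {{.Status}}' | unhealthy_containers` prints nothing when everything looks fine. The same command works with docker in place of podman.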
  7. Open http://localhost:7007/lightspeed in your browser to access Developer Lightspeed.



Cleanup

Quick Cleanup (Recommended)

The easiest way to stop Developer Lightspeed is using the interactive stop script:

bash ./developer-lightspeed/scripts/stop-lightspeed.sh

To also remove volumes (non-interactive):

bash ./developer-lightspeed/scripts/stop-lightspeed.sh -v
# or
bash ./developer-lightspeed/scripts/stop-lightspeed.sh --volumes

The script will:

  • Auto-detect your container runtime (podman or docker)
  • Detect which configuration is running
  • Without -v flag: Stops containers only (preserves volumes for faster restart)
  • With -v flag: Stops containers and removes volumes (complete cleanup)

Manual Cleanup

If you prefer to stop containers manually, use the commands below based on your setup:

A. Default setup (with Ollama)

Without Safety Guard
podman compose -f compose.yaml -f developer-lightspeed/compose-with-ollama.yaml down -v
With Safety Guard
podman compose -f compose.yaml -f developer-lightspeed/compose-with-ollama.yaml -f developer-lightspeed/compose-with-safety-guard-ollama.yaml down -v

B. Minimal setup (own model server, no ollama)

Without Safety Guard
podman compose -f compose.yaml -f developer-lightspeed/compose.yaml down -v
With Safety Guard
podman compose -f compose.yaml -f developer-lightspeed/compose.yaml -f developer-lightspeed/compose-with-safety-guard.yaml down -v

Note: All instructions in this guide apply to both Podman and Docker.
Replace podman compose with docker compose if you are using Docker.

Troubleshooting

If you encounter issues while setting up or running Developer Lightspeed, try the following solutions:

1. Services Not Starting or Exiting Unexpectedly

  • Check container logs:
    Use the following command to view logs for a specific container:

    podman logs <container-name>
    # OR
    docker logs <container-name>

    Look for error messages that can help diagnose the problem.

  • Common causes:

    • Port conflicts (another service is using the same port)
    • Insufficient memory or CPU resources
    • Incorrect environment variables

2. "model requires more system memory than is available" Error

  • Increase the memory available to your container runtime. See Running Larger Models with Ollama.

3. "Permission Denied" or File Access Errors

  • Ensure you have the necessary permissions to access files and directories, especially when mounting volumes.
  • On Linux/macOS, you may need to adjust permissions with chmod or run commands with sudo.

4. Web UI Not Accessible at http://localhost:7007/lightspeed

  • Make sure all containers are running:
    podman compose ps
    # OR
    docker compose ps
  • Check for firewall or VPN issues that may block access to localhost ports.

5. Environment Variables Not Set

  • Double-check that your .env or default.env files are present and correctly configured.
  • Restart the containers after making changes to environment files.

6. "Bring Your Own Model" Not Working

If you selected "Bring Your Own Model" in step 1 but the external LLM provider isn't working:

  • Verify provider is enabled: Check that ENABLE_VLLM=true, ENABLE_OPENAI=true, or ENABLE_VERTEX_AI=true is set in your .env file
  • Check required variables: Ensure all required variables for your chosen provider are set (see step 4 above)
  • Verify connectivity: For vLLM, ensure the VLLM_URL is accessible from the container
  • Check logs: Review llama-stack container logs for provider connection errors:
    podman logs llama-stack
    # OR
    docker logs llama-stack
  • Validate API keys: Ensure API keys are correct and have proper permissions
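
The first two checks above can be automated against your .env file. This is a minimal sketch that only understands plain KEY=value lines (no quotes, no inline comments). `env_value` and `check_byom_env` are hypothetical helper names, and the required variables are taken from step 4.

```shell
#!/bin/sh
# env_value and check_byom_env are hypothetical helper names.

# Print the value of a key from an env file. Commented-out lines are ignored
# and the last matching KEY=value line wins.
env_value() {
  # $1 = file, $2 = key
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

# Print one line per problem; print nothing when the file looks consistent.
check_byom_env() {
  file="$1"
  enabled=0
  if [ "$(env_value "$file" ENABLE_VLLM)" = "true" ]; then
    enabled=1
    [ -n "$(env_value "$file" VLLM_URL)" ] || echo "VLLM_URL is not set"
  fi
  if [ "$(env_value "$file" ENABLE_OPENAI)" = "true" ]; then
    enabled=1
    [ -n "$(env_value "$file" OPENAI_API_KEY)" ] || echo "OPENAI_API_KEY is not set"
  fi
  if [ "$(env_value "$file" ENABLE_VERTEX_AI)" = "true" ]; then
    enabled=1
    [ -n "$(env_value "$file" VERTEX_AI_CREDENTIALS_PATH)" ] || echo "VERTEX_AI_CREDENTIALS_PATH is not set"
    [ -n "$(env_value "$file" VERTEX_AI_PROJECT)" ] || echo "VERTEX_AI_PROJECT is not set"
  fi
  if [ "$enabled" -eq 0 ]; then
    echo "no provider enabled (ENABLE_VLLM, ENABLE_OPENAI, or ENABLE_VERTEX_AI)"
  fi
}
```

Run `check_byom_env .env` from the root of the repository. Empty output means at least one provider is enabled and its required variables are present; it does not confirm that the values themselves are valid.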

7. Still Stuck?

  • Try stopping and removing all containers, then starting again (see Cleanup).

If your issue persists, please open an issue on GitHub with details about your problem so we can help you troubleshoot.

Advanced Configuration Guides

Plugin Configuration Reference

Available configuration options:

lightspeed:
  # OPTIONAL: Custom prompts displayed to users
  # If not provided, the plugin uses built-in default prompts
  prompts:
    - title: <prompt_title>              # REQUIRED: Display title for the prompt
      message: <prompt_message>          # REQUIRED: The actual prompt text/question
  
  # OPTIONAL: Backend-only configurations
  servicePort: 8080                      # OPTIONAL: Port for lightspeed service (default: 8080)
  systemPrompt: <custom_system_prompt>   # OPTIONAL: Override default RHDH system prompt

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| prompts | Array | ❌ No | Built-in prompts | Custom welcome prompts for users |
| prompts[].title | String | ✅ Yes* | - | Display title for the prompt (*required if prompts array is provided) |
| prompts[].message | String | ✅ Yes* | - | The actual prompt text/question (*required if prompts array is provided) |
| servicePort | Number | ❌ No | 8080 | Port for lightspeed backend service |
| systemPrompt | String | ❌ No | RHDH default | Custom system prompt to override default behavior |

Example Configuration

lightspeed:
  prompts:
    - title: "Quick Start"
      message: "How do I enable a dynamic plugin?"
  servicePort: 8080
  systemPrompt: "You are a helpful assistant focused on Red Hat Developer Hub development."

Running Larger Models with Ollama

Some AI models require more memory than the default Podman machine allocation. If you encounter errors such as “model requires more system memory than is available,” you can increase the memory available to your Podman virtual machine:

podman machine stop
podman machine set --memory=8192
podman machine start
  • The example above sets the memory to 8 GiB (8192 MiB).
  • Adjust the value as needed (e.g., --memory=16384 for 16 GiB).
  • Ensure your host system has enough free RAM.

After increasing the memory, restart your containers to use the new limits.


How do I change the Ollama model?

By default, the Ollama service pulls and loads the llama3.2:1b model.
To use a larger or different model, you can now specify the model name using the OLLAMA_MODEL environment variable. The Compose file supports a default value using the ${OLLAMA_MODEL:-llama3.2:1b} syntax.

Example in your Compose file:

command: >
  "ollama serve &
  sleep 5 &&
  ollama pull ${OLLAMA_MODEL:-llama3.2:1b} &&
  touch /tmp/ready &&
  wait"
  • If you set OLLAMA_MODEL in your .env file or environment, that model will be used.
  • If not set, it will default to llama3.2:1b.
  • Example .env entry:
    OLLAMA_MODEL=llama2:13b
  • Make sure the model you choose fits within your available memory.

Tip: You can find available models and their memory requirements in the Ollama model library.


Using Your Own Ollama Models from Your System

If you have custom or pre-downloaded Ollama models on your local system, you can make them available to the Ollama container by mounting your host’s model directory into the container.

Step 1: Locate Your Ollama Model Directory

By default, Ollama stores models in:

  • Linux/macOS: ~/.ollama
  • Windows: %USERPROFILE%\.ollama

Step 2: Mount the Directory in Your Compose File

You can set the path to your local .ollama directory using an environment variable in your .env file:

In your .env file:

OLLAMA_MODELS_PATH=/absolute/path/to/your/.ollama

In your Compose file (already set up):

volumes:
  - ${OLLAMA_MODELS_PATH:-ollama_data}:/root/.ollama
  • If you set OLLAMA_MODELS_PATH in your .env file, that directory will be mounted.
  • If not set, it will default to using the ollama_data volume.

Step 3: Use Your Models in the Container

Once mounted, you can reference your model in the ollama pull command in the developer-lightspeed/compose-with-ollama.yaml.

Ollama will use the models from the mounted directory, so you don’t need to re-download them inside the container.
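
To confirm which models the container will find, you can list what is already in the directory you plan to mount. The sketch below assumes the on-disk layout used by current Ollama releases, where each model tag has a manifest file under models/manifests/<registry>/<namespace>/<model>/<tag>. This layout is an assumption and may differ between versions. `list_local_models` is a hypothetical helper name.

```shell
#!/bin/sh
# list_local_models is a hypothetical helper name.
# Prints one "model:tag" line per manifest file found under
# <dir>/models/manifests (layout assumed, see the note above).
list_local_models() {
  manifests="$1/models/manifests"
  if [ ! -d "$manifests" ]; then
    echo "no manifests directory under $1" >&2
    return 1
  fi
  # The last two path components are taken as the model name and the tag.
  find "$manifests" -type f | awk -F/ '{ print $(NF-1) ":" $NF }' | sort
}
```

For example, `list_local_models "$OLLAMA_MODELS_PATH"` should include the model name you reference in the ollama pull command.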

Tip: If you add new models to your local .ollama directory, they will automatically be available in the container after a restart.


This approach saves bandwidth, speeds up startup, and lets you use custom or fine-tuned models you've created locally.