| 1 | +# Azure AI Foundry Pipeline - Native OpenWebUI Citations |
| 2 | + |
| 3 | +This document describes the native OpenWebUI citation support in the Azure AI Foundry Pipeline, which enables rich citation cards and source previews in the OpenWebUI frontend. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The Azure AI Foundry Pipeline supports **native OpenWebUI citations** for Azure AI Search (RAG) responses. This feature is **automatically enabled** when you configure Azure AI Search data sources (`AZURE_AI_DATA_SOURCES`). The OpenWebUI frontend will display: |
| 8 | + |
| 9 | +- **Citation cards** with source information and relevance scores |
| 10 | +- **Source previews** with content snippets |
| 11 | +- **Relevance percentage** displayed on citation cards (requires `AZURE_AI_INCLUDE_SEARCH_SCORES=true`) |
| 12 | +- **Clickable `[docX]` references** that link directly to document URLs |
| 13 | +- **Interactive citation UI** with expandable source details |
| 14 | + |
| 15 | +## Features |
| 16 | + |
| 17 | +### Automatic Citation Support |
| 18 | + |
| 19 | +When Azure AI Search is configured, the pipeline automatically: |
| 20 | + |
| 21 | +1. Emits citation events via `__event_emitter__` for the OpenWebUI frontend |
| 22 | +2. Converts `[docX]` references in the response to clickable markdown links |
| 23 | +3. Filters citations to only show documents actually referenced in the response |
| 24 | +4. Extracts relevance scores from Azure Search when available |
| 25 | + |
| 26 | +### Configuration Options |
| 27 | + |
| 28 | +| Environment Variable | Default | Description | |
| 29 | +|---------------------|---------|-------------| |
| 30 | +| `AZURE_AI_DATA_SOURCES` | `""` | JSON configuration for Azure AI Search (required for citations) | |
| 31 | +| `AZURE_AI_INCLUDE_SEARCH_SCORES` | `true` | Enable relevance score extraction from Azure Search | |
| 32 | + |
| 33 | +### How It Works |
| 34 | + |
| 35 | +#### Streaming Responses |
| 36 | + |
| 37 | +When Azure AI Search returns citations in a streaming response: |
| 38 | + |
| 39 | +1. The pipeline detects citations in the SSE (Server-Sent Events) stream |
| 40 | +2. `[docX]` references in each chunk are converted to markdown links with document URLs |
| 41 | +3. After the stream ends, citation events are emitted via `__event_emitter__` |
| 42 | +4. Citations are filtered to only include documents referenced in the response |
| 43 | + |
| 44 | +#### Non-Streaming Responses |
| 45 | + |
| 46 | +When Azure AI Search returns citations in a non-streaming response: |
| 47 | + |
| 48 | +1. The pipeline extracts citations from the response context |
| 49 | +2. `[docX]` references in the content are converted to markdown links |
| 50 | +3. Individual citation events are emitted via `__event_emitter__` for each referenced source |
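The emission step can be sketched as follows. This is a minimal illustration, assuming `__event_emitter__` is an async callable that accepts one event dict. The names `emit_citation_events` and `fake_emitter` are hypothetical and are not the pipeline's actual helpers.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Assumed shape of the emitter: an async callable taking one event dict.
Emitter = Callable[[Dict[str, Any]], Awaitable[None]]


async def emit_citation_events(events: List[Dict[str, Any]], emitter: Emitter) -> int:
    """Emit each citation as its own event and return how many were sent."""
    emitted = 0
    for event in events:
        await emitter(event)
        emitted += 1
    return emitted


# Stand-in emitter that records events instead of sending them to a frontend.
collected: List[Dict[str, Any]] = []


async def fake_emitter(event: Dict[str, Any]) -> None:
    collected.append(event)


events = [
    {"type": "citation", "data": {"source": {"name": "[doc1] First"}}},
    {"type": "citation", "data": {"source": {"name": "[doc2] Second"}}},
]
emitted_count = asyncio.run(emit_citation_events(events, fake_emitter))
```

Sending one event per source, rather than a single batched event, matches the behavior described above for keeping every source visible in the UI.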
| 51 | + |
| 52 | +## Citation Format |
| 53 | + |
| 54 | +### OpenWebUI Citation Event Structure |
| 55 | + |
| 56 | +Each citation is emitted as a separate event to ensure all sources appear in the UI. Citation events follow the official OpenWebUI specification (see [OpenWebUI Events Documentation](https://docs.openwebui.com/features/plugin/development/events#source-or-citation-and-code-execution)): |
| 57 | + |
| 58 | +```python |
| 59 | +{ |
| 60 | + "type": "citation", |
| 61 | + "data": { |
| 62 | + "document": ["Document content..."], # Content from this citation |
| 63 | + "metadata": [{"source": "https://..."}], # Metadata with source URL |
| 64 | + "source": { |
| 65 | + "name": "[doc1] Document Title", # Unique name with index |
| 66 | + "url": "https://..." # Source URL if available |
| 67 | + }, |
| 68 | + "distances": [0.95] # Relevance score (displayed as percentage) |
| 69 | + } |
| 70 | +} |
| 71 | +``` |
| 72 | + |
| 73 | +Key points: |
| 74 | +- Each source document gets its own citation event |
| 75 | +- The `source.name` includes the doc index (`[doc1]`, `[doc2]`, etc.) to prevent grouping |
| 76 | +- The `distances` array contains relevance scores from Azure AI Search, which OpenWebUI displays as a percentage on the citation cards |
| 77 | + |
| 78 | +### Azure Citation Format (Input) |
| 79 | + |
| 80 | +Azure AI Search returns citations in this format: |
| 81 | + |
| 82 | +```python |
| 83 | +{ |
| 84 | + "title": "Document Title", |
| 85 | + "content": "Full or partial content", |
| 86 | + "url": "https://...", |
| 87 | + "filepath": "/path/to/file", |
| 88 | + "chunk_id": "chunk-123", |
| 89 | + "score": 0.95, |
| 90 | + "metadata": {} |
| 91 | +} |
| 92 | +``` |
| 93 | + |
| 94 | +The pipeline automatically converts Azure citations to OpenWebUI format. |
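As a rough sketch of that conversion, the function below maps one Azure citation dict to the event structure shown earlier. The name `normalize_citation` is hypothetical, and details such as omitting `distances` when no score is present are assumptions rather than confirmed pipeline behavior.

```python
from typing import Any, Dict


def normalize_citation(citation: Dict[str, Any], index: int) -> Dict[str, Any]:
    """Map one Azure citation to an OpenWebUI-style citation event."""
    title = citation.get("title") or "Unknown Document"
    url = citation.get("url") or ""

    data: Dict[str, Any] = {
        "document": [citation.get("content") or ""],
        "metadata": [{"source": url}],
        # The doc index is part of the name so sources are not grouped.
        "source": {"name": f"[doc{index}] {title}"},
    }
    if url:
        data["source"]["url"] = url

    # Assumption: leave out "distances" when Azure returned no score.
    score = citation.get("score")
    if score is not None:
        data["distances"] = [float(score)]

    return {"type": "citation", "data": data}


event = normalize_citation(
    {"title": "Document Title", "content": "Full or partial content",
     "url": "https://example.com/doc1.pdf", "score": 0.95},
    index=1,
)
```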
| 95 | + |
| 96 | +## Usage |
| 97 | + |
| 98 | +### Basic Setup |
| 99 | + |
| 100 | +Configure Azure AI Search to enable citation support: |
| 101 | + |
| 102 | +```bash |
| 103 | +# Azure AI Search configuration (required for citations) |
| 104 | +AZURE_AI_DATA_SOURCES='[{"type":"azure_search","parameters":{"endpoint":"https://YOUR-SEARCH-SERVICE.search.windows.net","index_name":"YOUR-INDEX-NAME","authentication":{"type":"api_key","key":"YOUR-SEARCH-API-KEY"}}}]' |
| 105 | + |
| 106 | +# Enable relevance scores (default: true) |
| 107 | +AZURE_AI_INCLUDE_SEARCH_SCORES=true |
| 108 | +``` |
| 109 | + |
| 110 | +### Clickable Document Links |
| 111 | + |
| 112 | +The pipeline automatically converts `[docX]` references to clickable markdown links: |
| 113 | + |
| 114 | +```markdown |
| 115 | +# Input from Azure AI |
| 116 | +The answer can be found in [doc1] and [doc2]. |
| 117 | + |
| 118 | +# Output (converted by pipeline) |
| 119 | +The answer can be found in [[doc1]](https://example.com/doc1.pdf) and [[doc2]](https://example.com/doc2.pdf). |
| 120 | +``` |
| 121 | + |
| 122 | +This works for both streaming and non-streaming responses. |
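The conversion above can be approximated with a single regular-expression substitution. This is a sketch only: `convert_doc_refs` and the `urls` mapping (document index to URL) are illustrative names, and leaving references without a known URL untouched is an assumption.

```python
import re
from typing import Dict

DOC_REF = re.compile(r"\[doc(\d+)\]")


def convert_doc_refs(content: str, urls: Dict[int, str]) -> str:
    """Wrap each [docN] reference in a markdown link when a URL is known."""

    def replace(match: "re.Match[str]") -> str:
        url = urls.get(int(match.group(1)))
        if not url:
            # No URL for this index: keep the reference as plain text.
            return match.group(0)
        return f"[{match.group(0)}]({url})"

    return DOC_REF.sub(replace, content)


urls = {
    1: "https://example.com/doc1.pdf",
    2: "https://example.com/doc2.pdf",
}
converted = convert_doc_refs("The answer can be found in [doc1] and [doc2].", urls)
```

Note that this sketch is not idempotent: running it again on already converted text would wrap the inner `[doc1]` a second time, so it should be applied once per piece of content.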
| 123 | + |
| 124 | +### Relevance Scores |
| 125 | + |
| 126 | +When `AZURE_AI_INCLUDE_SEARCH_SCORES=true` (default), the pipeline: |
| 127 | + |
| 128 | +1. Automatically adds `include_contexts: ["citations", "all_retrieved_documents"]` to Azure Search requests |
| 129 | +2. Extracts scores based on the `filter_reason` field: |
| 130 | + - `filter_reason="rerank"` → uses `rerank_score` |
| 131 | + - `filter_reason="score"` or not present → uses `original_search_score` |
| 132 | +3. Displays the score as a percentage on citation cards |
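The score selection rule in step 2 can be sketched as below. The function name `select_score` is hypothetical; the field names `filter_reason`, `rerank_score`, and `original_search_score` are the ones listed above, and returning `None` when the chosen field is missing is an assumption.

```python
from typing import Any, Dict, Optional


def select_score(document: Dict[str, Any]) -> Optional[float]:
    """Pick the score field that matches the document's filter_reason."""
    if document.get("filter_reason") == "rerank":
        score = document.get("rerank_score")
    else:
        # filter_reason == "score" or not present
        score = document.get("original_search_score")
    return float(score) if score is not None else None


reranked = select_score(
    {"filter_reason": "rerank", "rerank_score": 3.2, "original_search_score": 0.8}
)
scored = select_score({"filter_reason": "score", "original_search_score": 0.8})
missing = select_score({"original_search_score": 0.5})
empty = select_score({})
```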
| 133 | + |
| 134 | +## Implementation Details |
| 135 | + |
| 136 | +### Helper Functions |
| 137 | + |
| 138 | +The pipeline includes these helper functions for citation processing: |
| 139 | + |
| 140 | +1. **`_extract_citations_from_response()`**: Extracts citations from Azure responses |
| 141 | +2. **`_normalize_citation_for_openwebui()`**: Converts Azure citations to OpenWebUI format |
| 142 | +3. **`_emit_openwebui_citation_events()`**: Emits citation events via `__event_emitter__` |
| 143 | +4. **`_merge_score_data()`**: Matches citations with score data from `all_retrieved_documents` |
| 144 | +5. **`_build_citation_urls_map()`**: Builds mapping of citation indices to URLs |
| 145 | +6. **`_format_citation_link()`**: Creates markdown links for `[docX]` references |
| 146 | +7. **`_convert_doc_refs_to_links()`**: Converts all `[docX]` references in content to markdown links |
| 147 | + |
| 148 | +### Title Fallback Logic |
| 149 | + |
| 150 | +The pipeline chooses each citation's display title in this order: |
| 151 | + |
| 152 | +1. Use the `title` field if it is non-empty |
| 153 | +2. Fall back to the filename extracted from `filepath` or `url` |
| 154 | +3. Fall back to `"Unknown Document"` if all are empty |
| 155 | + |
| 156 | +This ensures every citation has a meaningful display name. |
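One way to express this fallback chain is sketched below. `resolve_title` is a hypothetical name, and checking `filepath` before `url`, as well as URL-decoding the filename, are assumptions not confirmed by the pipeline source.

```python
import posixpath
from typing import Any, Dict
from urllib.parse import unquote, urlparse


def resolve_title(citation: Dict[str, Any]) -> str:
    """Return a display title using the title, then a filename, then a default."""
    title = (citation.get("title") or "").strip()
    if title:
        return title

    # Assumed order: filepath first, then url.
    for key in ("filepath", "url"):
        value = (citation.get(key) or "").strip()
        if not value:
            continue
        # For URLs, drop the query string and host before taking the filename.
        path = urlparse(value).path if key == "url" else value
        name = posixpath.basename(unquote(path).rstrip("/"))
        if name:
            return name

    return "Unknown Document"


from_title = resolve_title({"title": "Guide"})
from_path = resolve_title({"filepath": "/path/to/file.pdf"})
from_url = resolve_title({"url": "https://example.com/docs/report%20v2.pdf?sig=abc"})
fallback = resolve_title({})
```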
| 157 | + |
| 158 | +### Citation Filtering |
| 159 | + |
| 160 | +Citations are filtered to only show documents that are actually referenced in the response content. For example, if Azure returns 5 citations but the response only references `[doc1]` and `[doc3]`, only those 2 citations will appear in the UI. |
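The filtering rule can be sketched as follows, assuming `[docN]` is 1-based and refers to the N-th citation in the order Azure returned them. The name `filter_referenced` is hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple


def filter_referenced(
    citations: List[Dict[str, Any]], content: str
) -> List[Tuple[int, Dict[str, Any]]]:
    """Keep only citations whose [docN] index appears in the response content."""
    referenced = {int(n) for n in re.findall(r"\[doc(\d+)\]", content)}
    # Assumption: indices are 1-based and follow Azure's citation order.
    return [
        (index, citation)
        for index, citation in enumerate(citations, start=1)
        if index in referenced
    ]


citations = [{"title": f"Document {i}"} for i in range(1, 6)]
kept = filter_referenced(citations, "See [doc1] and, for details, [doc3].")
kept_indices = [index for index, _ in kept]
```

Returning the original index alongside each citation keeps the `[docN]` label stable after filtering, so `[doc3]` still refers to the third citation even though only two remain.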
| 161 | + |
| 162 | +## Troubleshooting |
| 163 | + |
| 164 | +### Citations Not Appearing |
| 165 | + |
| 166 | +**Problem**: Citations don't appear in the OpenWebUI frontend |
| 167 | + |
| 168 | +**Solutions**: |
| 169 | +1. Check that Azure AI Search is properly configured (`AZURE_AI_DATA_SOURCES`) |
| 170 | +2. Ensure you're using an Azure OpenAI endpoint (not a generic Azure AI endpoint) |
| 171 | +3. Verify the response contains `[docX]` references |
| 172 | +4. Check browser console and server logs for errors |
| 173 | + |
| 174 | +### Relevance Scores Showing 0% |
| 175 | + |
| 176 | +**Problem**: All citation cards show 0% relevance |
| 177 | + |
| 178 | +**Solutions**: |
| 179 | +1. Verify `AZURE_AI_INCLUDE_SEARCH_SCORES=true` is set |
| 180 | +2. Check that your Azure Search index supports scoring |
| 181 | +3. Enable DEBUG logging to see the raw score values from Azure |
| 182 | + |
| 183 | +### Links Not Working |
| 184 | + |
| 185 | +**Problem**: `[docX]` references are not clickable |
| 186 | + |
| 187 | +**Solutions**: |
| 188 | +1. Ensure citations have valid `url` or `filepath` fields |
| 189 | +2. Check that the document URL is accessible |
| 190 | +3. Verify the markdown link format is being generated correctly |
| 191 | + |
| 192 | +## References |
| 193 | + |
| 194 | +- [OpenWebUI Pipelines Citation Feature Discussion](https://github.com/open-webui/pipelines/issues/229) |
| 195 | +- [OpenWebUI Event Emitter Documentation](https://docs.openwebui.com/features/plugin/development/events) |
| 196 | +- [Azure AI Search Documentation](https://learn.microsoft.com/en-us/azure/search/) |
| 197 | +- [Azure On Your Data API Reference](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/references/on-your-data) |
| 198 | + |
| 199 | +## Version History |
| 200 | + |
| 201 | +- **v2.6.0**: Major refactor - removed `AZURE_AI_ENHANCE_CITATIONS` and `AZURE_AI_OPENWEBUI_CITATIONS` valves; citation support is now always enabled when `AZURE_AI_DATA_SOURCES` is configured; added clickable `[docX]` markdown links; improved score extraction using `filter_reason` field |
| 202 | +- **v2.5.x**: Dual citation modes (OpenWebUI events + markdown/HTML) |