Commit 44cfa46

[ContentUnderstanding] Add to_llm_input helper for converting analysis results to LLM-friendly text (#46386)
1 parent 5691862 commit 44cfa46

25 files changed

Lines changed: 3203 additions & 42 deletions

sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 # Release History
 
+## 1.2.0b1 (2026-04-28)
+
+### Features Added
+
+- Added `to_llm_input` helper function that converts `AnalysisResult` objects into LLM-friendly text with YAML front matter and markdown content. Supports documents, audio/video, and classification hierarchies.
+
 ## 1.1.0 (2026-04-20)
 
 ### Features Added
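
Reviewer note: the changelog entry describes the output shape as YAML front matter followed by markdown content. The layout can be sketched in plain Python as below. This is only an illustration of the format, not the SDK's implementation; `render_llm_text` is a hypothetical name, and the sketch handles flat scalar metadata only, whereas the real helper also emits nested fields.

```python
def render_llm_text(metadata: dict, markdown: str) -> str:
    """Join flat metadata and a markdown body into front-matter text.

    Hypothetical illustration of the output layout; not the SDK helper.
    """
    lines = ["---"]
    for key, value in metadata.items():
        # Flat scalars only; nested values would need a real YAML emitter.
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n" + markdown


text = render_llm_text({"contentType": "document", "pages": 1}, "# Title\nBody")
print(text)
```

The two `---` delimiter lines are what let a downstream consumer separate the metadata from the markdown body.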

sdk/contentunderstanding/azure-ai-contentunderstanding/README.md

Lines changed: 54 additions & 0 deletions
@@ -57,6 +57,7 @@ This table shows the relationship between SDK versions and supported API service
 
 | SDK version | Supported API service version |
 | ----------- | ----------------------------- |
+| 1.2.0b1 | 2025-11-01 |
 | 1.1.0 | 2025-11-01 |
 | 1.0.1 | 2025-11-01 |
 | 1.0.0 | 2025-11-01 |
@@ -517,6 +518,58 @@ async def analyze_invoice():
 asyncio.run(analyze_invoice())
 ```
 
+#### Convert results to LLM-ready text
+
+Use the `to_llm_input()` helper to convert any analysis result into a text format that LLMs
+can consume directly: YAML front matter with extracted fields followed by the markdown body.
+This works with all content types (documents, images, audio, video) and handles multi-segment
+results and classification hierarchies automatically. Run from the `samples/` directory:
+
+```python
+from azure.ai.contentunderstanding import ContentUnderstandingClient, to_llm_input
+from azure.identity import DefaultAzureCredential
+
+client = ContentUnderstandingClient(endpoint, DefaultAzureCredential())
+
+# Analyze a document with text, tables, and charts using prebuilt-documentSearch (CU's primary RAG analyzer)
+with open("sample_files/sample_document_features.pdf", "rb") as f:
+    poller = client.begin_analyze_binary(
+        analyzer_id="prebuilt-documentSearch",
+        binary_input=f.read(),
+    )
+result = poller.result()
+
+# One line to get LLM-ready text
+text = to_llm_input(result)
+print(text)
+# Output:
+# ---
+# contentType: document
+# pages: 1
+# fields:
+#   Summary: The document provides an overview of Latin, includes a sample
+#     table with names and corporate affiliations, presents a bar chart
+#     figure illustrating monthly values, and describes the AI Document
+#     Intelligence service...
+# ---
+# <!-- page 1 -->
+# # ==This is title==
+# ## 1. Text
+# [Latin](https://en.wikipedia.org/wiki/Latin) refers to an ancient Italic language...
+# ## 2. Page Objects
+# ### 2.1 Table
+# <table><caption>Table 1: This is a dummy table</caption>...</table>
+# ### 2.2. Figure
+# ![Values...](figures/1.1 "Bar chart with six bars: Jan=200, Feb=300...")
+# ```chart
+# {"type":"bar","data":{"labels":["Jan","Feb",...],...}}
+# ```
+# ...
+```
+
+See the [advanced sample][python_cu_sample_to_llm_input] for output options (fields-only,
+markdown-only, custom metadata), multi-page content ranges, and multi-segment video.
+
 ## Troubleshooting
 
 ### Common issues
@@ -618,6 +671,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
 [python_cu_pypi]: https://pypi.org/project/azure-ai-contentunderstanding/
 [python_cu_product_docs]: https://learn.microsoft.com/azure/ai-services/content-understanding/
 [python_cu_samples]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/samples
+[python_cu_sample_to_llm_input]: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_to_llm_input.py
 [azure_sub]: https://azure.microsoft.com/free/
 [cu_quickstart]: https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument
 [cu_region_support]: https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support
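
Reviewer note: since the README describes the output as front matter delimited by `---` lines followed by the markdown body, a consumer that wants the two parts separately could split on those delimiters. The sketch below assumes exactly that layout and works on the returned string only; `split_front_matter` is a hypothetical helper, not part of the SDK.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is '' when none is found.

    Hypothetical helper; assumes the text opens with a '---' line and the
    front matter ends at the next line that is exactly '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # No closing delimiter: treat the whole text as body.
    return "", text


sample = "---\ncontentType: document\npages: 1\n---\n<!-- page 1 -->\n# Title\n"
front, body = split_front_matter(sample)
print(front)
print(body)
```

Only the first closing delimiter is used, so a later `---` horizontal rule in the markdown body stays in the body. A YAML value that itself contains a bare `---` line would confuse this sketch; a real YAML parser is the safer choice if that can occur.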

sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
   "AssetsRepo": "Azure/azure-sdk-assets",
   "AssetsRepoPrefixPath": "python",
   "TagPrefix": "python/contentunderstanding/azure-ai-contentunderstanding",
-  "Tag": "python/contentunderstanding/azure-ai-contentunderstanding_83f2e32128"
+  "Tag": "python/contentunderstanding/azure-ai-contentunderstanding_3b4e92c5d3"
 }
