diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 30 additions & 0 deletions
diff --git a/Directory.Build.props b/Directory.Build.props
Lines changed: 2 additions & 2 deletions
diff --git a/Directory.Packages.props b/Directory.Packages.props
Lines changed: 20 additions & 14 deletions
@@ -9,12 +9,42 @@ If I tell you to remember something, you do the same, update
 
 
 ## Rules to follow
+- Never introduce fallback logic that silently overrides user or config values; surface configuration errors instead of masking them in code.
+- Keep `SegmentOptions.MaxParallelImageAnalysis` at `Math.Max(Environment.ProcessorCount * 4, 32)` and do not downscale it via runtime fallbacks.
+- Treat non-positive `SegmentOptions.MaxParallelImageAnalysis` values as configuration errors—fail fast instead of defaulting to unlimited concurrency.
+- Ensure document segments remain in source order with explicit numeric page/segment metadata—avoid relying on labels like "Page 1".
+- When extracting images (or other artifacts), persist them to disk when a target path is supplied and record the file path in artifact metadata.
+- Generate Markdown output from the ordered segment collection so it always reflects current segment content; avoid storing stale Markdown snapshots.
+- Allow `ConvertAsync` (and related entry points) to accept caller-supplied options for AI/config overrides on a per-document basis.
 - MIME handling: always use `ManagedCode.MimeTypes` for MIME constants, lookups, and validation logic.
 - Treat this repository as a high-fidelity port of `microsoft-markitdown`: every test fixture copied from the upstream `tests/test_files/` directory must be referenced by .NET tests (either as positive conversions or explicit unsupported cases). No orphaned fixtures.
 - CSV parsing must use the `Sep` library; avoid Sylvan or other CSV parsers for new or updated code.
 - Format integration tasks: never break the project or existing tests, and validate new format handling against real sample files.
 - Test fixtures must be surfaced via the auto-generated `TestAssetCatalog`; add binaries under `TestFiles/` and rely on its constants in tests.
 - YouTube converter work: include at least one live integration test that exercises the real metadata provider (skip gracefully if the upstream API is unavailable) so the flow mirrors production behaviour.
+- Never introduce test-only abstractions like `IAzureIntegrationSampleResolver` into the core library; keep cross-cutting helpers clean and production-ready.
+- Image enrichment tasks: once OCR runs, send the artifact through the shared `IChatClient` prompt constants, capture a thorough visual description first, convert diagrams/schematics into Mermaid or structured tables, describe technical drawings in depth, and emit Markdown that follows `docs/MetaMD.md` and `docs/MetaMD-Examples.md`.
+- Image AI enrichment must reject missing MIME metadata—surface the failure to callers instead of substituting fallback content types.
+- Image enrichment tasks: once AI enrichment runs, strip any legacy/fallback image comments so only one `**Image:` placeholder and description remain in the final Markdown.
+- Front matter titles must ignore metadata or image description comments—derive the title from the first real document text.
+- When refactoring intelligence helpers, have them return explicit result data instead of relying on hidden side effects.
+- Image placeholders must emit Markdown image links (`![alt](file.png)`) that reference persisted artifacts; only fall back to bold text when no file is available.
+- If AI image enrichment yields no insight, log and continue instead of throwing—treat empty payloads as a soft failure.
+- When executing tests, always include the `ManualConversionDebugTests` suite; treat its failures as blocking.
+- Telemetry work: instrument both overall document processing time and per-page duration with real metrics alongside traces—include histogram/counter coverage so latency is observable at both levels.
+- For large converters, structure them as partial classes and split related files into a dedicated subfolder.
+- Markdown hygiene: strip non-breaking, zero-width, or other non-printable spaces; replace them with regular ASCII spaces so output never contains invisible characters like the long space before `Add`.
+- Architecture revamps: adopt DI-first composition, expose per-request cloud model selection, and employ `System.IO.Pipelines` with optional parallel converter scheduling while keeping documentation and structure tidy.
+- DOCX processing work: restructure element handling around pipeline-driven parallelism so enrichment and extraction avoid sequential bottlenecks while preserving output ordering.
+- URL conversion APIs: expose Uri-based overloads so callers can supply strongly-typed endpoints without manual string normalization.
+- Manual Azure config defaults: never auto-populate `AzureIntegrationConfigDefaults` from environment variables; keep the static placeholder JSON.
+- Never use `MemoryStream` for conversion paths; rely on file-based processing instead of in-memory buffering.
+- Disk-first refactors: put shared disk/workspace helpers into reusable base classes instead of hiding them as nested converter types.
+- Document pipeline work: keep a single, well-defined flow that matches `docs/DocumentProcessingPipeline.md`, centralising common setup in the shared base converter and pushing OpenXML helpers into shared abstractions instead of per-converter copies; document tables/images behaviour in `docs/MetaMD.md`.
+- Manual conversion diagnostics: persist manual harness output to disk and ensure MetaMD formatting includes image description blocks for every extracted artifact.
+- Multi-page tables must emit `<!-- Table spans pages X-Y -->` comments, continuation markers for each affected page, and populate `table.pageStart`, `table.pageEnd`, and `table.pageRange` metadata so downstream systems can align tables with their source pages.
+- PDF converters must honour `SegmentOptions.Pdf.TreatPagesAsImages`, rendering each page to PNG, running OCR/vision enrichment, and composing page segments with image placeholders plus recognized text whenever the option is enabled.
+- Persist conversion workspaces through `ManagedCode.Storage`: allocate a unique, sanitized folder per document, copy the source file, store every extracted artifact via `IStorage`, and emit the final Markdown into the same folder.
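The three `SegmentOptions.MaxParallelImageAnalysis` rules in the list above (fixed default, no runtime downscaling, fail fast on non-positive values) can be illustrated with a minimal sketch. The repository itself is .NET; Python is used here only for illustration, and `SegmentOptionsSketch` and `default_max_parallel_image_analysis` are hypothetical names, not types from the library.

```python
import os
from dataclasses import dataclass, field


def default_max_parallel_image_analysis() -> int:
    """Python analogue of Math.Max(Environment.ProcessorCount * 4, 32)."""
    # os.cpu_count() can return None; falling back to 1 core is an assumption
    # of this sketch, not something the rules specify.
    return max((os.cpu_count() or 1) * 4, 32)


@dataclass(frozen=True)
class SegmentOptionsSketch:
    """Hypothetical stand-in for SegmentOptions, reduced to one setting."""

    max_parallel_image_analysis: int = field(
        default_factory=default_max_parallel_image_analysis
    )

    def __post_init__(self) -> None:
        # Fail fast: a non-positive value is a configuration error. It is
        # neither replaced with the default nor treated as "unlimited".
        if self.max_parallel_image_analysis <= 0:
            raise ValueError(
                "max_parallel_image_analysis must be positive, got "
                f"{self.max_parallel_image_analysis}"
            )


if __name__ == "__main__":
    print(SegmentOptionsSketch().max_parallel_image_analysis)  # default, at least 32
    print(SegmentOptionsSketch(8).max_parallel_image_analysis)  # caller value kept as-is
    try:
        SegmentOptionsSketch(0)
    except ValueError as error:
        print(f"configuration error: {error}")
```

The point of validating in the constructor is that a bad value surfaces where the options are built, so no later code path has a reason to substitute a fallback silently.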
 
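The Markdown hygiene rule above can be sketched as a small normalisation pass. This is an illustrative Python sketch, not the repository's .NET implementation. It assumes that visible-width Unicode space separators (non-breaking space, em space, and similar) become ASCII spaces, while a small assumed set of zero-width characters is removed outright; `normalize_spaces` and `ZERO_WIDTH` are hypothetical names.

```python
import unicodedata

# Assumed set of zero-width characters to delete; extend as needed.
# U+200B zero width space, U+2060 word joiner, U+FEFF BOM / zero width no-break space.
ZERO_WIDTH = {"\u200b", "\u2060", "\ufeff"}


def normalize_spaces(text: str) -> str:
    """Replace Unicode space separators with ASCII spaces and drop zero-width characters."""
    cleaned = []
    for ch in text:
        if ch in ZERO_WIDTH:
            # Invisible and zero-width: removing it avoids inserting a stray space.
            continue
        if unicodedata.category(ch) == "Zs":
            # Any space separator (U+00A0, U+2003, U+3000, ...) becomes U+0020.
            cleaned.append(" ")
        else:
            # Everything else, including newlines and tabs, passes through unchanged.
            cleaned.append(ch)
    return "".join(cleaned)


if __name__ == "__main__":
    sample = "Run\u00a0this\u200b now"
    print(repr(normalize_spaces(sample)))
```

Zero-width joiners and non-joiners (U+200C, U+200D) are deliberately left alone in this sketch because they can carry meaning in some scripts and emoji sequences; whether the real converters should strip them is a decision the rule does not spell out.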
 # Repository Guidelines
 
 
@@ -22,8 +22,8 @@
     <PackageLicenseExpression>MIT</PackageLicenseExpression>
     <PackageReadmeFile>README.md</PackageReadmeFile>
     <Product>Managed Code - MarkItDown</Product>
-    <Version>0.0.4</Version>
-    <PackageVersion>0.0.4</PackageVersion>
+    <Version>0.0.5</Version>
+    <PackageVersion>0.0.5</PackageVersion>
   </PropertyGroup>
 
   <PropertyGroup Condition="'$(GITHUB_ACTIONS)' == 'true'">
 
@@ -1,38 +1,44 @@
 <Project>
   <ItemGroup>
     <PackageVersion Include="AngleSharp" Version="1.3.0" />
-    <PackageVersion Include="AWSSDK.Rekognition" Version="4.0.2.6" />
-    <PackageVersion Include="AWSSDK.S3" Version="4.0.7.7" />
-    <PackageVersion Include="AWSSDK.Textract" Version="4.0.2.6" />
-    <PackageVersion Include="AWSSDK.TranscribeService" Version="4.0.3.9" />
+    <PackageVersion Include="AWSSDK.Rekognition" Version="4.0.2.8" />
+    <PackageVersion Include="AWSSDK.S3" Version="4.0.7.10" />
+    <PackageVersion Include="AWSSDK.Textract" Version="4.0.2.8" />
+    <PackageVersion Include="AWSSDK.TranscribeService" Version="4.0.4" />
     <PackageVersion Include="Azure.AI.FormRecognizer" Version="4.1.0" />
     <PackageVersion Include="Azure.AI.OpenAI" Version="2.1.0" />
     <PackageVersion Include="Azure.AI.Vision.ImageAnalysis" Version="1.0.0" />
-    <PackageVersion Include="Azure.Identity" Version="1.12.0" />
+    <PackageVersion Include="Azure.Identity" Version="1.17.0" />
     <PackageVersion Include="coverlet.collector" Version="6.0.4" />
     <PackageVersion Include="DocumentFormat.OpenXml" Version="3.3.0" />
     <PackageVersion Include="DotNet.ReproducibleBuilds" Version="1.2.25" />
-    <PackageVersion Include="Google.Cloud.DocumentAI.V1" Version="3.21.0" />
+    <PackageVersion Include="Google.Cloud.DocumentAI.V1" Version="3.22.0" />
     <PackageVersion Include="Google.Cloud.Speech.V1" Version="3.8.0" />
     <PackageVersion Include="Google.Cloud.Vision.V1" Version="3.7.0" />
-    <PackageVersion Include="ManagedCode.MimeTypes" Version="1.0.4" />
-    <PackageVersion Include="Microsoft.Extensions.AI" Version="9.9.1" />
+    <PackageVersion Include="ManagedCode.MimeTypes" Version="1.0.5" />
+    <PackageVersion Include="ManagedCode.Storage.Aws" Version="9.2.1" />
+    <PackageVersion Include="ManagedCode.Storage.Azure" Version="9.2.1" />
+    <PackageVersion Include="ManagedCode.Storage.Core" Version="9.2.1" />
+    <PackageVersion Include="ManagedCode.Storage.FileSystem" Version="9.2.1" />
+    <PackageVersion Include="ManagedCode.Storage.Gcp" Version="9.2.1" />
+    <PackageVersion Include="Microsoft.Extensions.AI" Version="9.10.0" />
     <PackageVersion Include="Microsoft.Extensions.AI.OpenAI" Version="9.9.1-preview.1.25474.6" />
-    <PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="9.0.9" />
-    <PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.9" />
+    <PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="9.0.10" />
+    <PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="9.0.10" />
+    <PackageVersion Include="Microsoft.Extensions.Options" Version="9.0.10" />
     <PackageVersion Include="Microsoft.NET.Test.Sdk" Version="17.14.1" />
     <PackageVersion Include="MimeKit" Version="4.14.0" />
     <PackageVersion Include="Moq" Version="4.20.72" />
     <PackageVersion Include="PdfPig" Version="0.1.11" />
     <PackageVersion Include="PDFtoImage" Version="5.1.1" />
-    <PackageVersion Include="Sep" Version="0.11.1" />
+    <PackageVersion Include="Sep" Version="0.11.2" />
     <PackageVersion Include="Shouldly" Version="4.3.0" />
     <PackageVersion Include="SkiaSharp" Version="3.119.1" />
     <PackageVersion Include="Spectre.Console" Version="0.51.1" />
-    <PackageVersion Include="System.Text.Encoding.CodePages" Version="9.0.9" />
-    <PackageVersion Include="System.Text.Json" Version="9.0.9" />
+    <PackageVersion Include="System.Text.Encoding.CodePages" Version="9.0.10" />
+    <PackageVersion Include="System.Text.Json" Version="9.0.10" />
     <PackageVersion Include="YoutubeExplode" Version="6.5.5" />
     <PackageVersion Include="xunit" Version="2.9.3" />
     <PackageVersion Include="xunit.runner.visualstudio" Version="3.1.4" />
   </ItemGroup>
-</Project>
+</Project>