Provides an implementation of the IngestionDocumentReader class for the MarkItDown utility.
From the command line:
dotnet add package Microsoft.Extensions.DataIngestion.MarkItDown --prerelease

Or directly in the C# project file:
<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.DataIngestion.MarkItDown" Version="[CURRENTVERSION]" />
</ItemGroup>

Use MarkItDownReader to convert documents using the MarkItDown executable installed locally:
using Microsoft.Extensions.DataIngestion;
IngestionDocumentReader reader =
    new MarkItDownReader(new FileInfo(@"pathToMarkItDown.exe"), extractImages: true);
using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());
await foreach (IngestionResult result in pipeline.ProcessAsync(reader, directory, "*.pdf"))
{
    Console.WriteLine($"Processed '{result.DocumentId}'. Succeeded: {result.Succeeded}");
}

Use MarkItDownMcpReader to convert documents using a MarkItDown MCP server:
using Microsoft.Extensions.DataIngestion;
// Connect to a MarkItDown MCP server (e.g., running in Docker)
IngestionDocumentReader reader =
    new MarkItDownMcpReader(new Uri("http://localhost:3001/mcp"));
using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());
await foreach (IngestionResult result in pipeline.ProcessAsync(reader, directory, "*.*"))
{
    Console.WriteLine($"Processed '{result.DocumentId}'. Succeeded: {result.Succeeded}");
}

The MarkItDown MCP server can be run using Docker:
docker run -p 3001:3001 mcp/markitdown --http --host 0.0.0.0 --port 3001

Or installed via pip:
pip install markitdown-mcp
markitdown-mcp --http --host 0.0.0.0 --port 3001

Aspire can be used to integrate with the MarkItDown MCP server. Sample AppHost logic:
var builder = DistributedApplication.CreateBuilder(args);
var markitdown = builder.AddContainer("markitdown", "mcp/markitdown")
    .WithArgs("--http", "--host", "0.0.0.0", "--port", "3001")
    .WithHttpEndpoint(targetPort: 3001, name: "http");
var webApp = builder.AddProject("name");
webApp.WithEnvironment("MARKITDOWN_MCP_URL", markitdown.GetEndpoint("http"));
builder.Build().Run();

Sample Ingestion Service:
string url = $"{Environment.GetEnvironmentVariable("MARKITDOWN_MCP_URL")}/mcp";
IngestionDocumentReader reader = new MarkItDownMcpReader(new Uri(url));

We welcome feedback and contributions in our GitHub repo.
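
The ingestion service sample above builds the endpoint by string interpolation. If MARKITDOWN_MCP_URL is unset it yields the relative string "/mcp", and a base URL ending in a slash yields "//mcp". The following is a minimal sketch of a more defensive variant. The MarkItDownEndpoint class and its method names are hypothetical helpers, not part of the package, and the sketch assumes the variable holds a plain HTTP(S) base URL as in the AppHost sample.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of Microsoft.Extensions.DataIngestion.MarkItDown):
// turns the base URL injected through MARKITDOWN_MCP_URL into the "/mcp" endpoint URI.
public static class MarkItDownEndpoint
{
    public const string VariableName = "MARKITDOWN_MCP_URL";

    // Reads the base URL from the environment and appends the "/mcp" path.
    public static Uri FromEnvironment() =>
        FromBaseUrl(Environment.GetEnvironmentVariable(VariableName));

    // Validates a base URL and appends the "/mcp" path.
    public static Uri FromBaseUrl(string? baseUrl)
    {
        if (string.IsNullOrWhiteSpace(baseUrl))
        {
            throw new InvalidOperationException(
                $"Environment variable '{VariableName}' is not set.");
        }

        // Trim a trailing slash so "http://host:3001/" does not become "http://host:3001//mcp".
        string candidate = baseUrl.Trim().TrimEnd('/') + "/mcp";

        // Accept only absolute http/https URLs; anything else is treated as a configuration error.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri? endpoint) ||
            (endpoint.Scheme != Uri.UriSchemeHttp && endpoint.Scheme != Uri.UriSchemeHttps))
        {
            throw new InvalidOperationException($"'{baseUrl}' is not a valid HTTP(S) URL.");
        }

        return endpoint;
    }
}
```

With such a helper, the reader could be created as new MarkItDownMcpReader(MarkItDownEndpoint.FromEnvironment()), so that a missing or malformed variable fails at startup with a descriptive message rather than at the first conversion request.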