title	Extract PDF to Markdown in C# | Smart Data Extractor | Syncfusion
description	Extract PDF documents as Markdown (MD) in C# using Syncfusion<sup>®</sup> Smart Data Extractor library without Microsoft Office or Adobe dependencies
platform	document-processing
control	SmartDataExtractor
documentation	UG
keywords	Assemblies

PDF to Markdown Extraction

Markdown is a lightweight markup language that adds formatting elements to plain text documents. The Syncfusion<sup>®</sup> Smart Data Extractor library extracts structured information from PDF documents and scanned images, and outputs the content as Markdown (MD). It analyzes text blocks, tables, headers, and form fields to preserve layout and formatting.

Assemblies and NuGet packages required

Refer to the assemblies and NuGet packages required for your platform to extract data as a Markdown file using the Smart Data Extractor library.

Extract Data as Markdown from PDF or Image

To extract the content of a PDF document or image as Markdown, use the ExtractDataAsMarkdown method of the DataExtractor class. Refer to the following code example:

{% tabs %}

{% highlight c# tabtitle="C# [Cross-platform]" playgroundButtonLink="https://raw.githubusercontent.com/SyncfusionExamples/PDF-Examples/refs/heads/master/Data-Extraction/Smart-Data-Extractor/Extract-data-as-MD-from-PDF/.NET/Extract-data-as-MD-from-PDF/Program.cs" %}

using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Extract data as Markdown.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

{% endhighlight %}

{% highlight c# tabtitle="C# [Windows-specific]" %}

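using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Extract data as Markdown.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}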

{% endhighlight %}

{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}

Imports System.IO
Imports System.Text
Imports Syncfusion.SmartDataExtractor

' Open the input PDF file as a stream.
Using stream As New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
    ' Initialize the Data Extractor.
    Dim extractor As New DataExtractor()
    ' Extract data as Markdown.
    Dim data As String = extractor.ExtractDataAsMarkdown(stream)
    ' Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8)
End Using

{% endhighlight %}

{% endtabs %}

N> To extract data from an image instead of a PDF, open the image file (for example, Input.jpg or Input.png) as the input stream. The rest of the code remains unchanged.

You can download a complete working sample from GitHub.
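
One detail of the save step in the example above: `File.WriteAllText` with `Encoding.UTF8` writes a UTF-8 byte order mark (BOM) at the start of the file. If a downstream Markdown tool does not expect a BOM, the file can be written without one. The following is a minimal sketch that uses only standard .NET APIs; `MarkdownFile.Save` is a hypothetical helper name, and its `markdown` argument stands in for the string returned by `ExtractDataAsMarkdown`.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Syncfusion API).
public static class MarkdownFile
{
    // UTF-8 encoding that does not emit the byte order mark.
    private static readonly UTF8Encoding Utf8NoBom =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

    // Writes the Markdown text to the given path as UTF-8 without a BOM.
    public static void Save(string path, string markdown)
    {
        File.WriteAllText(path, markdown, Utf8NoBom);
    }
}
```

In the earlier example, this would replace the final call with `MarkdownFile.Save("Output.md", data);`.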

Extract a specific page to Markdown

To extract a single page, set the PageRange property of the DataExtractor class so that the start and end values are the same page, then call the ExtractDataAsMarkdown method. The following code example extracts page 2 of a PDF and saves it as a Markdown file.

{% tabs %}

{% highlight c# tabtitle="C# [Cross-platform]" %}

using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Set the page index for extraction (example: page 2).
    extractor.PageRange = new int[,] { { 2, 2 } };
    //Extract data as Markdown using the API.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

{% endhighlight %}

{% highlight c# tabtitle="C# [Windows-specific]" %}

using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Set the page index for extraction (example: page 2).
    extractor.PageRange = new int[,] { { 2, 2 } };
    //Extract data as Markdown using the API.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

{% endhighlight %}

{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}

Imports System.IO
Imports System.Text
Imports Syncfusion.SmartDataExtractor

' Open the input PDF file as a stream.
Using stream As New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
    ' Initialize the Data Extractor.
    Dim extractor As New DataExtractor()
    ' Set the page index for extraction (example: page 2).
    extractor.PageRange = New Integer(,) {{2, 2}}
    ' Extract data as Markdown using the API.
    Dim data As String = extractor.ExtractDataAsMarkdown(stream)
    ' Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8)
End Using

{% endhighlight %}

{% endtabs %}

Extract a range of pages to Markdown

To extract a range of pages, set the PageRange property of the DataExtractor class to the first and last page of the range, then call the ExtractDataAsMarkdown method. The following code example extracts pages 1 to 3 of a PDF and saves them as a Markdown file.

{% tabs %}

{% highlight c# tabtitle="C# [Cross-platform]" %}

using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Set the page range for extraction (pages 1 to 3).
    extractor.PageRange = new int[,] { { 1, 3 } };
    //Extract data as Markdown using the API.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

{% endhighlight %}

{% highlight c# tabtitle="C# [Windows-specific]" %}

using System.IO;
using Syncfusion.SmartDataExtractor;
using System.Text;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Data Extractor.
    DataExtractor extractor = new DataExtractor();
    //Set the page range for extraction (pages 1 to 3).
    extractor.PageRange = new int[,] { { 1, 3 } };
    //Extract data as Markdown using the API.
    string data = extractor.ExtractDataAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

{% endhighlight %}

{% highlight vb.net tabtitle="VB.NET [Windows-specific]" %}

Imports System.IO
Imports System.Text
Imports Syncfusion.SmartDataExtractor

' Open the input PDF file as a stream.
Using stream As New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
    ' Initialize the Data Extractor.
    Dim extractor As New DataExtractor()
    ' Set the page range for extraction (pages 1 to 3).
    extractor.PageRange = New Integer(,) {{1, 3}}
    ' Extract data as Markdown using the API.
    Dim data As String = extractor.ExtractDataAsMarkdown(stream)
    ' Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8)
End Using

{% endhighlight %}

{% endtabs %}
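
The `PageRange` values in the examples above are two-dimensional arrays whose rows read as start and end pages. A small helper can build such an array from page pairs and reject reversed or non-positive bounds before extraction. This is a minimal sketch: `PageRanges.Build` is a hypothetical helper, not part of the Syncfusion API. It assumes page numbers are 1-based and that additional rows denote additional ranges; the examples above show only a single row, so verify both assumptions against the library.

```csharp
using System;

// Hypothetical helper (not part of the Syncfusion API) that builds the
// int[,] shape assigned to DataExtractor.PageRange in the examples above.
public static class PageRanges
{
    // Each tuple is one (start, end) pair; page numbers are assumed to be 1-based.
    public static int[,] Build(params (int Start, int End)[] ranges)
    {
        if (ranges == null || ranges.Length == 0)
            throw new ArgumentException("At least one page range is required.", nameof(ranges));

        var result = new int[ranges.Length, 2];
        for (int i = 0; i < ranges.Length; i++)
        {
            var (start, end) = ranges[i];
            // Reject non-positive pages and reversed ranges early.
            if (start < 1 || end < start)
                throw new ArgumentOutOfRangeException(nameof(ranges),
                    $"Invalid range {start}-{end}: expected 1 <= start <= end.");

            result[i, 0] = start;
            result[i, 1] = end;
        }
        return result;
    }
}
```

With this helper, `extractor.PageRange = PageRanges.Build((2, 2));` selects a single page and `PageRanges.Build((1, 3))` selects pages 1 to 3, producing the same arrays as the examples above.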

PDF to Markdown Preservation Mapping

This section explains how common PDF elements are converted and preserved in Markdown format, ensuring that document structure and formatting remain consistent during the PDF to Markdown conversion process.

| PDF Elements | Preservation in Markdown |
|---|---|
| Header, Paragraph Title, Document Title | Headings (H2) |
| Paragraph | Paragraph |
| Image | Image (base64 string) |
| Table | Table |
| Text Inline Styles | Bold and Italic |
| Link text without title text | Links |
| Code blocks, Footer, Page Number, List, Block quotes, Subscript, Superscript | Text |
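
Because images are preserved as base64 strings, the extracted Markdown can become large, and because headings map to H2, they offer a simple way to outline the result. The following sketch shows one way to post-process the output string. It assumes that images appear as inline `![alt](data:image/...;base64,...)` links and that headings appear as lines starting with `## `; the mapping above does not spell out the exact syntax, so check these patterns against real output. `MarkdownPostProcessor` is a hypothetical helper, not a Syncfusion type.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Syncfusion API).
public static class MarkdownPostProcessor
{
    // Assumed image form: ![alt](data:image/<type>;base64,<payload>)
    private static readonly Regex Base64Image = new Regex(
        @"!\[(?<alt>[^\]]*)\]\(data:image/[^;)]+;base64,[^)]*\)",
        RegexOptions.Compiled);

    // Assumed heading form: a line that starts with "## ".
    private static readonly Regex Heading2 = new Regex(
        @"^##[ \t]+(?<text>.+?)[ \t]*\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);

    // Replaces each embedded base64 image with a short text placeholder.
    public static string StripBase64Images(string markdown)
    {
        return Base64Image.Replace(markdown, m => $"[image: {m.Groups["alt"].Value}]");
    }

    // Returns the text of every level-2 heading, in document order.
    public static List<string> ListHeadings(string markdown)
    {
        var headings = new List<string>();
        foreach (Match m in Heading2.Matches(markdown))
            headings.Add(m.Groups["text"].Value);
        return headings;
    }
}
```

Both methods take the string returned by `ExtractDataAsMarkdown` and leave the rest of the Markdown untouched.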
