blog/2026-05-05-docwire-returns.md
2 additions & 2 deletions
@@ -12,7 +12,7 @@ In course of human history, there has not been a time like the present era when
12
12
Away from the noise, contemplating in silence, working through perseverance, Docwire SDK was being evolved into an infrastructure layer for modern data workflows to sustain the demands of modern day information processing. The journey has been long and arduous but the result has been equally satisfying. And it is time we present to you the much more evolved Docwire SDK, along with anecdotes about the philosophy behind it and the direction forward.
13
13
<!-- truncate -->
14
14
## The Invisible Engine
15
-
Every interface we interact with is a surface level reality, but the entity itself is supported by an engine which remains invisible. Same is true for the virtual world. Some of the most important softwares are never seen. When you tap a payment card or rely on an embedded medical device, complex logic runs quietly in the background. Document processing plays a similar role in many systems — critical, yet hidden. And Docwire has been designed to serve exactly the same purpose. An engine that extracts, normalizes, and transforms unstructured documents into structured, usable data — locally, securely and reliably. And it still adheres to its “Plug and Play” philosophy. Developers integrate it once and it simply works! Only better this time, with support for more file formats and a fluent ingestion layer for building data processing pipeline.
15
+
Every interface we interact with is a surface level reality, but the entity itself is supported by an engine which remains invisible. Same is true for the virtual world. Some of the most important software is never seen. When you tap a payment card or rely on an embedded medical device, complex logic runs quietly in the background. Document processing plays a similar role in many systems — critical, yet hidden. And Docwire has been designed to serve exactly the same purpose. An engine that extracts, normalizes, and transforms unstructured documents into structured, usable data — locally, securely and reliably. And it still adheres to its “Plug and Play” philosophy. Developers integrate it once and it simply works! Only better this time, with support for more file formats and a fluent ingestion layer for building data processing pipeline.
16
16
17
17
18
18
## What Changed since 2023
@@ -28,7 +28,7 @@ We started this journey of evolving Docwire with following ideas as underlying p
28
28
And this is where Docwire shines. Docwire is not simply AI-based or AI-driven, but AI-integrated SDK, which handles your document data processing requirements. It gives the user enough flexibility to process the data across various file formats and integrate it further with AI models of their choice, be it local or through APIs.
29
29
30
30
## From Files to Pipelines
31
-
Docwire's evolution is not limited to supporting various file formats but integrating more workflows. In its inception, it was a file parsing tool and in its evolution, it is becoming pipeline construction SDK with various tools at your disposal:<br/>
31
+
DocWire's evolution is not limited to supporting various file formats but integrating more workflows. In its inception, it was a file parsing tool and in its evolution, it is becoming pipeline construction SDK with various tools at your disposal:<br/>
The file to notice here is `y.h` since this is the file that will be included in some main.cpp. Usually, programmers #include many more headers than necessary, which unfortunately degrades build times, especially when a popular header file includes too many other headers. Ours above is a simplistic one, yet enough to convey the message. Can we somehow remove any header from this file while still having our code compile and run successfully?
tech-dive/2026-05-07-docwire-parsing-chain.md
3 additions & 3 deletions
@@ -5,7 +5,7 @@ tags: [C++20, compile time, optimization]
5
5
---
6
6
7
7
## Introduction
8
-
[Docwire's](https://docwire.io/) is a powerful data extraction tool, developed on Modern C++, that converts text from nearly all known file formats into searchable and editable data. Powered by the Tesseract OCR engine, DocWire is a solution for digitizing text from many image types, MS Office files, e-mails, or e-mail attachments. DocWire outputs data to plain text that may be transmitted for further processing.
8
+
[Docwire](https://docwire.io/) is a powerful data extraction tool, developed on Modern C++, that converts text from nearly all known file formats into searchable and editable data. Powered by the Tesseract OCR engine, DocWire is a solution for digitizing text from many image types, MS Office files, e-mails, or e-mail attachments. DocWire outputs data to plain text that may be transmitted for further processing.
9
9
10
10
One of the interesting aspects of Docwire SDK is its ability to process documents locally (or even make an OpenAI API call) through a series of customizable steps that can be added or removed as per requirements. For example, consider the following code example ():
11
11
```cpp
@@ -17,7 +17,7 @@ In the above pipeline processing, a document is being picked, its content type i
20
-
🔗Docwire Code examples can be accessed at this [link](https://docwire.readthedocs.io/en/latest/examples.html).
20
+
🔗Explore Docwire code examples in the [official examples documentation](https://docwire.readthedocs.io/en/latest/examples.html).
21
21
22
22
Now, we have added a local model to translate the text in the document to Spanish and then stream the output. It seems as if the product is moving on a conveyor belt, and necessary customizations can be applied, such that output of the previous step acts as an input of the next step, exactly how a pipeline chain would work. In software terms, this emulates exactly how the Unix pipe operator `|` works in the terminal.
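The conveyor-belt idea can be illustrated outside of Docwire with a small sketch. This is not Docwire's API: `Step`, `trim_trailing_spaces`, `to_upper`, and `run_demo` are hypothetical names invented for this illustration, and the sketch assumes every step simply maps one string to another. It only shows how overloading `operator|` lets the output of one step become the input of the next.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the assert-based check in the usage note
#include <cctype>
#include <functional>
#include <string>
#include <utility>

// Hypothetical step type: wraps any function that maps a string to a string.
struct Step {
    std::function<std::string(std::string)> run;
};

// `input | step` hands the left-hand value to the step and returns its result,
// so a chain of steps is evaluated left to right.
inline std::string operator|(std::string input, const Step& step) {
    return step.run(std::move(input));
}

// Hypothetical step: drops whitespace from the end of the text.
inline Step trim_trailing_spaces() {
    return Step{[](std::string s) {
        while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
            s.pop_back();
        }
        return s;
    }};
}

// Hypothetical step: converts the text to upper case.
inline Step to_upper() {
    return Step{[](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::toupper(c));
        });
        return s;
    }};
}

// Runs a two-step chain on a fixed input.
inline std::string run_demo() {
    return std::string("hola mundo  ") | trim_trailing_spaces() | to_upper();
}
```

With these definitions, `assert(run_demo() == "HOLA MUNDO");` holds. Steps can be added to or removed from the chain without touching the others, which is the property the pipeline design relies on.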
23
23
@@ -46,7 +46,7 @@ struct EndMessage : Message {};
46
46
47
47
We have defined a base entity and various types of such entities, and based on the types, the parsing steps will decide how to act.
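As a rough sketch of what "deciding based on the type" can look like: only `Message` and `EndMessage` appear in this diff, so `TextMessage` and `describe` below are hypothetical, and the runtime check with `dynamic_cast` (which requires the virtual destructor added here) is an assumption. The article may well use a compile-time mechanism instead.

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <string>
#include <utility>

// Base entity; the virtual destructor makes the hierarchy polymorphic,
// which dynamic_cast needs (an assumption of this sketch).
struct Message {
    virtual ~Message() = default;
};

// Marks the end of the stream, as named in the diff context.
struct EndMessage : Message {};

// Hypothetical payload-carrying message, invented for this illustration.
struct TextMessage : Message {
    explicit TextMessage(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Hypothetical step: inspects the dynamic type of the message and acts on it.
inline std::string describe(const Message& msg) {
    if (const auto* text = dynamic_cast<const TextMessage*>(&msg)) {
        return "text: " + text->text;
    }
    if (dynamic_cast<const EndMessage*>(&msg) != nullptr) {
        return "end of stream";
    }
    return "unknown message";
}
```

For example, `assert(describe(TextMessage{"hi"}) == "text: hi");` and `assert(describe(EndMessage{}) == "end of stream");` both hold. A step that does not recognize a message type falls through to the last branch, where a real pipeline would typically pass the message along unchanged.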
48
48
49
-
*_Note_*: <u>In C++, classes are nothing but structs. I have taken the approach of structs here, as the intention is to keep the implementation short and minimal.</u>
49
+
*_Note_*: <u>In C++, classes are primarily structs. I have taken the approach of structs here, as the intention is to keep the implementation short and minimal.</u>