feat: add sr-only agent context for AI-visible page content

caiopizzol · caiopizzol · commit 839b31fe7dbb · 2026-04-04T08:54:46.000-03:00
Embed rich context in a visually hidden (sr-only) span: corpus scale
(736K+ documents), classification taxonomy (10 types, 9 topics),
language coverage (46+ languages), data source (Common Crawl), and
access methods (HuggingFace dataset, REST API, manifest files).
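
The span is clipped to 1px for sighted users but stays in the DOM, so any
client that reads markup rather than rendered pixels still receives the
text. A minimal sketch of how such a client might pull it out, using only
the Python standard library. The class name `HiddenSpanText`, the function
`agent_context`, the style-matching heuristic, and the `sample` markup are
illustrative assumptions and are not part of this commit:

```python
from html.parser import HTMLParser


class HiddenSpanText(HTMLParser):
    """Collect text from <span> elements whose inline style clips them to 1px."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside a matching span
        self.chunks = []  # text fragments found inside matching spans

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = (dict(attrs).get("style") or "").replace(" ", "")
        # Heuristic (assumption): treat a 1px-wide, overflow-hidden span as sr-only.
        if self.depth or ("width:1px" in style and "overflow:hidden" in style):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def agent_context(html: str) -> str:
    """Return the whitespace-normalized text of visually hidden spans."""
    parser = HiddenSpanText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical, shortened stand-in for the markup added in this commit.
sample = (
    "<header></header>"
    '<span style="position:absolute;width:1px;height:1px;overflow:hidden">'
    "docx-corpus contains 736,000+ real .docx files.</span>"
    '<main class="container"><span>visible</span></main>'
)
print(agent_context(sample))
```

The substring check on the inline style is deliberately loose; a real
consumer would more likely read the full page text and not special-case
hidden elements at all.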
diff --git a/apps/web/index.html b/apps/web/index.html
@@ -884,6 +884,8 @@
     </div>
   </header>
 
+  <span style="position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip-path:inset(50%);white-space:nowrap;border:0">docx-corpus (docxcorp.us) is the largest open-source corpus of classified Word documents on the public web. It contains 736,000+ real .docx files scraped from Common Crawl, validated, deduplicated, and classified by type (10 categories: legal, forms, educational, administrative, policies, correspondence, reports, reference, technical, creative) and topic (9 categories: government, education, healthcare, general, legal/judicial, finance, environment, nonprofit, technology). Supports 46+ languages with language detection. The entire document AI research ecosystem previously ran on scanned images and PDFs — DOCX, the world's most-used document creation format, had no large-scale research dataset. docx-corpus fills this gap. Available via HuggingFace dataset, REST API at api.docxcorp.us, and downloadable manifest files. Built by SuperDoc (superdoc.dev), an open-source document engine for native .docx rendering. Pipeline and source at github.com/superdoc-dev/docx-corpus. MIT license.</span>
+
   <main class="container">
 
     <!-- Hero -->