Intelligent document processing. Zero Python.
Extract text, tables, layouts, and semantics from any document type, from invoices and contracts to medical records and financial statements. An end-to-end document understanding pipeline in pure .NET.
Challenges we solve
Manual data entry from paper documents
PaddleOCR and TrOCR extract text from scanned documents, handwriting, and photos with 99%+ accuracy.
Unstructured documents with complex layouts
LayoutLMv3 understands document structure: tables, headers, key-value pairs, and multi-column layouts.
Domain-specific document understanding
Fine-tune on your document types: invoices, contracts, medical records, insurance claims. LoRA adapters keep models small.
Python OCR pipelines in .NET applications
Full OCR and document AI pipeline in C#. No Tesseract binary, no Python subprocess, no cross-language complexity.
Key capabilities
OCR & Text Extraction
Extract text from scanned documents, photos, and PDFs with multi-language support.
Document Layout Analysis
Understand document structure including tables, headers, paragraphs, and figures.
Key-Value Extraction
Extract structured data from invoices, receipts, forms, and contracts automatically.
Visual Question Answering
Ask questions about documents in natural language and get accurate answers.
Table Extraction
Detect and extract tables from documents with cell-level accuracy.
Document Classification
Automatically classify and route documents by type, urgency, and department.
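As a sketch of what classification-based routing can look like downstream, the snippet below switches on a predicted document type and falls back to manual review when confidence is low. The type names, queue names, and threshold are illustrative assumptions, not part of the AiDotNet API.

```csharp
using System;

// Hypothetical classifier output: a predicted label plus a confidence score.
public record ClassificationResult(string DocumentType, double Confidence);

public static class DocumentRouter
{
    // Routes a classified document to a processing queue; anything the model
    // is unsure about goes to a human-review queue instead.
    public static string Route(ClassificationResult result, double minConfidence = 0.85)
    {
        if (result.Confidence < minConfidence)
            return "manual-review";

        return result.DocumentType switch
        {
            "invoice"        => "accounts-payable",
            "contract"       => "legal",
            "medical-record" => "claims-processing",
            _                => "general-intake"
        };
    }
}
```

With this shape, `DocumentRouter.Route(new ClassificationResult("invoice", 0.97))` lands in the accounts-payable queue, while the same document at 0.50 confidence is held for review.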
Typical workflow
Document ingestion
Scan, upload, or receive documents via email/API integration.
OCR & text extraction
PaddleOCR extracts text, TrOCR handles handwriting, DBNet detects text regions.
Layout analysis
LayoutLMv3 identifies tables, headers, paragraphs, and key-value pairs.
Structured extraction
Donut extracts invoice numbers, dates, and amounts; fine-tuned models handle custom fields.
Validation & QA
Confidence scores flag uncertain extractions for human review.
Integration
Output structured JSON/XML for downstream ERP, CRM, and database systems.
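The last two workflow steps, validation and integration, can be sketched in plain C#: each extracted field carries a confidence score, low-confidence fields are flagged for human review, and the result is serialized to JSON for downstream systems. The field shape and the 0.90 threshold are illustrative assumptions, not an AiDotNet API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical extraction result: one field with its model confidence.
public record ExtractedField(string Name, string Value, double Confidence);

public static class ExportPipeline
{
    // Serializes extracted fields to JSON, flagging anything below the
    // review threshold so a human can verify it before ERP/CRM import.
    public static string ToJson(IEnumerable<ExtractedField> fields, double reviewThreshold = 0.90)
    {
        var payload = fields.Select(f => new
        {
            f.Name,
            f.Value,
            f.Confidence,
            NeedsReview = f.Confidence < reviewThreshold
        });
        return JsonSerializer.Serialize(payload);
    }
}
```

A field extracted at 0.62 confidence comes out with `"NeedsReview":true`, so the downstream system can hold it in a review queue instead of posting it directly.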
Document AI with AiModelBuilder
using AiDotNet;

// OCR model with AiModelBuilder
var ocrModel = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new PaddleOCR<float>())
    .ConfigurePreprocessing()
    .BuildAsync(documentImages, textLabels);
var text = ocrModel.Predict(scannedDocument);

// Layout understanding with AiModelBuilder
var layoutModel = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new LayoutLMv3<float>("layoutlmv3-base"))
    .ConfigureOptimizer(new AdamOptimizer<float>())
    .ConfigurePreprocessing()
    .BuildAsync(documentImages, layoutLabels);
var structure = layoutModel.Predict(invoiceImage);

// Document VQA with AiModelBuilder
var vqaModel = await new AiModelBuilder<float, float[], float>()
    .ConfigureModel(new Phi3Vision<float>("phi-3.5-vision"))
    .ConfigurePreprocessing()
    .BuildAsync(documentImages, answerLabels);
var answer = vqaModel.Predict(contractImage);