All Classes and Interfaces

Class
Description
Anthropic Messages API provider.
A single audit event emitted by the library: extraction success, extraction failure, citation-below-threshold, etc.
Callback the library invokes for every auditable event.
Visual / structural classification of a TextSection as detected by Layer 1 parsing — a geometric / typographic judgement, NOT a semantic one.
The verifiable evidence anchor for a single extracted field.
A confidence score for a single extracted field, plus a free-form rationale.
 
How a ParsedDocument is rendered into the bounded context window of an LLM call.
Layer 1 entry point: read a CSV file from disk into a ParsedDocument containing exactly one TableSection that mirrors the CSV row-major.
DeepSeek Chat Completions provider.
Public entry point for the library.
Minimal command-line entry point for build-time migration helpers.
Metadata for a ParsedDocument: the source filename, total page count, and (optionally) the timestamp at which the source document was authored / published.
Layer 1 entry point: read a DOCX file from disk into a ParsedDocument with one TextSection per non-blank paragraph and one TableSection per table.
Immutable fluent builder for one extraction call.
Thrown from the public extraction API when an extraction run fails after exhausting retries or when an invariant is violated mid-flight.
The output of DocTruth.extract(...).run(): the extracted value plus per-field citations, per-field confidence scores, and run-level provenance.
A figure (image, chart, diagram) recovered from the source document, represented by its caption text plus a SourceLocation.
Google Gemini generateContent provider.
Multi-level summarisation: condense the document at increasing granularities, then hand the LLM the level that fits the budget.
Immutable fluent builder for JSON Schema-driven extraction.
Caller-supplied JSON Schema for schema-bound extraction.
The Layer 2 backend abstraction: an LLM API client.
Pixel bounding box for one OCR region on a rendered page image.
Optional OCR backend, plugged into PdfDocumentParser to recover text from scanned (image-only) pages.
Output of one ocr call.
One OCR-recovered text region with its pixel bounding box on the rendered page image.
OpenAI Chat Completions API provider.
What a PriorityTruncate strategy does when the priority sections alone exceed the configured maxChars budget.
The output of Layer 1 (document parsing): a stable identifier, an ordered list of sections, and the document metadata.
A single section of a parsed source document.
Thrown by Layer 1 document parsers (PDF / DOCX) when a source file cannot be parsed or when a structural invariant is violated.
Layer 1 entry point: read a PDF file from disk into a ParsedDocument with source locations preserved per detected layout block.
Smart-context strategy for keeping priority sections while trimming everything else to fit.
Bi-temporal provenance for an ExtractionResult: the model that produced it, when the extraction ran, and (optionally) when the source document was authored, the region the extraction was processed in, and the retention horizon of the audit record.
Supplemental provenance metadata kept behind Provenance so the public provenance record stays small while preserving retry, data-residency, and retention semantics.
Thrown by Layer 2 LLM providers (Anthropic, OpenAI, Gemini, DeepSeek) when an upstream call fails.
Per-call knobs passed to an LlmProvider on every request.
What the library hands an LlmProvider on every call: the system prompt, the user prompt (rendered from a ParsedDocument by the configured ContextStrategy), the JSON Schema for the target type, and the per-call options.
What an LlmProvider returns on a successful call: the raw JSON the LLM produced plus the per-call ProviderUsage.
Token-usage and model-version data returned by an LLM provider on every successful call.
Render an ExtractionResult as W3C PROV-O JSON-LD.
Sign / wrap an audit JSON document for tamper-evident persistence.
Fixed-size character windows with optional overlap.
A 1-indexed page + line span into a parsed source document, plus a 0-indexed character offset into the source page text.
A flat string-cell table recovered from the source document, anchored to a SourceLocation.
A run of plain text recovered from the source document, anchored to a SourceLocation and tagged with a BlockKind that classifies the geometric / typographic shape of the block (HEADING / BODY / LIST / OTHER).
Layer 1 entry point: read an XLSX file from disk into a ParsedDocument with one TableSection per non-empty sheet.
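
Example (illustrative only). The "fixed-size character windows with optional overlap" strategy listed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the library's implementation: the class name FixedWindowSketch, the method windows, and its parameters size and overlap are hypothetical names chosen for illustration, and the library's real strategy may handle section boundaries and budgets differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and signatures are assumptions, not the library's API.
final class FixedWindowSketch {

    /**
     * Splits text into windows of at most `size` characters.
     * Each window starts `size - overlap` characters after the previous one,
     * so consecutive windows share `overlap` characters.
     */
    static List<String> windows(String text, int size, int overlap) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("overlap must be in [0, size)");
        }
        List<String> out = new ArrayList<>();
        int step = size - overlap; // always >= 1 because overlap < size
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            out.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Windows of 4 characters, each sharing 1 character with its predecessor.
        System.out.println(windows("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```

Requiring overlap to be strictly smaller than size guarantees that the loop always advances; with overlap set to 0 the windows simply tile the text.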