Class PdfDocumentParser
java.lang.Object
ai.doctruth.PdfDocumentParser
Layer 1 entry point: read a PDF file from disk into a ParsedDocument with source locations preserved per detected layout block. PDFBox owns raw glyph extraction; PdfPageBlockExtractor owns page-level grouping and visual classification.
Since:
0.1.0
Method Summary
Modifier and Type        Method    Description
static ParsedDocument    parse     Parse the PDF at pdfPath into a ParsedDocument.
Method Details
parse
Parse the PDF at pdfPath into a ParsedDocument.
Throws:
NullPointerException - if pdfPath is null.
ParseException - if the file is missing, is not a PDF, is encrypted with an unknown password, or PDFBox raises any IO error.
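The error contract above suggests a caller-side pattern: treat NullPointerException as a programming error and let it propagate, but handle ParseException as an expected outcome for bad input files. The following is a minimal sketch of that pattern, not the library's code. Everything in it is an assumption made so the example compiles on its own: the nested ParsedDocument and ParseException types are empty stand-ins for the real ones, ParseException is assumed to be a checked exception, pdfPath is assumed to be a java.nio.file.Path, and the stand-in parse method only imitates the documented failure cases (null, missing file, not a PDF) with a header check instead of real PDFBox parsing. The tryParse helper is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Objects;
import java.util.Optional;

final class ParseContractSketch {

    /** Empty stand-in for the real ParsedDocument (assumption). */
    static final class ParsedDocument { }

    /** Stand-in for the library's ParseException; assumed here to be checked. */
    static final class ParseException extends Exception {
        ParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Stand-in for PdfDocumentParser.parse. It imitates only the documented
     * error contract; the real method delegates extraction to PDFBox.
     */
    static ParsedDocument parse(Path pdfPath) throws ParseException {
        Objects.requireNonNull(pdfPath, "pdfPath"); // documented NullPointerException
        try (InputStream in = Files.newInputStream(pdfPath)) {
            // PDF files begin with the ASCII marker "%PDF-".
            byte[] header = in.readNBytes(5);
            if (!Arrays.equals(header, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
                throw new ParseException("not a PDF: " + pdfPath, null);
            }
            return new ParsedDocument();
        } catch (IOException e) {
            // A missing or unreadable file surfaces as ParseException, not IOException.
            throw new ParseException("cannot read " + pdfPath, e);
        }
    }

    /** Hypothetical helper: empty result on ParseException, NPE propagates. */
    static Optional<ParsedDocument> tryParse(Path pdfPath) {
        try {
            return Optional.of(parse(pdfPath));
        } catch (ParseException e) {
            System.err.println("Could not parse " + pdfPath + ": " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

With the real library, only the body of tryParse would remain, calling PdfDocumentParser.parse in place of the stand-in. Whether to return an Optional, rethrow, or log depends on how the caller wants to treat unreadable inputs.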