Class CsvDocumentParser
java.lang.Object
ai.doctruth.CsvDocumentParser
Layer 1 entry point: read a CSV file from disk into a ParsedDocument containing
exactly one TableSection that mirrors the CSV in row-major order. Backed by
com.fasterxml.jackson.dataformat:jackson-dataformat-csv, chosen per CONTRIBUTING.md §4
"Build, don't synthesize" and ADR 0007 (zero new transitive deps; reuses the Jackson
already on the classpath).
v0.1.0-alpha contract:
- Single page. The whole CSV becomes one TableSection with pageStart == pageEnd == 1; rows map to SourceLocation lines (lineStart == 1, lineEnd == rows.size()).
- No header detection. Every row is data; the caller decides whether row 0 is a header. Header-aware parsing can be added when a real fixture requires it.
- Comma-only delimiter. Auto-detection of ;, tab, and | (common in EU CSV exports) can be added when a real fixture requires it.
- Empty file → zero sections. A file with no rows produces a ParsedDocument carrying zero sections, matching the PDF blank-page rule (empty content is noise, not signal).
- Encoding fallback. Read as UTF-8 first; on MalformedInputException (invalid UTF-8 byte sequence) retry with ISO-8859-1 (Latin-1, bytewise lossless for any 8-bit input). The library never rejects a CSV on encoding alone.
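The encoding fallback in the last bullet can be sketched with JDK classes alone. This is a minimal illustration, not the library's code: the class name EncodingFallbackSketch and its methods are hypothetical, and the real parser may feed the decoded text to Jackson differently.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating the UTF-8 then ISO-8859-1 fallback. */
final class EncodingFallbackSketch {

    private EncodingFallbackSketch() {}

    /** Decode as strict UTF-8; on invalid input, decode the same bytes as ISO-8859-1. */
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            // ISO-8859-1 maps every byte value to a character, so this cannot fail.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    /** Read a file from disk and decode it with the fallback above. */
    static String readText(Path path) throws IOException {
        return decode(Files.readAllBytes(path));
    }
}
```

Because the fallback charset accepts any byte sequence, the decode step never fails on encoding alone; a file that is really in some other 8-bit encoding will still load, though its non-ASCII characters may be displayed incorrectly.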
The parser is a stateless utility — it has no per-instance config in v0.1.0-alpha (so the static method form is the right level of API surface, per CONTRIBUTING.md "Engineering Principles" §5). If per-call options arrive (header / delimiter / charset overrides), this becomes an instance class with a builder.
Since: 0.1.0
Method Summary
Modifier and Type       Method   Description
static ParsedDocument   parse    Parse the CSV at csvPath into a ParsedDocument.
Method Details
parse
Parse the CSV at csvPath into a ParsedDocument.
Throws:
NullPointerException - if csvPath is null.
ParseException - if the file is missing, is not a regular file, or Jackson raises any IO error while reading. Cause-chain preserves the underlying IOException.
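The failure modes listed under Throws can be illustrated with a small JDK-only sketch. Everything here is an assumption made for illustration: SketchParseException is a stand-in for the library's ParseException (whose package and checked/unchecked status are not stated above), the java.nio.file.Path parameter type is guessed from the name csvPath, and the real parser reads through Jackson rather than Files.readAllBytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;

/** Hypothetical sketch of the documented preconditions and cause chaining. */
final class ParseErrorSketch {

    private ParseErrorSketch() {}

    /** Stand-in for the library's ParseException (assumed unchecked here). */
    static final class SketchParseException extends RuntimeException {
        SketchParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Read the file's bytes, mapping failures the way the Throws section describes. */
    static byte[] readOrThrow(Path csvPath) {
        // Null argument: NullPointerException.
        Objects.requireNonNull(csvPath, "csvPath");
        // isRegularFile is false for missing paths as well as for directories.
        if (!Files.isRegularFile(csvPath)) {
            throw new SketchParseException("Missing or not a regular file: " + csvPath, null);
        }
        try {
            return Files.readAllBytes(csvPath);
        } catch (IOException e) {
            // Keep the IOException as the cause so callers can inspect it.
            throw new SketchParseException("Failed to read " + csvPath, e);
        }
    }
}
```

A caller that needs the low-level reason for a read failure can call getCause() on the caught exception; for the missing-file case in this sketch the cause is null.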