Class CsvDocumentParser

java.lang.Object
ai.doctruth.CsvDocumentParser

public final class CsvDocumentParser extends Object
Layer 1 entry point: reads a CSV file from disk into a ParsedDocument containing a single TableSection that mirrors the CSV in row-major order (or no sections at all when the file is empty). Backed by com.fasterxml.jackson.dataformat:jackson-dataformat-csv, chosen per CONTRIBUTING.md §4 "Build, don't synthesize" and ADR 0007 (zero new transitive deps; reuses the Jackson already on the classpath).

v0.1.0-alpha contract:

  • Single page. The whole CSV becomes one TableSection with pageStart == pageEnd == 1; rows map to SourceLocation lines (lineStart == 1, lineEnd == rows.size()).
  • No header detection. Every row is data — the caller decides whether row 0 is a header. Header-aware parsing can be added when a real fixture requires it.
  • Comma-only delimiter. Auto-detection of ;, tab, and | (common in EU CSV exports) can be added when a real fixture requires it.
  • Empty file → zero sections. A file with no rows produces a ParsedDocument with no sections, matching the PDF blank-page rule (empty content is noise, not signal).
  • Encoding fallback. Read as UTF-8 first; on MalformedInputException (invalid UTF-8 byte sequence) retry with ISO-8859-1 (Latin-1 — bytewise lossless for any 8-bit input). The library never rejects a CSV on encoding alone.
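
The encoding fallback rule above can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the parser's actual implementation: the class name EncodingFallbackSketch and its methods decode and readWithFallback are hypothetical, and the real parser may decode through Jackson rather than reading all bytes up front. The sketch decodes strictly as UTF-8 and, when the decoder reports invalid input, re-decodes the same bytes as ISO-8859-1, which maps every byte value to a character and therefore cannot fail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of the documented API.
final class EncodingFallbackSketch {

    private EncodingFallbackSketch() {}

    /** Decodes bytes as strict UTF-8, falling back to ISO-8859-1 on invalid input. */
    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            // ISO-8859-1 assigns a character to every byte value, so this never fails.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    /** Reads the whole file into memory, then applies the fallback rule. */
    static String readWithFallback(Path csv) throws IOException {
        return decode(Files.readAllBytes(csv));
    }
}
```

Decoding from a byte array keeps the retry simple: the second attempt reuses the bytes already in memory instead of reopening the file. A streaming parser would have to restart the read from the beginning to get the same effect.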

The parser is a stateless utility with no per-instance configuration in v0.1.0-alpha, so a static method is the right level of API surface (per CONTRIBUTING.md "Engineering Principles" §5). If per-call options arrive (header, delimiter, or charset overrides), this becomes an instance class with a builder.

Since:
0.1.0