Class XlsxDocumentParser

java.lang.Object
ai.doctruth.XlsxDocumentParser

public final class XlsxDocumentParser extends Object
Layer 1 entry point: read an XLSX file from disk into a ParsedDocument with one TableSection per non-empty sheet. Backed by Apache POI (XSSFWorkbook), chosen per CONTRIBUTING.md §4 "Build, don't synthesize" (POI is the canonical Java OOXML library).

v0.1.0-alpha sheet-as-page analogy: spreadsheet workbooks have no native "pages" the way PDFs do, but every sheet is a self-contained tabular surface. We map each sheet to a logical page and each row to a logical line so that SourceLocation stays consistent across formats: a Citation pointing at "page 2 line 5" of an XLSX document means sheet index 1 and row index 4 (both 0-indexed). The sheet name is intentionally not part of the location record (including it would force a 6th component); downstream consumers can fetch it from DocumentMetadata extensions in a later phase.
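The page/line to sheet/row correspondence described above can be sketched as a pair of conversions. This is only an illustration: SheetLocationSketch, PageLine, and SheetRow are hypothetical names invented for the example and are not part of the documented API, and the real SourceLocation record carries more components than the two shown here.

```java
/** Hypothetical helper illustrating the sheet-as-page mapping; not part of the library. */
final class SheetLocationSketch {

    /** Hypothetical stand-in for the 1-based page/line part of a SourceLocation. */
    record PageLine(int page, int line) {}

    /** Hypothetical pair of 0-based sheet and row indices. */
    record SheetRow(int sheetIndex, int rowIndex) {}

    /** Converts a 1-based logical page/line into 0-based sheet/row indices. */
    static SheetRow toSheetRow(PageLine loc) {
        if (loc.page() < 1 || loc.line() < 1) {
            throw new IllegalArgumentException("page and line are 1-based: " + loc);
        }
        return new SheetRow(loc.page() - 1, loc.line() - 1);
    }

    /** Converts 0-based sheet/row indices back into a 1-based logical page/line. */
    static PageLine toPageLine(SheetRow pos) {
        if (pos.sheetIndex() < 0 || pos.rowIndex() < 0) {
            throw new IllegalArgumentException("indices are 0-based: " + pos);
        }
        return new PageLine(pos.sheetIndex() + 1, pos.rowIndex() + 1);
    }

    public static void main(String[] args) {
        SheetRow pos = toSheetRow(new PageLine(2, 5));
        // "page 2 line 5" corresponds to sheet index 1, row index 4
        System.out.println(pos.sheetIndex() + " " + pos.rowIndex()); // prints "1 4"
    }
}
```

Keeping the conversion in one place makes the off-by-one explicit for consumers that resolve a Citation back to a worksheet position.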

Cell rendering uses POI's DataFormatter.formatCellValue(Cell) so dates, percentages, and formula-cached values appear as the user sees them in Excel — not as raw doubles. Empty/null cells render as the empty string ""; trailing all-blank rows are trimmed from each sheet, but interior all-blank rows are preserved (they convey layout).
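The row-trimming rule above can be illustrated without POI on rows that have already been rendered to strings. This is a minimal sketch, not the parser's actual implementation: TrailingBlankRowsSketch and its methods are hypothetical names, and it assumes a row counts as blank when every cell is null or the empty string (whether whitespace-only cells also count is not stated in the description).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of trailing-blank-row trimming; not part of the library. */
final class TrailingBlankRowsSketch {

    /** Assumption: a row is blank when every cell is null or "". */
    static boolean isBlankRow(List<String> row) {
        for (String cell : row) {
            if (cell != null && !cell.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Drops all-blank rows from the end only; interior blank rows are kept. */
    static List<List<String>> trimTrailingBlankRows(List<List<String>> rows) {
        int end = rows.size();
        while (end > 0 && isBlankRow(rows.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(rows.subList(0, end));
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("name", "qty"),
                List.of("", ""),        // interior blank row: preserved
                List.of("widget", "3"),
                List.of("", ""),        // trailing blank row: trimmed
                List.of("", ""));       // trailing blank row: trimmed
        System.out.println(trimTrailingBlankRows(rows).size()); // prints 3
    }
}
```

Because only the tail is scanned, row indices of the surviving rows are unchanged, so line numbers in a SourceLocation still line up with the original sheet.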

The parser is a stateless utility — it has no per-instance config in v0.1.0-alpha (so the static method form is the right level of API surface, per Engineering Principles §5 "elegance over cleverness").

Since:
0.1.0