Record Class OcrPageResult

java.lang.Object
java.lang.Record
ai.doctruth.spi.OcrPageResult
Record Components:
text - the full recovered text for the page, in reading order.
confidence - average confidence over the page in [0.0, 1.0].
regions - optional per-region bounding boxes; richer engines emit one per word or line. An empty list is acceptable for simple engines that only emit page-level text.
pageNumber - 1-indexed page number this result corresponds to.
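The components above are enough for simple downstream checks. The following is a minimal sketch of one such check: it flags pages whose average confidence is below a threshold. PageResult is a hypothetical stand-in with the same component names as OcrPageResult, with regions simplified to List<String> so the example is self-contained. The 0.80 threshold and the choice to skip empty-text pages, which the invariants reserve for genuinely blank pages, are illustrative assumptions.

```java
import java.util.List;

class LowConfidencePages {
    // Hypothetical review threshold; tune it for your engine and documents.
    static final double REVIEW_THRESHOLD = 0.80;

    // Returns the 1-indexed page numbers whose recovered text is non-empty but whose
    // average confidence falls below the threshold. Blank pages (empty text) are skipped.
    static List<Integer> pagesNeedingReview(List<PageResult> pages, double threshold) {
        return pages.stream()
                .filter(p -> !p.text().isEmpty())
                .filter(p -> p.confidence() < threshold)
                .map(PageResult::pageNumber)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        List<PageResult> pages = List.of(
                new PageResult("Invoice #17", 0.97, List.of(), 1),
                new PageResult("", 0.0, List.of(), 2),
                new PageResult("smudged tabl3", 0.55, List.of(), 3));
        System.out.println(pagesNeedingReview(pages, REVIEW_THRESHOLD)); // [3]
    }
}

// Hypothetical stand-in with the same component names as OcrPageResult;
// regions is simplified to List<String> to keep the example self-contained.
record PageResult(String text, double confidence, List<String> regions, int pageNumber) {}
```

Skipping empty-text pages keeps blank pages, which carry confidence 0.0, from being reported as poor OCR.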

public record OcrPageResult(String text, double confidence, List<OcrRegion> regions, int pageNumber) extends Record
Output of one ocr call.

Carries the recovered text plus per-region bounding boxes. The boxes let Citation objects reference a precise pixel rectangle on the source PDF page, not just a line range; that is the audit-depth differentiator over libraries that only OCR to a flat text blob.
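The value of per-region boxes is that a quoted phrase can be mapped back to pixels. This is a minimal sketch of that idea, not the library's Citation or OcrRegion API. It assumes word-level regions that each carry their text and an (x, y, width, height) box in page pixels with a top-left origin. Region, PixelRect and locate are hypothetical names.

```java
import java.util.List;
import java.util.Optional;

class CitationBoxes {
    // Finds the first run of consecutive word regions whose texts equal the phrase's
    // whitespace-separated words, and returns the rectangle enclosing that run.
    static Optional<PixelRect> locate(int pageNumber, List<Region> words, String phrase) {
        String[] target = phrase.trim().split("\\s+");
        for (int start = 0; start + target.length <= words.size(); start++) {
            boolean match = true;
            for (int k = 0; k < target.length && match; k++) {
                match = words.get(start + k).text().equals(target[k]);
            }
            if (!match) {
                continue;
            }
            // Union of the matched boxes: smallest left/top, largest right/bottom.
            int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
            int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
            for (Region r : words.subList(start, start + target.length)) {
                left = Math.min(left, r.x());
                top = Math.min(top, r.y());
                right = Math.max(right, r.x() + r.width());
                bottom = Math.max(bottom, r.y() + r.height());
            }
            return Optional.of(new PixelRect(pageNumber, left, top, right, bottom));
        }
        return Optional.empty(); // phrase not found: a caller could fall back to a coarser citation
    }

    public static void main(String[] args) {
        List<Region> words = List.of(
                new Region("Total", 10, 100, 50, 20),
                new Region("due:", 65, 100, 40, 20),
                new Region("$42", 110, 102, 35, 18));
        System.out.println(locate(1, words, "due: $42"));
    }
}

// Hypothetical word-level region: text plus a box in page pixels (origin top-left).
record Region(String text, int x, int y, int width, int height) {}

// Hypothetical rectangle a citation could carry.
record PixelRect(int pageNumber, int left, int top, int right, int bottom) {}
```

An engine that emits only page-level text (an empty regions list) gives this lookup nothing to match, which is exactly the flat-text-blob limitation described above.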

Invariants (compact constructor):

  • text is non-null; an empty string is allowed and signals a genuinely blank page.
  • confidence is in [0.0, 1.0], typically the OCR engine's per-character or per-word confidence averaged over the page. NaN and infinities are rejected.
  • regions is non-null and defensively copied; an empty list is allowed.
  • pageNumber is >= 1.
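The invariants above can be enforced in a compact constructor. The following is a sketch of one way to do that, together with a factory shaped like the documented empty(int). It is not the library's source. The record name OcrPageResultSketch, the stand-in OcrRegion components, and the exception types (NullPointerException, IllegalArgumentException) are assumptions. The page documents only confidence 0.0 for empty(); the empty text and empty regions list are also assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class InvariantsDemo {
    public static void main(String[] args) {
        List<OcrRegion> source = new ArrayList<>();
        source.add(new OcrRegion("Hello", 10, 20, 42, 12));
        OcrPageResultSketch page = new OcrPageResultSketch("Hello", 0.93, source, 1);
        source.clear(); // mutating the caller's list afterwards does not reach the record
        System.out.println(page.regions().size()); // 1

        System.out.println(OcrPageResultSketch.empty(4));
        try {
            new OcrPageResultSketch("Hello", Double.NaN, List.of(), 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in: the real OcrRegion's components are not documented on this page.
record OcrRegion(String text, int x, int y, int width, int height) {}

// Hypothetical name; same components as OcrPageResult.
record OcrPageResultSketch(String text, double confidence, List<OcrRegion> regions, int pageNumber) {
    OcrPageResultSketch {
        Objects.requireNonNull(text, "text"); // "" is allowed for blank pages
        // Written as a negated range check so NaN (for which every comparison is false)
        // is rejected along with both infinities.
        if (!(confidence >= 0.0 && confidence <= 1.0)) {
            throw new IllegalArgumentException("confidence must be in [0.0, 1.0]: " + confidence);
        }
        // Unmodifiable defensive copy; List.copyOf also rejects null elements (a stricter assumption).
        regions = List.copyOf(Objects.requireNonNull(regions, "regions"));
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 1: " + pageNumber);
        }
    }

    // Assumed shape of empty(int): empty text, no regions, confidence 0.0.
    static OcrPageResultSketch empty(int pageNumber) {
        return new OcrPageResultSketch("", 0.0, List.of(), pageNumber);
    }
}
```

A plain `confidence < 0.0 || confidence > 1.0` check would let NaN through, because both comparisons are false for NaN. The negated form avoids that.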
Since:
0.1.0
  • Constructor Details

    • OcrPageResult

      public OcrPageResult(String text, double confidence, List<OcrRegion> regions, int pageNumber)
      Creates an instance of an OcrPageResult record class.
      Parameters:
      text - the value for the text record component
      confidence - the value for the confidence record component
      regions - the value for the regions record component
      pageNumber - the value for the pageNumber record component
  • Method Details

    • empty

      public static OcrPageResult empty(int pageNumber)
      Returns an empty result for a blank or non-OCR-able page, with confidence 0.0.
    • toString

      public final String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public final int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public final boolean equals(Object o)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. Reference components are compared with Objects::equals(Object,Object); primitive components are compared with the compare method from their corresponding wrapper classes.
      Specified by:
      equals in class Record
      Parameters:
      o - the object with which to compare
      Returns:
      true if this object is the same as the o argument; false otherwise.
    • text

      public String text()
      Returns the value of the text record component.
      Returns:
      the value of the text record component
    • confidence

      public double confidence()
      Returns the value of the confidence record component.
      Returns:
      the value of the confidence record component
    • regions

      public List<OcrRegion> regions()
      Returns the value of the regions record component.
      Returns:
      the value of the regions record component
    • pageNumber

      public int pageNumber()
      Returns the value of the pageNumber record component.
      Returns:
      the value of the pageNumber record component
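The equals contract described above compares reference components with Objects.equals and primitive components with their wrapper's compare method. This is a small sketch of what that means for a record with these components. PageResult and OcrRegion are hypothetical stand-ins, not the library types. Lists compare element-wise, so the caller's list implementation does not matter. Double.compare distinguishes 0.0 from -0.0, so two stand-in results that differ only in the sign of a zero confidence are not equal.

```java
import java.util.ArrayList;
import java.util.List;

class EqualityDemo {
    public static void main(String[] args) {
        List<OcrRegion> mutable = new ArrayList<>();
        mutable.add(new OcrRegion("Hello", 0, 0, 40, 12));

        PageResult a = new PageResult("Hello", 0.9, mutable, 1);
        PageResult b = new PageResult("Hello", 0.9, List.of(new OcrRegion("Hello", 0, 0, 40, 12)), 1);
        // List equality is element-wise, so ArrayList vs List.of does not matter.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true

        // double components are compared with Double.compare, which treats 0.0 and -0.0 as different.
        PageResult zero = new PageResult("", 0.0, List.of(), 2);
        PageResult negativeZero = new PageResult("", -0.0, List.of(), 2);
        System.out.println(zero.equals(negativeZero)); // false
    }
}

// Hypothetical stand-ins; the real OcrRegion components are not documented here.
record OcrRegion(String text, int x, int y, int width, int height) {}

record PageResult(String text, double confidence, List<OcrRegion> regions, int pageNumber) {
    PageResult {
        regions = List.copyOf(regions); // unmodifiable defensive copy, mirroring the documented invariant
    }
}
```

Whether the real OcrPageResult normalizes -0.0 is not stated on this page. If it does not, the same distinction applies to it, because -0.0 passes a [0.0, 1.0] range check.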