Class CsvSplit

java.lang.Object
cloud.opencode.base.csv.split.CsvSplit

public final class CsvSplit extends Object
CSV Split - Utility for splitting CSV documents

Provides static methods for splitting a CsvDocument by row count, by a predicate, or by grouping on a column value. Every method preserves the original document's headers in each resulting sub-document.

Features:

  • Split by size (fixed chunk size)
  • Split by condition (binary partition)
  • Split by column value (GROUP BY)

Usage Examples:

List<CsvDocument> chunks = CsvSplit.bySize(doc, 100);
List<CsvDocument> parts = CsvSplit.byCondition(doc, row -> row.get(0).startsWith("A"));
Map<String, CsvDocument> groups = CsvSplit.byColumn(doc, "department");

Security:

  • Thread-safe: Yes (stateless utility)
  • Null-safe: Validates all inputs (null arguments are rejected with NullPointerException)
Since:
JDK 25, opencode-base-csv V1.0.3
Author:
Leon Soo www.LeonSoo.com
  • Method Details

    • bySize

      public static List<CsvDocument> bySize(CsvDocument doc, int maxRows)
      Splits a document into chunks of at most maxRows rows each.

      Each sub-document shares the same headers as the original. The last chunk may contain fewer rows.

      Parameters:
      doc - the document to split
      maxRows - the maximum number of rows per chunk
      Returns:
      a list of sub-documents
      Throws:
      NullPointerException - if doc is null
      OpenCsvException - if maxRows is not positive
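The chunking behaviour described for bySize can be illustrated with a minimal sketch. It does not use the library: `Doc` is a hypothetical stand-in for CsvDocument (headers plus rows of strings), and `IllegalArgumentException` stands in for OpenCsvException. Returning an empty list for a document with no rows is an assumption of this sketch; the documentation above does not say what the library does in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class BySizeSketch {
    /** Hypothetical stand-in for CsvDocument: a header list plus data rows. */
    record Doc(List<String> headers, List<List<String>> rows) {}

    /** Splits doc into chunks of at most maxRows rows; every chunk reuses the headers. */
    static List<Doc> bySize(Doc doc, int maxRows) {
        Objects.requireNonNull(doc, "doc");
        if (maxRows <= 0) {
            // The documented method throws OpenCsvException here; this sketch uses a JDK exception.
            throw new IllegalArgumentException("maxRows must be positive: " + maxRows);
        }
        List<List<String>> rows = doc.rows();
        List<Doc> chunks = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            // The last chunk takes whatever rows remain, so it may be shorter than maxRows.
            int end = start + Math.min(maxRows, rows.size() - start);
            chunks.add(new Doc(doc.headers(), List.copyOf(rows.subList(start, end))));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Doc doc = new Doc(List.of("id"),
                List.of(List.of("1"), List.of("2"), List.of("3"), List.of("4"), List.of("5")));
        for (Doc chunk : bySize(doc, 2)) {
            System.out.println(chunk.rows().size()); // prints 2, 2, 1 on separate lines
        }
    }
}
```

With five rows and maxRows = 2, the sketch yields three chunks, the last holding the single leftover row.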
    • byCondition

      public static List<CsvDocument> byCondition(CsvDocument doc, Predicate<CsvRow> predicate)
      Splits a document into two: rows matching the predicate, and rows not matching.

      Always returns exactly 2 documents: [matching, non-matching]. Both share the same headers.

      Parameters:
      doc - the document to split
      predicate - the row predicate
      Returns:
      a list of exactly 2 documents [matching, non-matching]
      Throws:
      NullPointerException - if doc or predicate is null
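The two-way partition can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `Doc` is a hypothetical stand-in for CsvDocument, and a row is modelled as a `List<String>` rather than a CsvRow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

final class ByConditionSketch {
    /** Hypothetical stand-in for CsvDocument: a header list plus data rows. */
    record Doc(List<String> headers, List<List<String>> rows) {}

    /** Returns exactly two documents: [matching, non-matching], both with the original headers. */
    static List<Doc> byCondition(Doc doc, Predicate<List<String>> predicate) {
        Objects.requireNonNull(doc, "doc");
        Objects.requireNonNull(predicate, "predicate");
        List<List<String>> matching = new ArrayList<>();
        List<List<String>> rest = new ArrayList<>();
        for (List<String> row : doc.rows()) {
            // Each row lands in exactly one of the two buckets.
            (predicate.test(row) ? matching : rest).add(row);
        }
        // Both documents are always returned, even when one of them has no rows.
        return List.of(new Doc(doc.headers(), List.copyOf(matching)),
                new Doc(doc.headers(), List.copyOf(rest)));
    }

    public static void main(String[] args) {
        Doc doc = new Doc(List.of("name"),
                List.of(List.of("Alice"), List.of("Bob"), List.of("Anna")));
        List<Doc> parts = byCondition(doc, row -> row.get(0).startsWith("A"));
        System.out.println(parts.get(0).rows().size() + " " + parts.get(1).rows().size()); // prints "2 1"
    }
}
```

Because the result always has two elements in a fixed order, callers can index it with 0 (matching) and 1 (non-matching) without checking its size.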
    • byColumn

      public static Map<String, CsvDocument> byColumn(CsvDocument doc, String column)
      Splits a document by grouping rows on a column value (like SQL GROUP BY).

      Returns a LinkedHashMap preserving first-seen order of column values. Each sub-document shares the same headers.

      Parameters:
      doc - the document to split
      column - the column name to group by
      Returns:
      a map of column value to sub-document
      Throws:
      NullPointerException - if doc or column is null
      OpenCsvException - if column is not found in headers
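The grouping and its first-seen ordering can be illustrated with a short sketch built on `LinkedHashMap`. As with any stand-in, the names here are assumptions: `Doc` is a hypothetical substitute for CsvDocument, `IllegalArgumentException` replaces OpenCsvException, and every row is assumed to hold a value for each header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ByColumnSketch {
    /** Hypothetical stand-in for CsvDocument: a header list plus data rows. */
    record Doc(List<String> headers, List<List<String>> rows) {}

    /** Groups rows by the value in the named column, keeping first-seen key order. */
    static Map<String, Doc> byColumn(Doc doc, String column) {
        Objects.requireNonNull(doc, "doc");
        Objects.requireNonNull(column, "column");
        int index = doc.headers().indexOf(column);
        if (index < 0) {
            // The documented method throws OpenCsvException here; this sketch uses a JDK exception.
            throw new IllegalArgumentException("column not found: " + column);
        }
        // LinkedHashMap iterates in insertion order, which gives the first-seen ordering.
        Map<String, List<List<String>>> grouped = new LinkedHashMap<>();
        for (List<String> row : doc.rows()) {
            grouped.computeIfAbsent(row.get(index), key -> new ArrayList<>()).add(row);
        }
        Map<String, Doc> result = new LinkedHashMap<>();
        grouped.forEach((key, rows) -> result.put(key, new Doc(doc.headers(), List.copyOf(rows))));
        return result;
    }

    public static void main(String[] args) {
        Doc doc = new Doc(List.of("name", "department"),
                List.of(List.of("a", "eng"), List.of("b", "ops"), List.of("c", "eng")));
        System.out.println(byColumn(doc, "department").keySet()); // prints [eng, ops]
    }
}
```

A plain `HashMap` would group the rows just as well but would not guarantee that "eng" is iterated before "ops", which is why the insertion-ordered map matters for reproducible output.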