Class CsvSplit
java.lang.Object
cloud.opencode.base.csv.split.CsvSplit
CSV Split - Utility for splitting CSV documents
Provides static methods for splitting a CsvDocument by row count,
by a predicate condition, or by grouping on a column value. Every
sub-document produced by these methods keeps the headers of the original document.
Features:
- Split by size (fixed chunk size)
- Split by condition (binary partition)
- Split by column value (GROUP BY)
Usage Examples:
List<CsvDocument> chunks = CsvSplit.bySize(doc, 100);
List<CsvDocument> parts = CsvSplit.byCondition(doc, row -> row.get(0).startsWith("A"));
Map<String, CsvDocument> groups = CsvSplit.byColumn(doc, "department");
Security:
- Thread-safe: Yes (stateless utility)
- Null-safe: Validates all inputs
Since: JDK 25, opencode-base-csv V1.0.3
Author: Leon Soo www.LeonSoo.com
Method Summary
Modifier and Type | Method | Description
static Map<String, CsvDocument> | byColumn(CsvDocument doc, String column) | Splits a document by grouping rows on a column value (like SQL GROUP BY)
static List<CsvDocument> | byCondition(CsvDocument doc, Predicate<CsvRow> predicate) | Splits a document into two: rows matching the predicate, and rows not matching
static List<CsvDocument> | bySize(CsvDocument doc, int maxRows) | Splits a document into chunks of at most maxRows rows each
Method Details
bySize
Splits a document into chunks of at most maxRows rows each. Each sub-document has the same headers as the original. The last chunk may contain fewer rows.
Parameters:
doc - the document to split
maxRows - the maximum number of rows per chunk
Returns:
a list of sub-documents
Throws:
NullPointerException - if doc is null
OpenCsvException - if maxRows is not positive
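The chunking rule described for bySize can be illustrated on plain lists. The sketch below is not the library's implementation: ChunkSketch and chunk are hypothetical names, rows are modelled as a generic List rather than CsvRow, and IllegalArgumentException stands in for OpenCsvException. How the real method treats a document with no rows is not stated in this page; the sketch simply returns an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the chunking behaviour documented for bySize.
public class ChunkSketch {

    /** Splits rows into consecutive chunks of at most maxRows elements each. */
    public static <T> List<List<T>> chunk(List<T> rows, int maxRows) {
        Objects.requireNonNull(rows, "rows");
        if (maxRows <= 0) {
            // The documented method throws OpenCsvException here; this is an assumption-free substitute.
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            // Compute the chunk length from the remaining rows to avoid int overflow.
            int end = start + Math.min(maxRows, rows.size() - start);
            chunks.add(new ArrayList<>(rows.subList(start, end)));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        // Five rows with maxRows = 2: the last chunk holds the single leftover row.
        System.out.println(chunk(rows, 2)); // [[r1, r2], [r3, r4], [r5]]
    }
}
```

In a real CsvDocument each chunk would additionally carry the original headers, which this list-based sketch leaves out.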
byCondition
Splits a document into two: rows matching the predicate, and rows not matching. Always returns exactly 2 documents: [matching, non-matching]. Both have the same headers as the original.
Parameters:
doc - the document to split
predicate - the row predicate
Returns:
a list of exactly 2 documents [matching, non-matching]
Throws:
NullPointerException - if doc or predicate is null
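The two-way partition described for byCondition can be sketched on plain lists. This is an illustration only, not the library's code: PartitionSketch and partition are hypothetical names, and rows are modelled as a generic List rather than CsvRow. It mirrors the documented contract of always returning two results in the order [matching, non-matching], even when one side is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical stand-in for the partition behaviour documented for byCondition.
public class PartitionSketch {

    /** Returns exactly two lists: index 0 holds matching rows, index 1 the rest. */
    public static <T> List<List<T>> partition(List<T> rows, Predicate<? super T> predicate) {
        Objects.requireNonNull(rows, "rows");
        Objects.requireNonNull(predicate, "predicate");
        List<T> matching = new ArrayList<>();
        List<T> nonMatching = new ArrayList<>();
        for (T row : rows) {
            // Each row goes to exactly one side, preserving the original order.
            if (predicate.test(row)) {
                matching.add(row);
            } else {
                nonMatching.add(row);
            }
        }
        return List.of(matching, nonMatching);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Alice", "Bob", "Anna");
        System.out.println(partition(rows, s -> s.startsWith("A"))); // [[Alice, Anna], [Bob]]
    }
}
```

Because the result always has two entries, callers can index it positionally without checking its size.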
byColumn
Splits a document by grouping rows on a column value (like SQL GROUP BY). Returns a LinkedHashMap preserving first-seen order of column values. Each sub-document has the same headers as the original.
Parameters:
doc - the document to split
column - the column name to group by
Returns:
a map of column value to sub-document
Throws:
NullPointerException - if doc or column is null
OpenCsvException - if column is not found in headers
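The grouping behaviour described for byColumn, including the first-seen key order that LinkedHashMap provides, can be sketched on plain lists. This is an assumption-laden illustration rather than the library's implementation: GroupSketch and groupByColumn are hypothetical names, headers and rows are modelled as lists of strings, and IllegalArgumentException stands in for OpenCsvException.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the grouping behaviour documented for byColumn.
public class GroupSketch {

    /** Groups rows by the value found in the named column, keeping first-seen key order. */
    public static Map<String, List<List<String>>> groupByColumn(
            List<String> headers, List<List<String>> rows, String column) {
        Objects.requireNonNull(headers, "headers");
        Objects.requireNonNull(rows, "rows");
        Objects.requireNonNull(column, "column");
        int index = headers.indexOf(column);
        if (index < 0) {
            // The documented method throws OpenCsvException when the column is missing.
            throw new IllegalArgumentException("column not found in headers: " + column);
        }
        // LinkedHashMap iterates keys in insertion order, i.e. first-seen order of values.
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            groups.computeIfAbsent(row.get(index), key -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("name", "department");
        List<List<String>> rows = List.of(
                List.of("Ann", "eng"),
                List.of("Bob", "ops"),
                List.of("Cy", "eng"));
        System.out.println(groupByColumn(headers, rows, "department").keySet()); // [eng, ops]
    }
}
```

A plain HashMap would group the rows identically but give no guarantee about the iteration order of the keys, which is why the insertion-ordered map matters when output order should follow the input.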