Class XmlSplitter
java.lang.Object
cloud.opencode.base.xml.splitter.XmlSplitter
XML Splitter - Splits large XML streams by element name
This utility class provides stream-based XML splitting using StAX parsing.
It scans the XML input for elements matching a given name, extracts each matching
element (including its full subtree) into a standalone XmlDocument, and
passes it to a callback handler.
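The scan-extract-callback loop described above can be sketched with the JDK's own StAX API (`javax.xml.stream`). This is a minimal sketch under stated assumptions, not XmlSplitter's actual code. It matches elements by local name only and ignores namespaces. It emits each fragment as a serialized `String` instead of an `XmlDocument`. It also uses `IllegalStateException` as a stand-in for `OpenXmlException`. The class name `StaxSplitSketch` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch; not the library's implementation.
final class StaxSplitSketch {

    // Streams `xml` and passes each element whose local name is `elementName`,
    // serialized together with its whole subtree, to `handler`.
    // Only the fragment currently being copied is buffered.
    static void split(String xml, String elementName, Consumer<String> handler) {
        XMLInputFactory in = XMLInputFactory.newFactory();
        in.setProperty(XMLInputFactory.SUPPORT_DTD, false); // assumption: DTDs are not needed
        XMLOutputFactory out = XMLOutputFactory.newFactory();
        try {
            XMLEventReader reader = in.createXMLEventReader(new StringReader(xml));
            StringWriter buffer = null;   // non-null only while inside a matching element
            XMLEventWriter writer = null;
            int depth = 0;                // elements currently open inside the fragment
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null) {
                    // Outside a fragment: wait for the next matching start tag.
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                        buffer = new StringWriter();
                        writer = out.createXMLEventWriter(buffer);
                        writer.add(event);
                        depth = 1;
                    }
                    continue;
                }
                // Inside a fragment: copy every event and track depth, so nested
                // elements (even ones with the same name) do not end it early.
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    writer.flush();
                    writer.close();
                    handler.accept(buffer.toString());
                    writer = null;
                    buffer = null;
                }
            }
            reader.close(); // a StringReader holds no external resources
        } catch (XMLStreamException e) {
            // Stand-in for the library's OpenXmlException.
            throw new IllegalStateException("XML split failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> fragments = new ArrayList<>();
        split("<r><item><name>a</name></item><item><name>b</name></item></r>", "item", fragments::add);
        fragments.forEach(System.out::println);
    }
}
```

The depth counter keeps `<item><item>x</item></item>` in one fragment. The inner end tag only brings the depth back to 1. The outer end tag brings it to 0 and triggers the callback.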
Features:
- Stream-based splitting: working memory is bounded by the largest matching fragment, not the whole document
- Correct depth tracking for nested elements
- Multiple input sources: InputStream, Path, String
- Indexed splitting with SplitResult
- Collect-all and count modes
- Secure parsing via SecureParserFactory
Usage Examples:
// Split and process each <item> element
XmlSplitter.split(inputStream, "item", doc -> {
    String name = doc.xpath("//name/text()");
    System.out.println(name);
});

// Collect all <record> fragments
List<XmlDocument> records = XmlSplitter.splitAll(xml, "record");

// Count elements without loading them into memory
// (pass a fresh InputStream: the one used above has already been consumed)
int count = XmlSplitter.count(inputStream, "item");
Performance:
- Time complexity: O(n), where n is the input size
- Space complexity: O(m), where m is the size of the largest matching fragment
Security:
- Thread-safe: Yes (stateless utility)
- Null-safe: No (null inputs throw exceptions)
- XXE protection enabled via SecureParserFactory
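This section does not document which settings SecureParserFactory actually applies. As an assumption, a StAX factory is commonly hardened against XXE by using the two standard `XMLInputFactory` switches shown below. `HardenedStax` and `rootName` are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; SecureParserFactory may configure more (or different) options.
final class HardenedStax {

    // Returns a StAX factory with DTD support and external entity resolution disabled.
    static XMLInputFactory newSecureInputFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    // Example use: read the root element's local name with the hardened factory.
    static String rootName(String xml) {
        try {
            XMLStreamReader reader = newSecureInputFactory().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // move from START_DOCUMENT to the root START_ELEMENT
                return reader.getLocalName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<catalog><item/></catalog>"));
    }
}
```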
- Since: JDK 25, opencode-base-xml V1.0.3
- Author: Leon Soo, www.LeonSoo.com
Method Summary
Modifier and Type | Method | Description
static int | count(InputStream input, String elementName) | Counts matching elements in an XML input stream using O(1) memory.
static int | count(String xml, String elementName) | Counts matching elements in an XML string using O(1) additional memory.
static void | split(InputStream input, String elementName, Consumer<XmlDocument> handler) | Splits an XML input stream and processes each matching element via callback.
static void | split(String xml, String elementName, Consumer<XmlDocument> handler) | Splits an XML string and processes each matching element via callback.
static void | split(Path path, String elementName, Consumer<XmlDocument> handler) | Splits an XML file and processes each matching element via callback.
static List<XmlDocument> | splitAll(String xml, String elementName) | Collects all matching element fragments from an XML string.
static void | splitIndexed(InputStream input, String elementName, Consumer<SplitResult> handler) | Splits an XML input stream with index and processes each matching element via callback.
Method Details
split
public static void split(InputStream input, String elementName, Consumer<XmlDocument> handler)
Splits an XML input stream and processes each matching element via callback.
- Parameters:
  input - the input stream
  elementName - the element name to match
  handler - the callback invoked once for each matching element
- Throws:
  OpenXmlException - if splitting fails
split
public static void split(Path path, String elementName, Consumer<XmlDocument> handler)
Splits an XML file and processes each matching element via callback.
- Parameters:
  path - the file path
  elementName - the element name to match
  handler - the callback invoked once for each matching element
- Throws:
  OpenXmlException - if splitting fails
split
public static void split(String xml, String elementName, Consumer<XmlDocument> handler)
Splits an XML string and processes each matching element via callback.
- Parameters:
  xml - the XML string
  elementName - the element name to match
  handler - the callback invoked once for each matching element
- Throws:
  OpenXmlException - if splitting fails
splitIndexed
public static void splitIndexed(InputStream input, String elementName, Consumer<SplitResult> handler)
Splits an XML input stream with index and processes each matching element via callback.
- Parameters:
  input - the input stream
  elementName - the element name to match
  handler - the callback invoked once for each matching element, receiving a SplitResult
- Throws:
  OpenXmlException - if splitting fails
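The fields of SplitResult are not listed in this section. One plausible way to add indexing to a plain per-fragment callback is to wrap the handler with a running counter. In this sketch, the record `Indexed` is a hypothetical stand-in for SplitResult, and zero-based numbering is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; SplitResult's real shape and index base are not documented here.
final class IndexedHandlers {

    // Hypothetical stand-in for SplitResult: a fragment plus its position in the stream.
    record Indexed<T>(int index, T fragment) {}

    // Wraps a per-fragment consumer so each call is tagged with a zero-based running index.
    static <T> Consumer<T> withIndex(Consumer<Indexed<T>> handler) {
        int[] next = {0}; // mutable counter captured by the lambda
        return fragment -> handler.accept(new Indexed<>(next[0]++, fragment));
    }

    public static void main(String[] args) {
        List<Indexed<String>> seen = new ArrayList<>();
        Consumer<String> handler = withIndex(seen::add);
        List.of("<item>a</item>", "<item>b</item>").forEach(handler);
        seen.forEach(r -> System.out.println(r.index() + ": " + r.fragment()));
    }
}
```

The counter is held in a one-element array because a lambda may only capture effectively final locals. Create one wrapper per split call, and do not share it across threads.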
splitAll
public static List<XmlDocument> splitAll(String xml, String elementName)
Collects all matching element fragments from an XML string.
- Parameters:
  xml - the XML string
  elementName - the element name to match
- Returns:
  a list of the matching fragments, one XmlDocument per element
- Throws:
  OpenXmlException - if splitting fails
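Because splitAll returns a List, every fragment stays referenced until the caller releases the list. Its memory therefore grows with the total size of all matches, not just the largest one. The sketch below shows a collect-all adapter built on top of any callback-style splitter. `CollectAll` and `collect` are hypothetical names, and the splitter in `main` is a fake that only simulates two matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "collect-all" mode layered on a callback split.
final class CollectAll {

    // `splitter` is given the handler to call once per fragment; every fragment is kept.
    static <T> List<T> collect(Consumer<Consumer<T>> splitter) {
        List<T> results = new ArrayList<>();
        splitter.accept(results::add); // memory grows with the total size of all fragments
        return List.copyOf(results);   // unmodifiable snapshot for the caller
    }

    public static void main(String[] args) {
        // Fake splitter standing in for a real streaming split over <record> elements.
        List<String> records = collect(handler -> {
            handler.accept("<record>1</record>");
            handler.accept("<record>2</record>");
        });
        System.out.println(records.size());
    }
}
```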
count
public static int count(InputStream input, String elementName)
Counts matching elements in an XML input stream using O(1) memory.
- Parameters:
  input - the input stream
  elementName - the element name to count
- Returns:
  the number of matching elements
- Throws:
  OpenXmlException - if counting fails
count
public static int count(String xml, String elementName)
Counts matching elements in an XML string using O(1) additional memory.
- Parameters:
  xml - the XML string
  elementName - the element name to count
- Returns:
  the number of matching elements
- Throws:
  OpenXmlException - if counting fails
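Counting does not need to extract any fragments. A cursor-based `XMLStreamReader` only has to keep an `int`, which is consistent with the O(1) memory claim. The following is a minimal sketch of how both count overloads could work, not the library's implementation. It counts nested matches as well (an `item` inside an `item` counts twice), which may differ from XmlSplitter's behavior. It also uses `IllegalStateException` in place of `OpenXmlException`. `CountSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; matching is by local name and includes nested occurrences.
final class CountSketch {

    static int count(InputStream input, String elementName) {
        try {
            return count(factory().createXMLStreamReader(input), elementName);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML count failed", e);
        }
    }

    static int count(String xml, String elementName) {
        try {
            return count(factory().createXMLStreamReader(new StringReader(xml)), elementName);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML count failed", e);
        }
    }

    private static XMLInputFactory factory() {
        XMLInputFactory f = XMLInputFactory.newFactory();
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        return f;
    }

    // Walks the cursor without building any objects per element; only `n` is kept.
    private static int count(XMLStreamReader reader, String elementName) throws XMLStreamException {
        int n = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    n++;
                }
            }
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] bytes = "<feed><item/><item/></feed>".getBytes(StandardCharsets.UTF_8);
        System.out.println(count(new ByteArrayInputStream(bytes), "item"));
    }
}
```

`XMLStreamReader.close()` does not close the underlying input source, so the caller keeps ownership of the InputStream and should close it.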