Class XmlSplitter

java.lang.Object
cloud.opencode.base.xml.splitter.XmlSplitter

public final class XmlSplitter extends Object
XML Splitter - Splits large XML streams by element name.

This utility class provides stream-based XML splitting using StAX parsing. It scans the XML input for elements matching a given name, extracts each matching element (including its full subtree) into a standalone XmlDocument, and passes it to a callback handler.

Features:

  • Stream-based splitting; memory use is bounded by the largest matching fragment, not by the input size
  • Correct depth tracking for nested elements
  • Multiple input sources: InputStream, Path, String
  • Indexed splitting with SplitResult
  • Collect-all and count modes
  • Secure parsing via SecureParserFactory
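
The depth tracking mentioned above can be illustrated with plain JDK StAX. The following is a minimal sketch, not the XmlSplitter implementation: the class name DepthTrackingSplitSketch and the method fragments are hypothetical, fragments are returned as strings rather than XmlDocument instances, and a matching element nested inside another match is assumed to stay part of the outer fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration class; it is not part of opencode-base-xml.
public final class DepthTrackingSplitSketch {

    private DepthTrackingSplitSketch() {
    }

    // Returns the serialized subtree of every element named elementName.
    static List<String> fragments(String xml, String elementName) {
        XMLInputFactory inputFactory = XMLInputFactory.newFactory();
        // Basic hardening: no DTD processing, no external entities.
        inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();

        List<String> result = new ArrayList<>();
        try {
            XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside a matching element
            int depth = 0;                // open elements inside the current fragment

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null) {
                    // Outside a fragment: wait for the next matching start tag.
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                        buffer = new StringWriter();
                        writer = outputFactory.createXMLEventWriter(buffer);
                        writer.add(event);
                        depth = 1;
                    }
                    continue;
                }
                // Inside a fragment: copy every event and track nesting depth.
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement()) {
                    depth--;
                    if (depth == 0) {
                        // The end tag that closes the fragment has been written.
                        writer.close();
                        result.add(buffer.toString());
                        writer = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to split on <" + elementName + ">", e);
        }
        return result;
    }
}
```

Only the fragment currently being copied is buffered, which is why memory use in a design like this follows the largest matching fragment rather than the total input size.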

Usage Examples:

// Split and process each <item> element
XmlSplitter.split(inputStream, "item", doc -> {
    String name = doc.xpath("//name/text()");
    System.out.println(name);
});

// Collect all <record> fragments
List<XmlDocument> records = XmlSplitter.splitAll(xml, "record");

// Count elements without loading into memory
int count = XmlSplitter.count(inputStream, "item");

Performance:

  • Time complexity: O(n), where n is the input size
  • Space complexity: O(m), where m is the size of the largest matching fragment

Security:

  • Thread-safe: Yes (stateless utility)
  • Null-safe: No (null inputs throw exceptions)
  • XXE protection enabled via SecureParserFactory
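
This page does not show how SecureParserFactory configures its parser. As a rough sketch of the kind of hardening usually meant by XXE protection for StAX, the standard JDK properties below disable DTD processing and external entity resolution. The class HardenedStaxFactorySketch and its methods are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; it is not the library's SecureParserFactory.
public final class HardenedStaxFactorySketch {

    private HardenedStaxFactorySketch() {
    }

    // Creates a StAX factory with DTDs and external entities switched off.
    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    // Reads the local name of the root element using the hardened factory.
    static String rootName(String xml) {
        try {
            XMLStreamReader reader = newHardenedFactory().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null; // no element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to read root element", e);
        }
    }
}
```
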
Since:
JDK 25, opencode-base-xml V1.0.3
Author:
Leon Soo www.LeonSoo.com
Method Details

    • split

      public static void split(InputStream input, String elementName, Consumer<XmlDocument> handler)
      Splits an XML input stream and passes each matching element to the callback.
      Parameters:
      input - the input stream
      elementName - the name of the elements to match
      handler - the callback invoked for each matching element
      Throws:
      OpenXmlException - if splitting fails
    • split

      public static void split(Path path, String elementName, Consumer<XmlDocument> handler)
      Splits an XML file and passes each matching element to the callback.
      Parameters:
      path - the file path
      elementName - the name of the elements to match
      handler - the callback invoked for each matching element
      Throws:
      OpenXmlException - if splitting fails
    • split

      public static void split(String xml, String elementName, Consumer<XmlDocument> handler)
      Splits an XML string and passes each matching element to the callback.
      Parameters:
      xml - the XML string
      elementName - the name of the elements to match
      handler - the callback invoked for each matching element
      Throws:
      OpenXmlException - if splitting fails
    • splitIndexed

      public static void splitIndexed(InputStream input, String elementName, Consumer<SplitResult> handler)
      Splits an XML input stream and passes each matching element, wrapped in an indexed SplitResult, to the callback.
      Parameters:
      input - the input stream
      elementName - the name of the elements to match
      handler - the callback invoked with a SplitResult for each matching element
      Throws:
      OpenXmlException - if splitting fails
    • splitAll

      public static List<XmlDocument> splitAll(String xml, String elementName)
      Collects all matching element fragments from an XML string.
      Parameters:
      xml - the XML string
      elementName - the name of the elements to match
      Returns:
      the list of document fragments
      Throws:
      OpenXmlException - if splitting fails
    • count

      public static int count(InputStream input, String elementName)
      Counts matching elements in an XML input stream using O(1) memory.
      Parameters:
      input - the input stream
      elementName - the name of the elements to count
      Returns:
      the number of matching elements
      Throws:
      OpenXmlException - if counting fails
    • count

      public static int count(String xml, String elementName)
      Counts matching elements in an XML string using O(1) additional memory.
      Parameters:
      xml - the XML string
      elementName - the name of the elements to count
      Returns:
      the number of matching elements
      Throws:
      OpenXmlException - if counting fails
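
To show why counting needs only constant memory, here is a minimal sketch of a streaming count built on the JDK's XMLStreamReader. It is an illustration under assumptions, not the library's code: StaxCountSketch is a hypothetical class, failures are wrapped in IllegalStateException instead of OpenXmlException, and every start tag with the given local name is counted, including matches nested inside other matches.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; it is not part of opencode-base-xml.
public final class StaxCountSketch {

    private StaxCountSketch() {
    }

    // Counts start tags named elementName without building any tree or fragment.
    static int count(InputStream input, String elementName) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Basic hardening: no DTD processing, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(input);
            try {
                int matches = 0;
                while (reader.hasNext()) {
                    // Only the current event is held; the counter is the sole state.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        matches++;
                    }
                }
                return matches;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to count <" + elementName + "> elements", e);
        }
    }
}
```

Called on a stream containing `<root><item/><group><item/></group><other/></root>` with the name "item", this sketch returns 2.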