Class OpenGrapheme

java.lang.Object
cloud.opencode.base.string.unicode.OpenGrapheme

public final class OpenGrapheme extends Object
Grapheme cluster utility: correctly handles user-perceived characters, including emoji sequences and combining marks.

Standard String.length() counts UTF-16 code units, so it gives misleading results for emoji, combining characters, and other graphemes made of more than one code unit. For example, the family emoji in the usage examples below is a single grapheme but occupies 11 UTF-16 code units: four surrogate-pair emoji joined by three zero-width joiners. This utility uses BreakIterator to find grapheme cluster boundaries instead.
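To make the difference concrete, here is a minimal sketch (not OpenGrapheme's source) that compares three counts for the same string: String.length() (UTF-16 code units), String.codePointCount (Unicode code points), and a grapheme count taken from java.text.BreakIterator.getCharacterInstance(). The class GraphemeCountSketch and its method graphemeCount are hypothetical names. The test string uses a combining accent and a supplementary-plane emoji. Keeping whole ZWJ emoji sequences together relies on the extended grapheme cluster rules that BreakIterator follows on recent JDKs (JDK 20 and later).

```java
import java.text.BreakIterator;

/** Hypothetical sketch: UTF-16 length vs. code points vs. grapheme clusters. */
public class GraphemeCountSketch {

    /** Counts grapheme clusters; returns 0 for null, like length(null). */
    public static int graphemeCount(String s) {
        if (s == null || s.isEmpty()) {
            return 0;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        // Each next() moves to the following grapheme boundary until DONE.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", "e" + combining acute accent (U+0301), U+1F600 as a surrogate pair
        String s = "ae\u0301\uD83D\uDE00";
        System.out.println(s.length());                       // 5 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 4 code points
        System.out.println(graphemeCount(s));                 // 3 grapheme clusters
    }
}
```

Only the last count matches what a reader sees on screen: "a", "é", and one emoji.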

Features:

  • Grapheme-aware length calculation
  • Grapheme-safe substring extraction
  • Grapheme-safe string reversal
  • Display width calculation for East Asian (CJK) characters
  • Width-aware string truncation

Usage Examples:

OpenGrapheme.length("a👨‍👩‍👧‍👦b"); // 3
OpenGrapheme.reverse("a👨‍👩‍👧‍👦b"); // "b👨‍👩‍👧‍👦a"
OpenGrapheme.displayWidth("Hello你好"); // 9

Performance:

  • Time complexity: O(n) for all operations, where n is the length of the input string
  • Space complexity: O(n)

Safety:

  • Thread-safe: Yes (stateless utility)
  • Null-safe: Yes; a null str yields 0, null, or "" as documented for each method
Since:
JDK 25, opencode-base-string V1.0.3
Author:
Leon Soo www.LeonSoo.com
  • Method Details

    • length

      public static int length(String str)
      Counts grapheme clusters (user-perceived characters).

      Examples:

      length(null)                        = 0
      length("")                          = 0
      length("hello")                     = 5
      length("a👨‍👩‍👧‍👦b") = 3
      
      Parameters:
      str - the string to measure
      Returns:
      the number of grapheme clusters, or 0 if str is null
    • substring

      public static String substring(String str, int beginIndex, int endIndex)
      Extracts a substring by grapheme index, so grapheme clusters are never split.

      Examples:

      substring("hello", 1, 3)  = "el"
      substring(null, 0, 1)     = null
      
      Parameters:
      str - the source string
      beginIndex - the beginning grapheme index, inclusive
      endIndex - the ending grapheme index, exclusive
      Returns:
      the substring spanning the given grapheme indices, or null if str is null
      Throws:
      IndexOutOfBoundsException - if beginIndex or endIndex is out of range
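One way to implement grapheme-indexed substring is to collect every grapheme boundary, map the grapheme indices onto UTF-16 offsets, and then call String.substring. The sketch below assumes "out of range" means the same as for String.substring (negative begin, end past the grapheme count, or begin greater than end); the class GraphemeSubstringSketch and its helper boundaries are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of grapheme-indexed substring; not OpenGrapheme's source. */
public class GraphemeSubstringSketch {

    /** UTF-16 offsets of every grapheme boundary, including 0 and s.length(). */
    static List<Integer> boundaries(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static String substring(String s, int beginIndex, int endIndex) {
        if (s == null) {
            return null; // mirrors substring(null, 0, 1) = null
        }
        List<Integer> b = boundaries(s);
        int graphemes = b.size() - 1;
        // Assumed range rule, modeled on String.substring.
        if (beginIndex < 0 || endIndex > graphemes || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(
                    "begin " + beginIndex + ", end " + endIndex + ", graphemes " + graphemes);
        }
        // Translate grapheme indices to UTF-16 offsets, then slice normally.
        return s.substring(b.get(beginIndex), b.get(endIndex));
    }
}
```

For example, substring("xe\u0301y", 1, 2) returns "e\u0301" (two UTF-16 code units), whereas String.substring(1, 2) would cut off the accent.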
    • reverse

      public static String reverse(String str)
      Reverses a string grapheme by grapheme, so emoji sequences and combining characters stay intact.

      Examples:

      reverse("abc")   = "cba"
      reverse(null)    = null
      reverse("")      = ""
      reverse("a👨‍👩‍👧‍👦b") = "b👨‍👩‍👧‍👦a"
      
      Parameters:
      str - the string to reverse
      Returns:
      the reversed string, or null if str is null
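A minimal sketch of grapheme-safe reversal using BreakIterator boundaries: walk the boundaries from the end and append each whole cluster. StringBuilder.reverse() alone is not enough, because it keeps surrogate pairs together but moves a combining mark in front of its base character. GraphemeReverseSketch is a hypothetical name, not the library's class.

```java
import java.text.BreakIterator;

/** Hypothetical sketch of grapheme-safe reversal; not OpenGrapheme's source. */
public class GraphemeReverseSketch {

    public static String reverse(String s) {
        if (s == null || s.isEmpty()) {
            return s; // reverse(null) = null, reverse("") = ""
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        StringBuilder out = new StringBuilder(s.length());
        // Start at the last boundary and step backwards one cluster at a time.
        int end = it.last();
        for (int start = it.previous(); start != BreakIterator.DONE; end = start, start = it.previous()) {
            out.append(s, start, end); // append the whole cluster [start, end)
        }
        return out.toString();
    }
}
```

With "ae\u0301" this returns "e\u0301a", while new StringBuilder("ae\u0301").reverse() yields "\u0301ea" and detaches the accent.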
    • displayWidth

      public static int displayWidth(String str)
      Calculates the display width in monospace columns, counting East Asian double-width characters as two columns.

      CJK characters have width 2; all other characters have width 1.

      Examples:

      displayWidth("hello")     = 5
      displayWidth("你好")       = 4
      displayWidth("Hi你好")     = 6
      displayWidth(null)        = 0
      
      Parameters:
      str - the string to measure
      Returns:
      the display width in columns, or 0 if str is null
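The documentation does not say exactly which characters count as "CJK". The sketch below assumes that code points in the Han, Hiragana, Katakana, and Hangul scripts are two columns wide, a narrower rule than Unicode's East Asian Width property (UAX #11), and it measures each grapheme cluster by its first code point so combining marks add no width. DisplayWidthSketch is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of CJK-aware display width; not OpenGrapheme's source. */
public class DisplayWidthSketch {

    // Assumption: these scripts render double-width in a monospace terminal.
    private static final Set<Character.UnicodeScript> WIDE = EnumSet.of(
            Character.UnicodeScript.HAN, Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA, Character.UnicodeScript.HANGUL);

    static int codePointWidth(int cp) {
        return WIDE.contains(Character.UnicodeScript.of(cp)) ? 2 : 1;
    }

    public static int displayWidth(String s) {
        if (s == null || s.isEmpty()) {
            return 0;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int width = 0;
        for (int start = it.first(), end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // A cluster takes the width of its first code point.
            width += codePointWidth(s.codePointAt(start));
        }
        return width;
    }
}
```

Under these assumptions the documented examples hold: "hello" is 5, "你好" is 4, and "Hi你好" is 6 columns.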
    • truncateToWidth

      public static String truncateToWidth(String str, int maxWidth)
      Truncates a string to fit within maxWidth display columns.

      Uses "..." as the default ellipsis.

      Parameters:
      str - the string to truncate
      maxWidth - the maximum display width, in columns
      Returns:
      the truncated string
    • truncateToWidth

      public static String truncateToWidth(String str, int maxWidth, String ellipsis)
      Truncates a string to fit within maxWidth display columns, appending a custom ellipsis when the string is too wide. The ellipsis counts toward maxWidth.

      Examples:

      truncateToWidth("hello", 10, "...")         = "hello"
      truncateToWidth("hello world", 8, "...")     = "hello..."
      truncateToWidth("你好世界", 5, "...")   = "你..."
      truncateToWidth(null, 10, "...")             = ""
      
      Parameters:
      str - the string to truncate
      maxWidth - the maximum display width, in columns
      ellipsis - the string to append when truncation occurs
      Returns:
      str unchanged if it already fits; otherwise the truncated string with the ellipsis appended; "" if str is null
      Throws:
      IllegalArgumentException - if maxWidth is negative
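A sketch of width-aware truncation under the same assumed width rule (Han, Hiragana, Katakana, and Hangul count as 2 columns): return the string unchanged if it fits; otherwise keep whole grapheme clusters while they fit in maxWidth minus the ellipsis width, then append the ellipsis. It reproduces the examples above. The behavior when ellipsis is null or wider than maxWidth is not documented; this sketch drops the ellipsis in both cases, which is an assumption. TruncateSketch is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of width-aware truncation; not OpenGrapheme's source. */
public class TruncateSketch {

    // Assumed double-width scripts (narrower than UAX #11 East Asian Width).
    private static final Set<Character.UnicodeScript> WIDE = EnumSet.of(
            Character.UnicodeScript.HAN, Character.UnicodeScript.HIRAGANA,
            Character.UnicodeScript.KATAKANA, Character.UnicodeScript.HANGUL);

    /** Width of the cluster starting at UTF-16 offset start, from its first code point. */
    private static int clusterWidth(String s, int start) {
        return WIDE.contains(Character.UnicodeScript.of(s.codePointAt(start))) ? 2 : 1;
    }

    public static int displayWidth(String s) {
        if (s == null || s.isEmpty()) {
            return 0;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int width = 0;
        for (int start = it.first(), end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            width += clusterWidth(s, start);
        }
        return width;
    }

    public static String truncateToWidth(String s, int maxWidth, String ellipsis) {
        if (maxWidth < 0) {
            throw new IllegalArgumentException("maxWidth must not be negative: " + maxWidth);
        }
        if (s == null) {
            return "";
        }
        if (displayWidth(s) <= maxWidth) {
            return s; // already fits, no ellipsis needed
        }
        // Assumption: a null ellipsis, or one wider than maxWidth, is dropped.
        String tail = ellipsis == null || displayWidth(ellipsis) > maxWidth ? "" : ellipsis;
        int budget = maxWidth - displayWidth(tail);

        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int used = 0;
        int cut = 0; // UTF-16 offset of the last cluster boundary that still fits
        for (int start = it.first(), end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int w = clusterWidth(s, start);
            if (used + w > budget) {
                break; // the next cluster would overflow; never split it
            }
            used += w;
            cut = end;
        }
        return s.substring(0, cut) + tail;
    }
}
```

Cutting only at cluster boundaries is what keeps a wide character from being split: with maxWidth 5 and "...", "你好世界" leaves a 2-column budget, which holds "你" but not "你好".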