Class AhoCorasick
java.lang.Object
cloud.opencode.base.string.match.AhoCorasick
Aho-Corasick Multi-Pattern Matcher - Efficient multi-pattern string matching
Implements the Aho-Corasick algorithm, which matches multiple patterns simultaneously in a single pass through the text. Time complexity is O(n + m + z), where n is the text length, m is the total length of all patterns, and z is the number of matches.
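The single-pass behaviour described above comes from a trie of the patterns combined with failure links. The following is a minimal, self-contained sketch of that technique and not this library's implementation: the class `AhoCorasickSketch` and the record `Match` are hypothetical names introduced for illustration, and the builder, case-insensitive mode, and replacement features are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; names and structure are assumptions, not the library's code.
final class AhoCorasickSketch {
    // One trie node: outgoing edges, failure link, and the patterns that end here.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    /** A match of {@code pattern} covering [start, end) in the text. */
    record Match(int start, int end, String pattern) {}

    private final Node root = new Node();

    AhoCorasickSketch(List<String> patterns) {
        // Phase 1: insert every pattern into the trie, O(m) in total pattern length.
        for (String p : patterns) {
            if (p.isEmpty()) continue;
            Node node = root;
            for (char c : p.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(p);
        }
        // Phase 2: breadth-first pass that sets failure links level by level.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> e : current.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                // Follow the parent's failure chain until a state with an edge on c is found.
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit patterns that end at the failure target (proper suffix matches).
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Scans the text once and reports every occurrence of every pattern. */
    List<Match> findAll(String text) {
        List<Match> matches = new ArrayList<>();
        Node state = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (state != root && !state.next.containsKey(c)) state = state.fail;
            state = state.next.getOrDefault(c, root);
            for (String p : state.outputs) {
                matches.add(new Match(i - p.length() + 1, i + 1, p));
            }
        }
        return matches;
    }
}
```

With the patterns he, she, his, hers and the text "ushers", `findAll` in this sketch reports she at [1, 4), he at [2, 4), and hers at [2, 6), in that order. The z term in the complexity bound corresponds to the inner loop that emits one entry per reported match.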
Common Use Cases:
- Sensitive word filtering
- Multi-keyword search
- Spam detection
- Content moderation
- Log analysis
Features:
- O(n + m + z) time complexity for multi-pattern matching
- Case-sensitive and case-insensitive matching modes
- Find all matches, find the first match, or check containment
- Replace, filter (mask), and highlight matched patterns
- Automatic handling of overlapping matches
- Builder pattern for flexible configuration
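The feature list mentions masking and overlapping matches without stating how overlaps are resolved. One plausible policy, shown here purely as an assumption, is to mask every character covered by at least one match. The class `MaskSketch` and its `mask` method are hypothetical, and the sketch finds occurrences with a plain `indexOf` scan rather than the automaton, to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the overlap policy shown is an assumption.
final class MaskSketch {
    /** Replaces every character covered by at least one pattern occurrence with maskChar. */
    static String mask(String text, List<String> patterns, char maskChar) {
        boolean[] covered = new boolean[text.length()];
        for (String p : patterns) {
            if (p.isEmpty()) continue;
            // Advance by one position so occurrences that overlap each other are also found.
            for (int i = text.indexOf(p); i >= 0; i = text.indexOf(p, i + 1)) {
                Arrays.fill(covered, i, i + p.length(), true);
            }
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(covered[i] ? maskChar : text.charAt(i));
        }
        return out.toString();
    }
}
```

For example, `MaskSketch.mask("ushers", List.of("she", "hers"), '*')` returns `u*****`, because the two overlapping matches together cover positions 1 through 5. Marking coverage first and building the output afterwards avoids masking the same region twice.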
Usage Examples:
// Build the matcher
AhoCorasick matcher = AhoCorasick.builder()
    .addPattern("bad")
    .addPattern("word")
    .addPatterns(List.of("spam", "virus"))
    .ignoreCase(true)
    .build();

// Find all matches
List<PatternMatch> matches = matcher.findAll("This is a bad word");

// Check whether the text contains any pattern
boolean contains = matcher.containsAny("Clean text here");

// Replace all matches
String result = matcher.replaceAll("Bad words here", "***");

// Filter sensitive words
String filtered = matcher.filter("Some bad content", '*');
Security:
- Thread-safe: Yes (immutable after construction)
- Null-safe: Yes
- Since:
- JDK 25, opencode-base-string V1.2.0
- Author:
- Leon Soo www.LeonSoo.com
Nested Class Summary
Nested Classes -
Method Summary
Modifier and Type | Method | Description
static AhoCorasick.Builder | builder() | Creates a new builder.
boolean | containsAny(String text) | Checks if the text contains any pattern.
int | countMatches(String text) | Counts the total number of matches.
 | filter | Filters with asterisks.
 | filter | Filters (masks) all matches with a mask character.
 | findAll | Finds all pattern matches in the text.
 | findFirst | Finds the first pattern match in the text.
 | getMatchedPatterns(String text) | Gets all matched patterns (unique).
 | getPatterns | Returns all patterns.
boolean | hasPattern(String pattern) | Checks if a pattern exists.
 | highlight | Highlights all matches with tags.
static AhoCorasick | of | Creates a matcher from patterns.
static AhoCorasick | of(Collection<String> patterns) | Creates a matcher from patterns.
static AhoCorasick | ofIgnoreCase(String... patterns) | Creates a case-insensitive matcher from patterns.
static AhoCorasick | ofIgnoreCase(Collection<String> patterns) | Creates a case-insensitive matcher from patterns.
int | patternCount() | Returns the number of patterns.
 | replaceAll(String text, String replacement) | Replaces all matches with a replacement string.
-
Method Details
-
builder
Creates a new builder.
- Returns:
- a new builder
-
of
Creates a matcher from patterns.
- Parameters:
- patterns - the patterns to match
- Returns:
- a new matcher
-
of
Creates a matcher from patterns.
- Parameters:
- patterns - the patterns to match
- Returns:
- a new matcher
-
ofIgnoreCase
Creates a case-insensitive matcher from patterns.
- Parameters:
- patterns - the patterns to match
- Returns:
- a new matcher
-
ofIgnoreCase
Creates a case-insensitive matcher from patterns.
- Parameters:
- patterns - the patterns to match
- Returns:
- a new matcher
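The page does not say how case-insensitive matching is implemented. A common approach, sketched here as an assumption, is to fold both the patterns and the text to lower case before matching. Folding character by character keeps the string length unchanged, so offsets found in the folded text remain valid in the original text; `String.toLowerCase` can change the length for a few characters (for example U+0130). `CaseFoldSketch`, `fold`, and `indexOfIgnoreCase` are hypothetical names.

```java
// Illustrative sketch only; not necessarily how the library folds case.
final class CaseFoldSketch {
    /** Lower-cases char by char, so the result always has the same length as the input. */
    static String fold(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    /** Case-insensitive search whose result is an offset into the original text, or -1. */
    static int indexOfIgnoreCase(String text, String pattern) {
        return fold(text).indexOf(fold(pattern));
    }
}
```

For example, `CaseFoldSketch.indexOfIgnoreCase("This is a BAD word", "bad")` returns 10. This per-char folding does not handle supplementary code points or locale-specific rules, which a production matcher may need to consider.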
-
findAll
Finds all pattern matches in the text.
- Parameters:
- text - the text to search
- Returns:
- list of all matches
-
findFirst
Finds the first pattern match in the text.
- Parameters:
- text - the text to search
- Returns:
- the first match, or empty if there is no match
-
containsAny
Checks if the text contains any pattern.
- Parameters:
- text - the text to check
- Returns:
- true if any pattern is found
-
countMatches
Counts the total number of matches.
- Parameters:
- text - the text to search
- Returns:
- the number of matches
-
getMatchedPatterns
-
replaceAll
-
filter
-
filter
-
highlight
-
patternCount
public int patternCount()
Returns the number of patterns.
- Returns:
- pattern count
-
getPatterns
-
hasPattern
Checks if a pattern exists.
- Parameters:
- pattern - the pattern to check
- Returns:
- true if the pattern exists
-