Class OpenSimilarity
java.lang.Object
cloud.opencode.base.string.similarity.OpenSimilarity
String Similarity Facade - Unified entry point for string similarity calculations.
Features:
- Levenshtein distance and similarity
- Jaccard similarity with configurable N-gram size
- Cosine similarity
- Jaro-Winkler similarity
Usage Examples:
double lev = OpenSimilarity.levenshteinSimilarity("hello", "hallo");
double jac = OpenSimilarity.jaccardSimilarity("abc", "abd");
double cos = OpenSimilarity.cosineSimilarity("hello world", "hello java");
Security:
- Thread-safe: Yes (stateless utility)
- Null-safe: Partial (depends on algorithm)
Since: JDK 25, opencode-base-string V1.0.0
Author: Leon Soo www.LeonSoo.com
Method Summary
Modifier and Type    Method                                                                  Description
static int           boundedLevenshteinDistance(String s1, String s2, int threshold)         Bounded Levenshtein distance with early termination.
static double        cosineSimilarity(String s1, String s2)
static String        findMostSimilar(String target, List<String> candidates)
                     findSimilar(String target, List<String> candidates, double threshold)
static boolean       isSimilar
static double        jaccardSimilarity(String s1, String s2)
static double        jaccardSimilarity(String s1, String s2, int nGram)
static double        jaroWinklerSimilarity(String s1, String s2)
static int           levenshteinDistance(String s1, String s2)
static double        levenshteinSimilarity(String s1, String s2)
static int           longestCommonSubsequence(String s1, String s2)
static int           longestCommonSubstring(String s1, String s2)
-
Method Details
-
levenshteinDistance
-
levenshteinSimilarity
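No description survives here for levenshteinDistance or levenshteinSimilarity. As a rough illustration of the usual technique, the following is a minimal sketch, not the library's implementation. The class name LevenshteinSketch and the normalization 1 - distance / max(length) are assumptions; OpenSimilarity may normalize differently.

```java
// Hypothetical stand-alone sketch; not part of opencode-base-string.
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    /** Edit distance via the classic two-row dynamic-programming table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Row 0: turning the empty prefix of a into b[0..j) costs j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }
}
```

Under this assumed normalization, "hello" and "hallo" differ by one substitution over five characters, which gives a similarity of about 0.8.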
-
boundedLevenshteinDistance
Bounded Levenshtein distance with early termination. Returns the edit distance if it is <= threshold, otherwise -1.
Parameters:
- s1: the first string
- s2: the second string
- threshold: the maximum acceptable distance
Returns:
- the edit distance if it is <= threshold, otherwise -1
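One way the stated contract (return the distance when it is <= threshold, otherwise -1) could be met is sketched below. This is an illustration under assumptions, not the library's code: the class name BoundedLevenshteinSketch is hypothetical, and the early-exit rule used here (stop once every cell in the current row exceeds the threshold) is one common choice among several.

```java
// Hypothetical stand-alone sketch; not part of opencode-base-string.
final class BoundedLevenshteinSketch {
    private BoundedLevenshteinSketch() {}

    static int boundedDistance(String a, String b, int threshold) {
        if (threshold < 0) {
            return -1; // no distance can be <= a negative threshold
        }
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(a.length() - b.length()) > threshold) {
            return -1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Later rows never fall below the minimum of this row,
            // so the final distance must also exceed the threshold.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int d = prev[b.length()];
        return d <= threshold ? d : -1;
    }
}
```

With this sketch, boundedDistance("kitten", "sitting", 3) returns 3, while the same pair with a threshold of 2 returns -1.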
-
jaccardSimilarity
-
jaccardSimilarity
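The method summary lists an overload that takes an nGram size but gives no further detail. A minimal sketch of Jaccard similarity over character n-gram sets follows. The class name JaccardSketch is hypothetical, and the handling of edge cases (strings shorter than n are kept as a single gram, two empty inputs score 1.0) is an assumption rather than documented behavior.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of opencode-base-string.
final class JaccardSketch {
    private JaccardSketch() {}

    /** Collects the distinct character n-grams of s. */
    static Set<String> nGrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        if (s.length() < n) {
            // Assumption: a string shorter than n contributes itself as one gram.
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** |A intersect B| / |A union B| over the two n-gram sets. */
    static double similarity(String a, String b, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Set<String> gramsA = nGrams(a, n);
        Set<String> gramsB = nGrams(b, n);
        if (gramsA.isEmpty() && gramsB.isEmpty()) {
            return 1.0; // assumption: two empty inputs count as identical
        }
        Set<String> intersection = new HashSet<>(gramsA);
        intersection.retainAll(gramsB);
        int unionSize = gramsA.size() + gramsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

For "abc" and "abd" with n = 2, the gram sets are {ab, bc} and {ab, bd}, so the sketch yields 1/3; with n = 1 it yields 0.5. Which value the two-argument overload produces depends on its default N-gram size, which the source does not state.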
-
cosineSimilarity
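The source does not say how cosineSimilarity builds its vectors. As an illustration only, the sketch below assumes whitespace-separated word counts as the vector components; the class name CosineSketch and that tokenization are assumptions, and the library may instead use characters or n-grams.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of opencode-base-string.
final class CosineSketch {
    private CosineSketch() {}

    /** Counts whitespace-separated tokens (assumed tokenization). */
    static Map<String, Integer> termCounts(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : s.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** dot(A, B) / (|A| * |B|) over the two term-count vectors. */
    static double similarity(String a, String b) {
        Map<String, Integer> countsA = termCounts(a);
        Map<String, Integer> countsB = termCounts(b);
        if (countsA.isEmpty() || countsB.isEmpty()) {
            return 0.0; // assumption: an empty vector is similar to nothing
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            Integer other = countsB.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
            normA += (double) e.getValue() * e.getValue();
        }
        for (int count : countsB.values()) {
            normB += (double) count * count;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Under the word-count assumption, "hello world" and "hello java" share one of two terms each, so the sketch returns approximately 0.5.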
-
jaroWinklerSimilarity
-
longestCommonSubsequence
-
longestCommonSubstring
-
isSimilar
-
findMostSimilar
-
findSimilar
-