Class OpenSimilarity

java.lang.Object
cloud.opencode.base.string.similarity.OpenSimilarity

public final class OpenSimilarity extends Object
String Similarity Facade - Unified entry point for string similarity calculations.

Features:

  • Levenshtein distance and similarity
  • Jaccard similarity with configurable n-gram size
  • Cosine similarity
  • Jaro-Winkler similarity

Usage Examples:

double lev = OpenSimilarity.levenshteinSimilarity("hello", "hallo");
double jac = OpenSimilarity.jaccardSimilarity("abc", "abd");
double cos = OpenSimilarity.cosineSimilarity("hello world", "hello java");

Security:

  • Thread-safe: Yes (stateless utility)
  • Null-safe: Partial (depends on algorithm)
Since:
JDK 25, opencode-base-string V1.0.0
Author:
Leon Soo www.LeonSoo.com
  • Method Details

    • levenshteinDistance

      public static int levenshteinDistance(String s1, String s2)
    • levenshteinSimilarity

      public static double levenshteinSimilarity(String s1, String s2)
    • boundedLevenshteinDistance

      public static int boundedLevenshteinDistance(String s1, String s2, int threshold)
      Bounded Levenshtein distance with early termination.

      Returns the edit distance if it is <= threshold, otherwise -1.

      Parameters:
      s1 - the first string
      s2 - the second string
      threshold - the maximum acceptable distance
      Returns:
      the edit distance if it is <= threshold, otherwise -1
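
The contract above (distance if within the threshold, otherwise -1) can be illustrated with a minimal sketch. This is not the library's implementation: the class name `BoundedLevenshteinSketch`, the two-row dynamic-programming layout, and the choice to reject null inputs and negative thresholds with an exception are all assumptions made for illustration.

```java
// Hypothetical illustration only; not the source of OpenSimilarity.
final class BoundedLevenshteinSketch {
    private BoundedLevenshteinSketch() {}

    /** Returns the edit distance if it is <= threshold, otherwise -1. */
    static int boundedDistance(String s1, String s2, int threshold) {
        // Assumption: nulls and negative thresholds are rejected outright.
        if (s1 == null || s2 == null) {
            throw new IllegalArgumentException("inputs must not be null");
        }
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        int n = s1.length();
        int m = s2.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > threshold) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of s1
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Early termination: the final distance is never smaller than
            // the minimum of any row, so a row above the threshold is final.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

With this sketch, `boundedDistance("kitten", "sitting", 3)` yields 3, while the same pair with a threshold of 2 yields -1 without filling the whole table.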
    • jaccardSimilarity

      public static double jaccardSimilarity(String s1, String s2)
    • jaccardSimilarity

      public static double jaccardSimilarity(String s1, String s2, int nGram)
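
The page does not say how the `nGram` parameter is applied, so the following is only a sketch of one common reading: Jaccard similarity over sets of character n-grams. The class name `JaccardNGramSketch`, the handling of strings shorter than `n` (treated as a single gram), and the result of 1.0 for two empty inputs are assumptions, not documented behavior.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the source of OpenSimilarity.
final class JaccardNGramSketch {
    private JaccardNGramSketch() {}

    /** Collects the distinct character n-grams of s. */
    static Set<String> nGrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        if (s.length() < n) {
            // Assumption: a non-empty string shorter than n is one gram.
            if (!s.isEmpty()) {
                grams.add(s);
            }
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    /** Jaccard similarity: |A intersect B| / |A union B| over n-gram sets. */
    static double similarity(String s1, String s2, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        Set<String> a = nGrams(s1, n);
        Set<String> b = nGrams(s2, n);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumption: two empty inputs count as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

Under these assumptions, "abc" and "abd" share one bigram ("ab") out of three distinct bigrams, giving 1/3 for n = 2, and share two of four distinct characters, giving 0.5 for n = 1.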
    • cosineSimilarity

      public static double cosineSimilarity(String s1, String s2)
    • jaroWinklerSimilarity

      public static double jaroWinklerSimilarity(String s1, String s2)
    • longestCommonSubsequence

      public static int longestCommonSubsequence(String s1, String s2)
    • longestCommonSubstring

      public static int longestCommonSubstring(String s1, String s2)
    • isSimilar

      public static boolean isSimilar(String s1, String s2, double threshold)
    • findMostSimilar

      public static String findMostSimilar(String target, List<String> candidates)
    • findSimilar

      public static List<String> findSimilar(String target, List<String> candidates, double threshold)
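
The signatures of `findMostSimilar` and `findSimilar` suggest a scan over the candidates with some similarity score, but the page does not state which metric is used, whether the threshold comparison is inclusive, or what is returned for an empty list. The sketch below therefore takes the metric as a parameter and assumes an inclusive threshold, input-order results, and `null` for an empty candidate list. The class `SimilarityFilterSketch` and its extra `metric` parameter are hypothetical and are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only; not the source of OpenSimilarity.
final class SimilarityFilterSketch {
    private SimilarityFilterSketch() {}

    /** Keeps candidates whose score against target is >= threshold. */
    static List<String> findSimilar(String target, List<String> candidates,
                                    double threshold,
                                    ToDoubleBiFunction<String, String> metric) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            // Assumption: the threshold is inclusive.
            if (metric.applyAsDouble(target, candidate) >= threshold) {
                result.add(candidate);
            }
        }
        return result; // candidates keep their input order
    }

    /** Returns the highest-scoring candidate, or null if there is none. */
    static String findMostSimilar(String target, List<String> candidates,
                                  ToDoubleBiFunction<String, String> metric) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : candidates) {
            double score = metric.applyAsDouble(target, candidate);
            if (score > bestScore) { // ties keep the earlier candidate
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}
```

Passing the metric explicitly keeps the sketch independent of any particular algorithm; a caller could supply a method reference to whichever similarity function fits the data.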