Class MCRTextNormalizer

java.lang.Object
org.mycore.mods.merger.MCRTextNormalizer

public class MCRTextNormalizer extends Object
Normalizes text so that matching for duplicates is fault-tolerant. Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
Author:
Frank Lützenkirchen
  • Constructor Details

    • MCRTextNormalizer

      public MCRTextNormalizer()
  • Method Details

    • normalize

      public String normalize(String text)
      Normalizes the given text so that matching for duplicates is fault-tolerant. Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
    • normalizeText

      public static String normalizeText(String text)
      Normalizes the given text so that matching for duplicates is fault-tolerant. Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
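
The normalization described above can be approximated with the JDK alone. The following is a minimal sketch, not the MyCoRe implementation: the class `NormalizerSketch` and the method `normalizeSketch` are hypothetical names, and the handling of whitespace (kept, with runs collapsed to a single space) is an assumption, since the description does not say how word boundaries are treated. It relies only on `java.text.Normalizer` and regular expressions.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration class; not part of org.mycore.mods.merger.
class NormalizerSketch {

    // Approximates the documented behavior: fold case, strip accents and
    // umlaut marks, remove punctuation and other non-letter/non-digit characters.
    public static String normalizeSketch(String text) {
        // Fold case with a fixed locale so results do not depend on the JVM default.
        String s = text.toLowerCase(Locale.ROOT);
        // Canonical decomposition splits e.g. "ü" into "u" plus a combining diaeresis.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks produced by the decomposition.
        s = s.replaceAll("\\p{M}", "");
        // Remove everything that is not a letter, a digit, or whitespace.
        s = s.replaceAll("[^\\p{L}\\p{N}\\s]", "");
        // Assumption: keep word boundaries, collapsing whitespace runs to one space.
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "muller jose ein uberblick"
        System.out.println(normalizeSketch("Müller, José: Ein Überblick!"));
    }
}
```

With MyCoRe on the classpath, the documented static method `MCRTextNormalizer.normalizeText(String)` would be called in the same way. Its exact output may differ from this sketch, for example in how umlauts or whitespace are handled.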