Package org.mycore.mods.merger
Class MCRTextNormalizer
java.lang.Object
org.mycore.mods.merger.MCRTextNormalizer
Normalizes text so that matching for duplicates is fault-tolerant.
Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
Author: Frank Lützenkirchen
Constructor Details
MCRTextNormalizer
public MCRTextNormalizer()
Method Details
normalize
Normalizes text so that matching for duplicates is fault-tolerant. Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
normalizeText
Normalizes text so that matching for duplicates is fault-tolerant. Accents, umlauts, and case are normalized. Punctuation and all other characters that are neither letters nor digits are removed.
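The normalization described above can be illustrated with a minimal, standalone sketch. It is not the MyCoRe implementation: the class TextNormalizerSketch and the method normalizeSketch are hypothetical names, and the exact rules are assumptions (in particular, the sketch maps "ß" to "ss" and collapses removed characters into a single space so that word boundaries survive). It relies only on java.text.Normalizer and regular expressions from the JDK.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only; not part of org.mycore.mods.merger.
final class TextNormalizerSketch {

    private TextNormalizerSketch() {
    }

    static String normalizeSketch(String text) {
        // Case: lower-case with a fixed locale so the result does not depend on JVM settings.
        String s = text.toLowerCase(Locale.ROOT);

        // Assumed rule: write the sharp s (U+00DF) as "ss" so both spellings compare equal.
        s = s.replace("\u00df", "ss");

        // Accents and umlauts: canonical decomposition (NFD) splits a letter such as
        // U+00FC into "u" plus a combining mark; removing all marks leaves the base letter.
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");

        // Punctuation and other non-letter, non-digit characters: each run is replaced
        // by one space (an assumption of this sketch), then the ends are trimmed.
        s = s.replaceAll("[^\\p{L}\\p{N}]+", " ").trim();

        return s;
    }

    public static void main(String[] args) {
        String a = normalizeSketch("M\u00fcller, Jos\u00e9");
        String b = normalizeSketch("MULLER  Jose!");
        // Both inputs reduce to the same comparison key.
        System.out.println(a.equals(b)); // true
    }
}
```

Two titles or names can then be treated as duplicate candidates when their normalized forms are equal. Whether to keep or drop spaces, and how to transliterate umlauts, is a design choice that trades recall against false matches.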