org.mycore.common
Class MCRNormalizer
java.lang.Object
org.mycore.common.MCRNormalizer
public class MCRNormalizer
- extends Object
This class provides only static methods for normalizing text values. Rules
are written in the form x>u. You can configure the normalization with three
properties:
- MCR.Metadata.Normalize.AddRule - add more rules to the default rule
- MCR.Metadata.Normalize.SetRule - replace the default rule
- MCR.Metadata.Normalize.DiacriticRule - true (default) | false; when true,
the first rule applied removes diacritics from letters
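The x>u rule form can be illustrated with a minimal, standalone sketch. This is not the class's actual implementation: the class name RuleSketch, the methods parseRules and apply, and the ';' separator between rules are all assumptions made for the example, and the real class may delegate to a transliteration library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of MyCoRe.
class RuleSketch {

    // Parses rules of the form "x>u". The ';' separator is an assumption.
    static Map<String, String> parseRules(String rules) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String rule : rules.split(";")) {
            int gt = rule.indexOf('>');
            if (gt <= 0) {
                continue; // no '>' or nothing before it: skip
            }
            String source = rule.substring(0, gt).trim();
            String target = rule.substring(gt + 1).trim();
            if (!source.isEmpty()) {
                parsed.put(source, target);
            }
        }
        return parsed;
    }

    // Applies each rule in insertion order as a plain text replacement.
    static String apply(String in, Map<String, String> rules) {
        String out = in;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            out = out.replace(rule.getKey(), rule.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // u with diaeresis > ue, sharp s > ss (escapes keep the file ASCII-only)
        Map<String, String> rules = parseRules("\u00FC>ue;\u00DF>ss");
        System.out.println(apply("m\u00FCller", rules)); // prints "mueller"
    }
}
```

Plain replacement in a fixed order keeps the sketch short; a rule whose target contains the source of a later rule would be rewritten again by that later rule.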
The ICU4J Normalizer documentation explains how decomposition works:
http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/Normalizer.html
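As a rough illustration of decomposition followed by diacritic removal, the sketch below uses the JDK's java.text.Normalizer as a stand-in for the ICU4J class linked above. The class name DiacriticSketch and the method stripDiacritics are hypothetical, and the code is not taken from MCRNormalizer. The MARKS constant holds the eighteen combining characters listed below, derived from their UTF-8 byte values.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of MyCoRe.
class DiacriticSketch {

    // The combining marks from the list in this class description.
    private static final String MARKS =
        "\u0301\u0300\u0302\u0307\u0308\u0306\u030B\u030C\u030A"
      + "\u0304\u032E\u0328\u0327\u0323\u0338\u0336\u0332\u0303";

    static String stripDiacritics(String in) {
        // NFD splits a precomposed letter into base letter + combining marks.
        String decomposed = Normalizer.normalize(in, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (MARKS.indexOf(c) < 0) {
                out.append(c); // keep everything that is not a listed mark
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input strings use escapes so the file stays ASCII-only.
        System.out.println(stripDiacritics("Cr\u00E8me Br\u00FBl\u00E9e")); // prints "Creme Brulee"
        System.out.println(stripDiacritics("Dvo\u0159\u00E1k"));            // prints "Dvorak"
    }
}
```

Letters that have no canonical decomposition are left unchanged by this approach, because NFD produces no combining mark to strip.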
The following diacritics are removed from letters when
MCR.Metadata.Normalize.DiacriticRule is true:
"\u0301", // ́ (0xcc 0x81 = 204 129) COMBINING ACUTE ACCENT
"\u0300", // ̀ (0xcc 0x80 = 204 128) COMBINING GRAVE ACCENT
"\u0302", // ̂ (0xcc 0x82 = 204 130) COMBINING CIRCUMFLEX ACCENT
"\u0307", // ̇ (0xcc 0x87 = 204 135) COMBINING DOT ABOVE
"\u0308", // ̈ (0xcc 0x88 = 204 136) COMBINING DIAERESIS
"\u0306", // ̆ (0xcc 0x86 = 204 134) COMBINING BREVE
"\u030B", // ̋ (0xcc 0x8b = 204 139) COMBINING DOUBLE ACUTE ACCENT
"\u030C", // ̌ (0xcc 0x8c = 204 140) COMBINING CARON (Hacek)
"\u030A", // ̊ (0xcc 0x8a = 204 138) COMBINING RING ABOVE
"\u0304", // ̄ (0xcc 0x84 = 204 132) COMBINING MACRON
"\u032E", // ̮ (0xcc 0xae = 204 174) COMBINING BREVE BELOW
"\u0328", // ̨ (0xcc 0xa8 = 204 168) COMBINING OGONEK
"\u0327", // ̧ (0xcc 0xa7 = 204 167) COMBINING CEDILLA
"\u0323", // ̣ (0xcc 0xa3 = 204 163) COMBINING DOT BELOW
"\u0338", // ̸ (0xcc 0xb8 = 204 184) COMBINING LONG SOLIDUS OVERLAY
"\u0336", // ̶ (0xcc 0xb6 = 204 182) COMBINING LONG STROKE OVERLAY
"\u0332", // ̲ (0xcc 0xb2 = 204 178) COMBINING LOW LINE
"\u0303"};// ̃ (0xcc 0x83 = 204 131) COMBINING TILDE
- Version:
- $Revision: 15222 $ $Date: 2009-05-19 12:25:55 +0200 (Tue, 19 May 2009) $
- Author:
- Frank Lützenkirchen, Thomas Scheffler (yagee), Jens Kupferschmidt, Harald Richter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
logger
static Logger logger
MCRNormalizer
public MCRNormalizer()
normalizeString
public static final String normalizeString(String in)
- Replaces umlauts and other special characters of languages such as German
with normalized lowercase a-z characters.
- Parameters:
in - the String to be normalized
- Returns:
- the normalized String in lower case.
normalizeString
public static final String normalizeString(String in, boolean reallyNormalize)
setDoNormalize
public static final void setDoNormalize(boolean value)
- Activates or deactivates normalization. Used in the miless software to make
indexing and searching of SCORM content possible.
- Parameters:
value - true to normalize strings, false to leave strings unnormalized