org.mycore.services.plugins
Class XmlHtmlPlugin

java.lang.Object
  extended by org.mycore.services.plugins.XmlHtmlPlugin
All Implemented Interfaces:
TextFilterPlugin

public class XmlHtmlPlugin
extends Object
implements TextFilterPlugin

Converts XML, XHTML and HTML to plain text for indexing.

Author:
Frank Lützenkirchen, Harald Richter

Constructor Summary
XmlHtmlPlugin()
           
 
Method Summary
static String getFullText(String html)
          Converts an HTML string to XML so that its text nodes can be extracted.
 String getInfo()
          May contain additional information about the plugin.
 int getMajorNumber()
          should return the major version number
 int getMinorNumber()
          should return the minor version number
 String getName()
          Should return the name of the plugin.
 HashSet getSupportedContentTypes()
          Returns the set of all supported MCRFileContentTypes.
 Reader transform(MCRFileContentType ct, InputStream input)
          Converts a given InputStream to a text stream containing a textual representation of the source.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XmlHtmlPlugin

public XmlHtmlPlugin()
Method Detail

getName

public String getName()
Description copied from interface: TextFilterPlugin
Should return the name of the plugin.

Specified by:
getName in interface TextFilterPlugin
Returns:
Plugin name
See Also:
TextFilterPlugin.getName()

getInfo

public String getInfo()
Description copied from interface: TextFilterPlugin
May contain additional information about the plugin.

Specified by:
getInfo in interface TextFilterPlugin
Returns:
Further information about the plugin
See Also:
TextFilterPlugin.getInfo()

getSupportedContentTypes

public HashSet getSupportedContentTypes()
Description copied from interface: TextFilterPlugin
Returns the set of all supported MCRFileContentTypes. The file extensions must be given without the leading dot.

Specified by:
getSupportedContentTypes in interface TextFilterPlugin
Returns:
HashSet of file extensions
See Also:
TextFilterPlugin.getSupportedContentTypes()

transform

public Reader transform(MCRFileContentType ct,
                        InputStream input)
                 throws FilterPluginTransformException
Description copied from interface: TextFilterPlugin
Converts a given InputStream to a text stream containing a textual representation of the source.

Specified by:
transform in interface TextFilterPlugin
Parameters:
input - File in foreign format
Returns:
Reader providing a textual representation of the input
Throws:
FilterPluginTransformException
See Also:
org.mycore.services.plugins.TextFilterPlugin#transform(org.mycore.datamodel.ifs.MCRFileContentType,org.mycore.datamodel.ifs.MCRContentInputStream, java.io.OutputStream)
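
Because transform returns a java.io.Reader, a caller such as an indexer has to consume that Reader to obtain the text. The following is a minimal sketch of that step only. The class ReaderDrainSketch and the method readAll are hypothetical and not part of MyCoRe, and a StringReader stands in for the Reader that transform would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderDrainSketch {

    // Hypothetical helper: reads the Reader to its end, closes it, and returns the collected text.
    public static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader r = reader) {
            int count;
            while ((count = r.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Assumption: in real use, this Reader would come from XmlHtmlPlugin.transform(ct, input).
        Reader standIn = new StringReader("Title Some bold text.");
        System.out.println(readAll(standIn));
    }
}
```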

getMajorNumber

public int getMajorNumber()
Description copied from interface: TextFilterPlugin
should return the major version number

Specified by:
getMajorNumber in interface TextFilterPlugin
Returns:
major version number
See Also:
TextFilterPlugin.getMajorNumber()

getMinorNumber

public int getMinorNumber()
Description copied from interface: TextFilterPlugin
should return the minor version number

Specified by:
getMinorNumber in interface TextFilterPlugin
Returns:
minor version number
See Also:
TextFilterPlugin.getMinorNumber()

getFullText

public static String getFullText(String html)
Converts an HTML string to XML so that its text nodes can be extracted.
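
The description implies two steps: obtain well-formed XML, then collect its text nodes. The following is a minimal sketch of the second step only, using the DOM parser from the JDK. It is not the plugin's actual implementation. The class TextNodeSketch and the method extractText are hypothetical, and the sketch assumes the input is already well-formed markup without a DOCTYPE declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeSketch {

    // Hypothetical helper: parses well-formed markup and joins its text nodes with single spaces.
    public static String extractText(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            // Assumption: malformed input is reported as an unchecked exception in this sketch.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Depth-first walk that appends every non-blank text or CDATA node.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String markup = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(extractText(markup)); // prints "Title Some bold text."
    }
}
```

Real-world HTML is often not well-formed, which is presumably why the method converts it to XML first; that conversion is outside the scope of this sketch.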