org.mycore.datamodel.ifs.extractors
Class MCRDataExtractorPDF

java.lang.Object
  extended by org.mycore.common.events.MCREventHandlerBase
      extended by org.mycore.datamodel.ifs.extractors.MCRDataExtractor
          extended by org.mycore.datamodel.ifs.extractors.MCRDataExtractorPDF
All Implemented Interfaces:
MCREventHandler

public class MCRDataExtractorPDF
extends MCRDataExtractor

Extracts metadata from PDF files using the PDFBox library. The extractor reads the number of pages, document information such as author and title, and the titles of all outline items (the table of contents). See http://www.pdfbox.org/ for details.

Version:
$Revision: 13085 $ $Date: 2008-02-06 18:27:24 +0100 (Mi, 06 Feb 2008) $
Author:
Frank Lützenkirchen

Constructor Summary
MCRDataExtractorPDF()
           
 
Method Summary
protected  void extractData(Element container, InputStream in)
          Extracts metadata from a file.
protected  String getSupportedContentTypeIDs()
          Returns the IDs of the FileContentTypes that are supported by this metadata extractor.
static void main(String[] args)
          Test application that outputs extracted metadata for a given local file.
 
Methods inherited from class org.mycore.datamodel.ifs.extractors.MCRDataExtractor
addDataValue, handleFileCreated, handleFileUpdated, outputData, testLocalFile
 
Methods inherited from class org.mycore.common.events.MCREventHandlerBase
doHandleEvent, doNothing, handleClassificationCreated, handleClassificationDeleted, handleClassificationRepaired, handleClassificationUpdated, handleDerivateCreated, handleDerivateDeleted, handleDerivateRepaired, handleDerivateUpdated, handleFileDeleted, handleFileRepaired, handleObjectCreated, handleObjectDeleted, handleObjectRepaired, handleObjectUpdated, undoClassificationCreated, undoClassificationDeleted, undoClassificationRepaired, undoClassificationUpdated, undoDerivateCreated, undoDerivateDeleted, undoDerivateRepaired, undoDerivateUpdated, undoFileCreated, undoFileDeleted, undoFileRepaired, undoFileUpdated, undoHandleEvent, undoObjectCreated, undoObjectDeleted, undoObjectRepaired, undoObjectUpdated
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MCRDataExtractorPDF

public MCRDataExtractorPDF()
Method Detail

getSupportedContentTypeIDs

protected String getSupportedContentTypeIDs()
Description copied from class: MCRDataExtractor
Returns the IDs of the FileContentTypes that are supported by this metadata extractor. Metadata is extracted only if the given file matches one of these types.

Specified by:
getSupportedContentTypeIDs in class MCRDataExtractor
Returns:
a String of supported MCRFileContentType ID(s), separated by spaces
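
As an illustration of the space-separated return format, here is a minimal sketch of how a caller might test a file's content type ID against such a string. The class name SupportedTypeCheck, both helper methods, and the IDs "pdf", "ps" and "html" are assumptions made for this example; they are not part of the MyCoRe API, and the actual matching logic in MCRDataExtractor may differ.

```java
import java.util.HashSet;
import java.util.Set;

public class SupportedTypeCheck {

    /** Splits a space-separated ID string into a set of individual IDs. */
    static Set<String> parseIds(String supportedIds) {
        Set<String> ids = new HashSet<>();
        for (String id : supportedIds.trim().split("\\s+")) {
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Returns true if contentTypeId is one of the IDs listed in supportedIds. */
    static boolean isSupported(String supportedIds, String contentTypeId) {
        return parseIds(supportedIds).contains(contentTypeId);
    }

    public static void main(String[] args) {
        // Hypothetical value of the kind getSupportedContentTypeIDs() might return
        String supported = "pdf ps";
        System.out.println(isSupported(supported, "pdf"));
        System.out.println(isSupported(supported, "html"));
    }
}
```

Parsing into a set keeps the check independent of how many spaces separate the IDs and avoids false positives from substring matches.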

extractData

protected void extractData(Element container,
                           InputStream in)
                    throws Exception
Description copied from class: MCRDataExtractor
Extracts metadata from a file. This method must be overridden by subclasses.

Specified by:
extractData in class MCRDataExtractor
Parameters:
container - empty XML element that the extractor should fill with data
in - the InputStream to read the file's content from
Throws:
Exception
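
The contract described above (an empty container element goes in, the stream is read, and the container comes back filled) can be sketched as follows. This is a toy example under stated assumptions, not the PDF extractor itself: it uses the JDK's org.w3c.dom.Element as a stand-in because this page does not identify the Element type, it counts bytes instead of calling PDFBox, and the names ExtractorContractSketch, addValue, demo and byteCount are invented for illustration. The inherited addDataValue method is not used because its signature is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ExtractorContractSketch {

    /** Hypothetical helper: appends <name>value</name> to the container. */
    static void addValue(Element container, String name, String value) {
        Element child = container.getOwnerDocument().createElement(name);
        child.setTextContent(value);
        container.appendChild(child);
    }

    /** Toy extractor: reads the whole stream and records its length in bytes. */
    static void extractData(Element container, InputStream in) throws IOException {
        long size = 0;
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            size += n;
        }
        addValue(container, "byteCount", Long.toString(size));
    }

    /** Creates an empty container, runs the extractor, and formats the first result. */
    static String demo(byte[] content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element container = doc.createElement("data");
            doc.appendChild(container);
            extractData(container, new ByteArrayInputStream(content));
            Element first = (Element) container.getFirstChild();
            return first.getNodeName() + "=" + first.getTextContent();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Eight ASCII bytes standing in for the start of a PDF file
        byte[] content = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        System.out.println(demo(content)); // prints "byteCount=8"
    }
}
```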

main

public static void main(String[] args)
Test application that outputs extracted metadata for a given local file.

Parameters:
args - the path to a locally stored PDF file