org.mycore.datamodel.ifs.extractors
Class MCRDataExtractorPDF
java.lang.Object
org.mycore.common.events.MCREventHandlerBase
org.mycore.datamodel.ifs.extractors.MCRDataExtractor
org.mycore.datamodel.ifs.extractors.MCRDataExtractorPDF
- All Implemented Interfaces:
- MCREventHandler
public class MCRDataExtractorPDF
- extends MCRDataExtractor
Extracts metadata from PDF files using the PDFBox library. The number of
pages, document information like author and title, and the titles of all
outline items (table of contents) are extracted. See http://www.pdfbox.org/
for details.
- Version:
- $Revision: 13085 $ $Date: 2008-02-06 18:27:24 +0100 (Wed, 06 Feb 2008) $
- Author:
- Frank Lützenkirchen
| Methods inherited from class org.mycore.common.events.MCREventHandlerBase |
doHandleEvent, doNothing, handleClassificationCreated, handleClassificationDeleted, handleClassificationRepaired, handleClassificationUpdated, handleDerivateCreated, handleDerivateDeleted, handleDerivateRepaired, handleDerivateUpdated, handleFileDeleted, handleFileRepaired, handleObjectCreated, handleObjectDeleted, handleObjectRepaired, handleObjectUpdated, undoClassificationCreated, undoClassificationDeleted, undoClassificationRepaired, undoClassificationUpdated, undoDerivateCreated, undoDerivateDeleted, undoDerivateRepaired, undoDerivateUpdated, undoFileCreated, undoFileDeleted, undoFileRepaired, undoFileUpdated, undoHandleEvent, undoObjectCreated, undoObjectDeleted, undoObjectRepaired, undoObjectUpdated |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
MCRDataExtractorPDF
public MCRDataExtractorPDF()
getSupportedContentTypeIDs
protected String getSupportedContentTypeIDs()
- Description copied from class:
MCRDataExtractor
- Returns the IDs of the FileContentTypes that are supported by this
metadata extractor. Metadata is extracted only if the given file matches
one of these types.
- Specified by:
getSupportedContentTypeIDs in class MCRDataExtractor
- Returns:
- a String of supported MCRFileContentType ID(s), separated by
spaces
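The return format described above (several IDs in one String, separated by spaces) can be illustrated with a minimal sketch. The class name ContentTypeMatchSketch, its helper methods, and the IDs "pdf" and "ps" are hypothetical and chosen only for illustration; how MCRDataExtractor actually performs the match is not shown on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ContentTypeMatchSketch {

    /** Splits a whitespace-separated ID string into its individual IDs. */
    static List<String> parseIds(String supportedIds) {
        String trimmed = supportedIds.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Returns true if the file's content type ID is one of the supported IDs. */
    static boolean isSupported(String supportedIds, String fileTypeId) {
        return parseIds(supportedIds).contains(fileTypeId);
    }

    public static void main(String[] args) {
        // Hypothetical value, in the format getSupportedContentTypeIDs() is documented to return.
        String supported = "pdf ps";
        System.out.println(isSupported(supported, "pdf")); // prints "true"
        System.out.println(isSupported(supported, "txt")); // prints "false"
    }
}
```

Comparing whole IDs rather than using a substring test avoids false matches when one ID is a prefix of another.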
extractData
protected void extractData(Element container,
InputStream in)
throws Exception
- Description copied from class:
MCRDataExtractor
- Extracts metadata from a file. Subclasses must override this method.
- Specified by:
extractData in class MCRDataExtractor
- Parameters:
container - empty XML element that the extractor should fill with data
in - the InputStream to read the file's content from
- Throws:
Exception
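The contract above (read the file from the stream, write results into the initially empty container element) might be sketched as follows. This is not the MyCoRe or PDFBox implementation: SketchExtractor and ByteCountExtractor are hypothetical stand-ins, and because this page does not name the package of Element, the sketch assumes org.w3c.dom.Element from the JDK purely so that it runs without extra libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ExtractDataSketch {

    /** Hypothetical stand-in for the abstract extractor contract. */
    abstract static class SketchExtractor {
        protected abstract void extractData(Element container, InputStream in) throws Exception;
    }

    /** Hypothetical subclass: stores the number of bytes read as a child element. */
    static class ByteCountExtractor extends SketchExtractor {
        @Override
        protected void extractData(Element container, InputStream in) throws Exception {
            long count = 0;
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                count += n;
            }
            // Fill the empty container with one child element holding the result.
            Element size = container.getOwnerDocument().createElement("size");
            size.setTextContent(Long.toString(count));
            container.appendChild(size);
        }
    }

    /** Creates an empty container, runs the extractor, and returns the extracted value. */
    static String run(byte[] content) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element container = doc.createElement("container");
            doc.appendChild(container);
            new ByteCountExtractor().extractData(container, in);
            return container.getFirstChild().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("hello".getBytes(StandardCharsets.UTF_8))); // prints "5"
    }
}
```

A PDF-specific extractor would follow the same shape, replacing the byte count with values such as the page count or document title read from the stream.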
main
public static void main(String[] args)
- Test application that outputs extracted metadata for a given local file.
- Parameters:
args - the path to a locally stored PDF file