Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
java.lang.Object
    com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
public class TaggedPdfReaderTool extends Object
Converts a tagged PDF document into an XML file.
Since:
5.0.2
-
Field Summary
Fields
Modifier and Type | Field | Description
protected PrintWriter | out | The writer object to which the XML will be written.
protected PdfReader | reader | The reader object from which the content streams are read.
-
Constructor Summary
Constructors
Constructor | Description
TaggedPdfReaderTool() |
-
Method Summary
All Methods | Instance Methods | Concrete Methods
Modifier and Type | Method | Description
void | convertToXml(PdfReader reader, OutputStream os) | Converts the tagged content of the PDF into XML and writes it to the given OutputStream, using the default charset.
void | convertToXml(PdfReader reader, OutputStream os, String charset) | Converts the tagged content of the PDF into XML and writes it to the given OutputStream, using the given charset.
void | inspectChild(PdfObject k) | Inspects a child of a structure element.
void | inspectChildArray(PdfArray k) | If the child of a structure element is an array, each of its elements is inspected in turn.
void | inspectChildDictionary(PdfDictionary k) | If the child of a structure element is a dictionary, inspects the child and may also write a tag.
void | inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) | If the child of a structure element is a dictionary, inspects the child and may also write a tag.
void | parseTag(String tag, PdfObject object, PdfDictionary page) | Searches for a tag in a page.
protected String | xmlName(PdfName name) |
-
Field Detail
-
reader
protected PdfReader reader
The reader object from which the content streams are read.
-
out
protected PrintWriter out
The writer object to which the XML will be written.
-
Method Detail
-
convertToXml
public void convertToXml(PdfReader reader, OutputStream os, String charset) throws IOException
Converts the tagged content of the PDF into XML and writes it to the given OutputStream, encoded with the given charset.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the charset used to encode the data
Throws:
IOException
Since:
5.0.5
-
convertToXml
public void convertToXml(PdfReader reader, OutputStream os) throws IOException
Converts the tagged content of the PDF into XML and writes it to the given OutputStream. The output is encoded with the default charset.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
Throws:
IOException
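The relationship between the two convertToXml overloads can be illustrated with standard-library classes only. This is a minimal sketch, not iText code: the class CharsetOverloadSketch and its methods writeXml and encodedLength are hypothetical, and it assumes the tool wraps the OutputStream in a PrintWriter, as the type of the out field suggests. It shows why the charset argument matters: the same XML text produces a different number of bytes depending on the encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical stand-in for illustration; not part of iText.
class CharsetOverloadSketch {

    // Mirrors the three-argument overload: the caller names the charset.
    static void writeXml(String text, OutputStream os, String charset) throws IOException {
        // OutputStreamWriter rejects unknown charset names with an
        // UnsupportedEncodingException, which is an IOException.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(os, charset));
        out.print(text);
        out.flush(); // push buffered characters to the stream without closing it
    }

    // Mirrors the two-argument overload: fall back to the default charset.
    static void writeXml(String text, OutputStream os) throws IOException {
        writeXml(text, os, Charset.defaultCharset().name());
    }

    // Returns how many bytes the text occupies in the given charset.
    static int encodedLength(String text, String charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeXml(text, buffer, charset);
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<P>caf\u00e9</P>"; // 11 characters, one of them non-ASCII
        System.out.println(encodedLength(xml, "UTF-8"));      // the accented letter takes two bytes
        System.out.println(encodedLength(xml, "ISO-8859-1")); // the accented letter takes one byte
    }
}
```

Because the default charset differs between platforms, passing an explicit charset such as "UTF-8" is the more predictable choice when the XML is consumed elsewhere.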
-
inspectChild
public void inspectChild(PdfObject k) throws IOException
Inspects a child of a structure element. The child can be an array or a dictionary.
Parameters:
k - the child to inspect
Throws:
IOException
-
inspectChildArray
public void inspectChildArray(PdfArray k) throws IOException
If the child of a structure element is an array, each of its elements is inspected in turn.
Parameters:
k - the child array to inspect
Throws:
IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k) throws IOException
If the child of a structure element is a dictionary, inspects the child and may also write a tag.
Parameters:
k - the child dictionary to inspect
Throws:
IOException
-
inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws IOException
If the child of a structure element is a dictionary, inspects the child and may also write a tag.
Parameters:
k - the child dictionary to inspect
inspectAttributes - whether the attributes of the structure element are inspected as well
Throws:
IOException
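The three inspect methods above describe a recursive walk: a child is either an array, whose elements are visited one by one, or a dictionary, which may produce a tag. The following is a minimal sketch of that dispatch pattern under stated assumptions, not the library's implementation. The class StructureWalkSketch is hypothetical, java.util.List and java.util.Map stand in for PdfArray and PdfDictionary, and a plain string leaf stands in for the marked content that the real tool locates through parseTag. The keys "S" (structure type) and "K" (kids) follow the naming used for structure elements in the PDF specification. XML escaping is left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; uses plain collections, not iText types.
class StructureWalkSketch {

    private final StringBuilder out = new StringBuilder();

    // A child is either an array (List) or a dictionary (Map); dispatch on its type.
    void inspectChild(Object k) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        } else if (k != null) {
            out.append(k); // leaf: stand-in for marked content text
        }
    }

    // An array child: visit each element in order.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // A dictionary child: write an opening tag if it has a structure type,
    // recurse into its kids, then write the closing tag.
    void inspectChildDictionary(Map<?, ?> k) {
        Object tag = k.get("S");
        if (tag != null) {
            out.append('<').append(tag).append('>');
        }
        inspectChild(k.get("K"));
        if (tag != null) {
            out.append("</").append(tag).append('>');
        }
    }

    String result() {
        return out.toString();
    }

    // Builds a stand-in structure element with a type and its kids.
    static Map<String, Object> element(String tag, Object kids) {
        Map<String, Object> dict = new LinkedHashMap<>();
        dict.put("S", tag);
        dict.put("K", kids);
        return dict;
    }

    // Walks a small tree: a Document containing a heading and a paragraph.
    static String demo() {
        List<Object> kids = new ArrayList<>();
        kids.add(element("H1", "Title"));
        kids.add(element("P", "Body"));
        StructureWalkSketch walker = new StructureWalkSketch();
        walker.inspectChild(element("Document", kids));
        return walker.result();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <Document><H1>Title</H1><P>Body</P></Document>
    }
}
```

Splitting the walk into one method per child type keeps each case small and lets a subclass override a single step, which may be why the documented methods are public rather than private.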
-
parseTag
public void parseTag(String tag, PdfObject object, PdfDictionary page) throws IOException
Searches for a tag in a page.
Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
Throws:
IOException
-