Class SAXBuilder


  • public class SAXBuilder
    extends java.lang.Object
    Builds a JDOM document from files, streams, readers, URLs, or a SAX InputSource instance using a SAX parser. The builder uses a third-party SAX parser (chosen by JAXP by default, or you can choose manually) to handle the parsing duties and simply listens to the SAX events to construct a document. Details which SAX does not provide, such as whitespace outside the root element, are not represented in the JDOM document. Information about SAX can be found at http://www.saxproject.org.

    Known issues: Relative paths for a DocType or EntityRef may be converted by the SAX parser into absolute paths.

    Version:
    $Revision: 1.93 $, $Date: 2009/07/23 06:26:26 $
    Author:
    Jason Hunter, Brett McLaughlin, Dan Schaffer, Philip Nelson, Alex Rosen
    • Constructor Summary

      Constructors 
      Constructor Description
      SAXBuilder()
      Creates a new SAXBuilder which will attempt to first locate a parser via JAXP, then will try to use a set of default SAX Drivers.
      SAXBuilder​(boolean validate)
      Creates a new SAXBuilder which will attempt to first locate a parser via JAXP, then will try to use a set of default SAX Drivers.
      SAXBuilder​(java.lang.String saxDriverClass)
      Creates a new SAXBuilder using the specified SAX parser.
      SAXBuilder​(java.lang.String saxDriverClass, boolean validate)
      Creates a new SAXBuilder using the specified SAX parser.
    • Method Summary

      Modifier and Type Method Description
      Document build​(java.io.File file)
      This builds a document from the supplied file.
      Document build​(java.io.InputStream in)
      This builds a document from the supplied input stream.
      Document build​(java.io.InputStream in, java.lang.String systemId)
      This builds a document from the supplied input stream.
      Document build​(java.io.Reader characterStream)
      This builds a document from the supplied Reader.
      Document build​(java.io.Reader characterStream, java.lang.String systemId)
      This builds a document from the supplied Reader.
      Document build​(java.lang.String systemId)
      This builds a document from the supplied URI.
      Document build​(java.net.URL url)
      This builds a document from the supplied URL.
      Document build​(org.xml.sax.InputSource in)
      This builds a document from the supplied input source.
      protected void configureContentHandler​(SAXHandler contentHandler)
      This configures the SAXHandler that will be used to build the Document.
      protected void configureParser​(org.xml.sax.XMLReader parser, SAXHandler contentHandler)
      This configures the XMLReader to be used for reading the XML document.
      protected SAXHandler createContentHandler()
      This creates the SAXHandler that will be used to build the Document.
      protected org.xml.sax.XMLReader createParser()
      This creates the XMLReader to be used for reading the XML document.
      java.lang.String getDriverClass()
      Returns the driver class assigned in the constructor, or null if none.
      org.xml.sax.DTDHandler getDTDHandler()
      Returns the DTDHandler assigned, or null if none.
      org.xml.sax.EntityResolver getEntityResolver()
      Returns the EntityResolver assigned, or null if none.
      org.xml.sax.ErrorHandler getErrorHandler()
      Returns the ErrorHandler assigned, or null if none.
      boolean getExpandEntities()
      Returns whether or not entities are being expanded into normal text content.
      JDOMFactory getFactory()
      Returns the current JDOMFactory in use.
      boolean getIgnoringBoundaryWhitespace()
      Returns whether or not the parser will eliminate element content containing only whitespace.
      boolean getIgnoringElementContentWhitespace()
      Returns whether element content whitespace is to be ignored during the build.
      boolean getReuseParser()
      Returns whether the contained SAX parser instance is reused across multiple parses.
      boolean getValidation()
      Returns whether validation is to be performed during the build.
      org.xml.sax.XMLFilter getXMLFilter()
      Returns the XMLFilter used during parsing, or null if none.
      void setDTDHandler​(org.xml.sax.DTDHandler dtdHandler)
      This sets a custom DTDHandler for the builder.
      void setEntityResolver​(org.xml.sax.EntityResolver entityResolver)
      This sets a custom EntityResolver for the builder.
      void setErrorHandler​(org.xml.sax.ErrorHandler errorHandler)
      This sets a custom ErrorHandler for the builder.
      void setExpandEntities​(boolean expand)
      This sets whether or not to expand entities for the builder.
      void setFactory​(JDOMFactory factory)
      This sets a custom JDOMFactory for the builder.
      void setFastReconfigure​(boolean fastReconfigure)
      Specifies whether this builder will do fast reconfiguration of the underlying SAX parser when reuseParser is true.
      void setFeature​(java.lang.String name, boolean value)
      This sets a feature on the SAX parser.
      void setIgnoringBoundaryWhitespace​(boolean ignoringBoundaryWhite)
      Specifies whether or not the parser should eliminate boundary whitespace, a term that indicates whitespace-only text between element tags.
      void setIgnoringElementContentWhitespace​(boolean ignoringWhite)
      Specifies whether or not the parser should eliminate whitespace in element content (sometimes known as "ignorable whitespace") when building the document.
      void setProperty​(java.lang.String name, java.lang.Object value)
      This sets a property on the SAX parser.
      void setReuseParser​(boolean reuseParser)
      Specifies whether this builder shall reuse the same SAX parser when performing subsequent parses or allocate a new parser for each parse.
      void setValidation​(boolean validate)
      This sets validation for the builder.
      void setXMLFilter​(org.xml.sax.XMLFilter xmlFilter)
      This sets a custom XMLFilter for the builder.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SAXBuilder

        public SAXBuilder()
        Creates a new SAXBuilder which will attempt to first locate a parser via JAXP, then will try to use a set of default SAX Drivers. The underlying parser will not validate.
      • SAXBuilder

        public SAXBuilder​(boolean validate)
        Creates a new SAXBuilder which will attempt to first locate a parser via JAXP, then will try to use a set of default SAX Drivers. The underlying parser will validate or not according to the given parameter.
        Parameters:
        validate - boolean indicating if validation should occur.
      • SAXBuilder

        public SAXBuilder​(java.lang.String saxDriverClass)
        Creates a new SAXBuilder using the specified SAX parser. The underlying parser will not validate.
        Parameters:
        saxDriverClass - String name of SAX Driver to use for parsing.
      • SAXBuilder

        public SAXBuilder​(java.lang.String saxDriverClass,
                          boolean validate)
        Creates a new SAXBuilder using the specified SAX parser. The underlying parser will validate or not according to the given parameter.
        Parameters:
        saxDriverClass - String name of SAX Driver to use for parsing.
        validate - boolean indicating if validation should occur.
    • Method Detail

      • getDriverClass

        public java.lang.String getDriverClass()
        Returns the driver class assigned in the constructor, or null if none.
        Returns:
        the driver class assigned in the constructor
      • getFactory

        public JDOMFactory getFactory()
        Returns the current JDOMFactory in use.
        Returns:
        the factory in use
      • setFactory

        public void setFactory​(JDOMFactory factory)
        This sets a custom JDOMFactory for the builder. Use this to build the tree with your own subclasses of the JDOM classes.
        Parameters:
        factory - JDOMFactory to use
      • getValidation

        public boolean getValidation()
        Returns whether validation is to be performed during the build.
        Returns:
        whether validation is to be performed during the build
      • setValidation

        public void setValidation​(boolean validate)
        This sets validation for the builder.
        Parameters:
        validate - boolean indicating whether validation should occur.
      • getErrorHandler

        public org.xml.sax.ErrorHandler getErrorHandler()
        Returns the ErrorHandler assigned, or null if none.
        Returns:
        the ErrorHandler assigned, or null if none
      • setErrorHandler

        public void setErrorHandler​(org.xml.sax.ErrorHandler errorHandler)
        This sets a custom ErrorHandler for the builder.
        Parameters:
        errorHandler - ErrorHandler
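
        As an illustration of the kind of object this method accepts, here is a minimal sketch of an ErrorHandler that records warnings and recoverable errors and rethrows fatal ones. It uses only the JDK's SAX classes so that it runs without JDOM on the classpath. The class name CollectingErrorHandler and the validate helper are hypothetical. With JDOM, an instance would be passed to setErrorHandler instead of being set on the XMLReader directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Records warnings and recoverable errors; rethrows fatal errors. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Validity errors are reported here when validation is on.
        problems.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e; // the document is not well-formed; parsing cannot continue
    }

    /** Hypothetical helper: validates the XML against its DTD and returns the recorded problems. */
    static List<String> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            CollectingErrorHandler handler = new CollectingErrorHandler();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.problems;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        Collecting problems rather than throwing on the first one lets a caller report every validity error in a single pass.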
      • getEntityResolver

        public org.xml.sax.EntityResolver getEntityResolver()
        Returns the EntityResolver assigned, or null if none.
        Returns:
        the EntityResolver assigned
      • setEntityResolver

        public void setEntityResolver​(org.xml.sax.EntityResolver entityResolver)
        This sets a custom EntityResolver for the builder.
        Parameters:
        entityResolver - EntityResolver
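
        A common reason to install an EntityResolver is to serve DTDs locally instead of fetching them over the network. The sketch below is one possible resolver, written against the JDK's SAX classes only. The class name LocalDtdResolver, its register and parse helpers, and the public ID used in the test are all hypothetical. With JDOM, the resolver would be handed to setEntityResolver.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Serves known DTDs from memory, keyed by public ID, instead of fetching them. */
final class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> dtdsByPublicId = new HashMap<>();
    int hits = 0;

    LocalDtdResolver register(String publicId, String dtdText) {
        dtdsByPublicId.put(publicId, dtdText);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = dtdsByPublicId.get(publicId);
        if (dtd == null) {
            return null; // unknown entity: let the parser resolve it as usual
        }
        hits++;
        return new InputSource(new StringReader(dtd));
    }

    /** Hypothetical helper: parses the XML with this resolver installed and returns the hit count. */
    int parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(this);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```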
      • getDTDHandler

        public org.xml.sax.DTDHandler getDTDHandler()
        Returns the DTDHandler assigned, or null if none.
        Returns:
        the DTDHandler assigned
      • setDTDHandler

        public void setDTDHandler​(org.xml.sax.DTDHandler dtdHandler)
        This sets a custom DTDHandler for the builder.
        Parameters:
        dtdHandler - DTDHandler
      • getXMLFilter

        public org.xml.sax.XMLFilter getXMLFilter()
        Returns the XMLFilter used during parsing, or null if none.
        Returns:
        the XMLFilter used during parsing
      • setXMLFilter

        public void setXMLFilter​(org.xml.sax.XMLFilter xmlFilter)
        This sets a custom XMLFilter for the builder.
        Parameters:
        xmlFilter - the filter to use
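
        An XMLFilter sits between the parser and the content handler and can rewrite SAX events before the document is built. The sketch below is a small filter that upper-cases element names, exercised with the JDK's SAX classes only. The class name UpperCaseNamesFilter and its elementNames helper are hypothetical. With JDOM, the filter would be passed to setXMLFilter, and the builder is assumed to connect it to the underlying parser, so the explicit setParent call would not be needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Rewrites element names to upper case before they reach the content handler. */
final class UpperCaseNamesFilter extends XMLFilterImpl {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT));
    }

    /** Hypothetical helper: returns the element names seen downstream of the filter. */
    static List<String> elementNames(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            UpperCaseNamesFilter filter = new UpperCaseNamesFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            List<String> names = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes atts) {
                    names.add(qName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```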
      • getIgnoringElementContentWhitespace

        public boolean getIgnoringElementContentWhitespace()
        Returns whether element content whitespace is to be ignored during the build.
        Returns:
        whether element content whitespace is to be ignored during the build
      • setIgnoringElementContentWhitespace

        public void setIgnoringElementContentWhitespace​(boolean ignoringWhite)
        Specifies whether or not the parser should eliminate whitespace in element content (sometimes known as "ignorable whitespace") when building the document. Only whitespace contained within element content that has an element-only content model will be eliminated (see XML Rec 3.2.1). This setting takes effect only when validation is turned on. The default value of this setting is false.
        Parameters:
        ignoringWhite - Whether to ignore ignorable whitespace
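
        The difference between "ignorable" whitespace and ordinary whitespace-only text shows up at the SAX level. The sketch below uses the JDK's validating parser and a hypothetical WhitespaceProbe handler to count what arrives through ignorableWhitespace() versus characters(). It does not use JDOM. It only illustrates which whitespace a validating parser reports as ignorable, which is the whitespace this setting is described as eliminating.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Counts whitespace reported as ignorable versus whitespace reported as text. */
final class WhitespaceProbe extends DefaultHandler {
    int ignorable = 0;       // characters delivered via ignorableWhitespace()
    int whitespaceText = 0;  // whitespace-only chunks delivered via characters()

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (new String(ch, start, length).trim().isEmpty()) {
            whitespaceText += length;
        }
    }

    /** Hypothetical helper: parses the XML with validation on and returns the counts. */
    static WhitespaceProbe probe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            WhitespaceProbe probe = new WhitespaceProbe();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), probe);
            return probe;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        When the DTD declares the parent as element-only, for example (b), the indentation around the child is reported as ignorable. With a mixed content model such as (#PCDATA|b)*, the same indentation is reported as ordinary character data.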
      • getIgnoringBoundaryWhitespace

        public boolean getIgnoringBoundaryWhitespace()
        Returns whether or not the parser will eliminate element content containing only whitespace.
        Returns:
        boolean - whether only whitespace content will be ignored during build.
        See Also:
        setIgnoringBoundaryWhitespace(boolean)
      • setIgnoringBoundaryWhitespace

        public void setIgnoringBoundaryWhitespace​(boolean ignoringBoundaryWhite)
        Specifies whether or not the parser should eliminate boundary whitespace, a term that indicates whitespace-only text between element tags. This feature is a lot like setIgnoringElementContentWhitespace(boolean), but it is more aggressive and doesn't require validation to be turned on. The setIgnoringElementContentWhitespace(boolean) call affects the SAX parse process while this method affects the JDOM build process, so it can be beneficial to turn both on for efficiency. For implementation efficiency, this method actually removes all whitespace-only text() nodes. That can, in some cases (such as between an element tag and a comment), include whitespace that isn't just boundary whitespace. The default is false.
        Parameters:
        ignoringBoundaryWhite - Whether to ignore whitespace-only text nodes
      • getReuseParser

        public boolean getReuseParser()
        Returns whether the contained SAX parser instance is reused across multiple parses. The default is true.
        Returns:
        whether the contained SAX parser instance is reused across multiple parses
      • setReuseParser

        public void setReuseParser​(boolean reuseParser)
        Specifies whether this builder shall reuse the same SAX parser when performing subsequent parses or allocate a new parser for each parse. The default value of this setting is true (parser reuse).

        Note: As SAX parser instances are not thread safe, the parser reuse feature should not be used with SAXBuilder instances shared among threads.

        Parameters:
        reuseParser - Whether to reuse the SAX parser.
      • setFastReconfigure

        public void setFastReconfigure​(boolean fastReconfigure)
        Specifies whether this builder will do fast reconfiguration of the underlying SAX parser when reuseParser is true. This improves performance in cases where SAXBuilders are reused and lots of small documents are frequently parsed. It avoids attempting to set features on the SAX parser each time build() is called, attempts which can result in SAXNotRecognizedExceptions. This should ONLY be set for builders where this specific case is an issue. The default value of this setting is false (no fast reconfiguration). If reuseParser is false, calling this has no effect.
        Parameters:
        fastReconfigure - Whether to do a fast reconfiguration of the parser
      • setFeature

        public void setFeature​(java.lang.String name,
                               boolean value)
        This sets a feature on the SAX parser. See the SAX documentation for more information.

        NOTE: SAXBuilder requires that some particular features of the SAX parser be set up in certain ways for it to work properly. The list of such features may change in the future. Therefore, the use of this method may cause parsing to break, and even if it doesn't break anything today it might break parsing in a future JDOM version, because what JDOM parsers require may change over time. Use with caution.

        Parameters:
        name - The feature name, which is a fully-qualified URI.
        value - The requested state of the feature (true or false).
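
        To show what a feature name and value look like in practice, the sketch below sets a feature directly on a JDK XMLReader. The feature URI is a Xerces-specific one and is an assumption here: other parsers may not recognize it, and it is not one that this documentation names. The class name DoctypeGuard is hypothetical. With JDOM, the same name and value would be passed to setFeature, subject to the caution in the note above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Demonstrates a parser feature that rejects documents containing a DOCTYPE. */
final class DoctypeGuard {
    // Xerces-specific feature name (an assumption; not all parsers support it).
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Returns true if the parser refuses the document once the feature is enabled. */
    static boolean rejects(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(DISALLOW_DOCTYPE, true);
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true; // the parser reported the DOCTYPE as a fatal error
        } catch (Exception e) {
            // Includes SAXNotRecognizedException if the parser lacks the feature.
            throw new IllegalStateException(e);
        }
    }
}
```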
      • setProperty

        public void setProperty​(java.lang.String name,
                                java.lang.Object value)
        This sets a property on the SAX parser. See the SAX documentation for more information.

        NOTE: SAXBuilder requires that some particular properties of the SAX parser be set up in certain ways for it to work properly. The list of such properties may change in the future. Therefore, the use of this method may cause parsing to break, and even if it doesn't break anything today it might break parsing in a future JDOM version, because what JDOM parsers require may change over time. Use with caution.

        Parameters:
        name - The property name, which is a fully-qualified URI.
        value - The requested value for the property.
      • build

        public Document build​(org.xml.sax.InputSource in)
                       throws JDOMException,
                              java.io.IOException
        This builds a document from the supplied input source.
        Parameters:
        in - InputSource to read from
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
      • createContentHandler

        protected SAXHandler createContentHandler()
        This creates the SAXHandler that will be used to build the Document.
        Returns:
        SAXHandler - resultant SAXHandler object.
      • configureContentHandler

        protected void configureContentHandler​(SAXHandler contentHandler)
        This configures the SAXHandler that will be used to build the Document.

        The default implementation simply passes through some configuration settings that were set on the SAXBuilder: setExpandEntities() and setIgnoringElementContentWhitespace().

        Parameters:
        contentHandler - The SAXHandler to configure
      • createParser

        protected org.xml.sax.XMLReader createParser()
                                              throws JDOMException
        This creates the XMLReader to be used for reading the XML document.

        The default behavior is to (1) use the saxDriverClass, if it has been set, (2) try to obtain a parser from JAXP, if it is available, and (3) if all else fails, use a hard-coded default parser (currently the Xerces parser). Subclasses may override this method to determine the parser to use in a different way.

        Returns:
        XMLReader - resultant XMLReader object.
        Throws:
        JDOMException
      • configureParser

        protected void configureParser​(org.xml.sax.XMLReader parser,
                                       SAXHandler contentHandler)
                                throws JDOMException
        This configures the XMLReader to be used for reading the XML document.

        The default implementation sets various options on the given XMLReader, such as validation, DTD resolution, entity handlers, etc., according to the options that were set (e.g. via setEntityResolver) and sets various SAX properties and features that are required for JDOM internals. These features may change in future releases, so change this behavior at your own risk.

        Parameters:
        parser - the XMLReader to configure
        contentHandler - the SAXHandler that will receive the SAX events
        Throws:
        JDOMException
      • build

        public Document build​(java.io.InputStream in)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied input stream.

        Parameters:
        in - InputStream to read from
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed.
      • build

        public Document build​(java.io.File file)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied file.

        Parameters:
        file - File to read from
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
      • build

        public Document build​(java.net.URL url)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied URL.

        Parameters:
        url - URL to read from.
        Returns:
        Document - resultant Document object.
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed.
      • build

        public Document build​(java.io.InputStream in,
                              java.lang.String systemId)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied input stream.

        Parameters:
        in - InputStream to read from.
        systemId - base for resolving relative URIs
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
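
        The role of systemId as a base URI can be seen at the SAX level. The sketch below, which uses only the JDK's SAX classes, parses a document whose DOCTYPE has a relative system identifier and records the identifier that the parser hands to the EntityResolver. The class name SystemIdDemo and the file URI in the test are hypothetical. It assumes this overload behaves like setting the system ID on an InputSource. It also illustrates the known issue that relative DocType paths can come back as absolute paths.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Shows how a base system ID turns a relative DTD reference into an absolute one. */
final class SystemIdDemo {
    /** Returns the system ID the parser asked the resolver for while reading the DOCTYPE. */
    static String resolvedDtdLocation(String xml, String baseSystemId) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            String[] seen = new String[1];
            reader.setEntityResolver((publicId, systemId) -> {
                seen[0] = systemId;
                // Supply an empty DTD so nothing is read from disk or network.
                return new InputSource(new StringReader(""));
            });
            InputSource source = new InputSource(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            source.setSystemId(baseSystemId); // base for resolving relative URIs
            reader.parse(source);
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```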
      • build

        public Document build​(java.io.Reader characterStream)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied Reader. It's the programmer's responsibility to make sure the reader matches the encoding of the file. It's often easier and safer to use an InputStream rather than a Reader, and to let the parser auto-detect the encoding from the XML declaration.

        Parameters:
        characterStream - Reader to read from
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
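
        The encoding caveat can be reproduced with the JDK's SAX parser alone. In the sketch below, the same ISO-8859-1 bytes are parsed once as an InputStream, where the parser reads the encoding from the XML declaration, and once through a Reader whose charset is chosen by the caller. The class name EncodingDemo and its helpers are hypothetical, and JDOM is not involved. The Reader and InputStream overloads of build are assumed to hand the data to the parser in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Compares parsing from a byte stream with parsing from a caller-decoded Reader. */
final class EncodingDemo {
    /** Collects all character data reported by the parser. */
    private static String text(InputSource source) {
        try {
            StringBuilder out = new StringBuilder();
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** The parser detects the encoding from the XML declaration. */
    static String viaStream(byte[] xml) {
        return text(new InputSource(new ByteArrayInputStream(xml)));
    }

    /** The caller decides the encoding; a wrong choice corrupts the text. */
    static String viaReader(byte[] xml, Charset assumed) {
        return text(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(xml), assumed)));
    }
}
```

        If a Reader is unavoidable, constructing it with the charset named in the XML declaration avoids the corruption.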
      • build

        public Document build​(java.io.Reader characterStream,
                              java.lang.String systemId)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied Reader. It's the programmer's responsibility to make sure the reader matches the encoding of the file. It's often easier and safer to use an InputStream rather than a Reader, and to let the parser auto-detect the encoding from the XML declaration.

        Parameters:
        characterStream - Reader to read from.
        systemId - base for resolving relative URIs
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
      • build

        public Document build​(java.lang.String systemId)
                       throws JDOMException,
                              java.io.IOException

        This builds a document from the supplied URI.

        Parameters:
        systemId - URI for the input
        Returns:
        Document resultant Document object
        Throws:
        JDOMException - when errors occur in parsing
        java.io.IOException - when an I/O error prevents a document from being fully parsed
      • getExpandEntities

        public boolean getExpandEntities()
        Returns whether or not entities are being expanded into normal text content.
        Returns:
        whether entities are being expanded
      • setExpandEntities

        public void setExpandEntities​(boolean expand)

        This sets whether or not to expand entities for the builder. A value of true expands entities as normal content. A value of false leaves entities unexpanded as EntityRef objects. The default is true.

        When this setting is false, the internal DTD subset is retained; when this setting is true, the internal DTD subset is not retained.

        Note that Xerces (at least up to 1.4.4) has a bug where entities in attribute values will be misreported if this flag is turned off, causing entities to appear within element content. When turning entity expansion off, either avoid entities in attribute values or use another parser such as Crimson. http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6111

        Parameters:
        expand - boolean indicating whether entity expansion should occur.