Thursday, August 24, 2006

Translate SAX events to a DOM tree

I had to pass an XML document produced by one piece of software to another. The first provides the document as SAX events, but the second expects a DOM Document. So here is a SAX-events-to-DOM-Document translator:
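import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.w3c.dom.Text;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxToDom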
{
    public SaxToDom(XMLReader reader, InputSource input) {
        myReader = reader;
        myInput  = input;
    }

    public Document makeDom() {
        Document doc = null;
        try {
            // Find the implementation
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder        builder = factory.newDocumentBuilder();
            DOMImplementation      impl    = builder.getDOMImplementation();

            // Create the document
            doc = impl.createDocument(null, null, null);

            // The Handlers and the actual building
            SaxToDomHandler handlers = new SaxToDomHandler(doc);
            myReader.setContentHandler(handlers);
            myReader.setErrorHandler(handlers);
            myReader.parse(myInput);
        }
        // For the catch handlers below, use your usual logging facilities.
        catch (DOMException e) {
            System.err.println(e); 
        }
        catch (ParserConfigurationException e) {
            System.err.println(e); 
        }
        catch (SAXException e) {
            System.err.println(e); 
        }
        catch (IOException e) {
            System.err.println(e); 
        }
        return doc;
    }

    private XMLReader   myReader;
    private InputSource myInput;
}


class SaxToDomHandler
    extends DefaultHandler
{
    public SaxToDomHandler(Document doc) {
        myDoc         = doc;
        myCurrentNode = myDoc;
    }

    // Create a new element and add it to the DOM tree, at the current position.
    public void startElement(String uri, String name, String qName, Attributes attrs) {
        // Create the element.
        Element elem = myDoc.createElementNS(uri, qName);
        // Add each attribute.
        for ( int i = 0; i < attrs.getLength(); ++i ) {
            String ns_uri = attrs.getURI(i);
            String qname  = attrs.getQName(i);
            String value  = attrs.getValue(i);
            Attr   attr   = myDoc.createAttributeNS(ns_uri, qname);
            attr.setValue(value);
            elem.setAttributeNodeNS(attr);
        }
        // Actually add it to the tree, and make it the current node.
        myCurrentNode.appendChild(elem);
        myCurrentNode = elem;
    }

    // Adjust the current place for subsequent additions.
    public void endElement(String uri, String name, String qName) {
        myCurrentNode = myCurrentNode.getParentNode();
    }

    // Add a new text node in the DOM tree, at the right place.
    public void characters(char[] ch, int start, int length) {
        String str  = new String(ch, start, length);
        Text   text = myDoc.createTextNode(str);
        myCurrentNode.appendChild(text);
    }

    // Keep ignorable whitespace too, as a text node at the current position.
    public void ignorableWhitespace(char[] ch, int start, int length) {
        String str  = new String(ch, start, length);
        Text   text = myDoc.createTextNode(str);
        myCurrentNode.appendChild(text);
    }

    // Add a new processing instruction to the DOM tree, at the current position.
    public void processingInstruction(String target, String data) {
        ProcessingInstruction pi = myDoc.createProcessingInstruction(target, data);
        myCurrentNode.appendChild(pi);
    }

    // For the handlers below, use your usual logging facilities.
    public void error(SAXParseException e) {
        System.err.println("Non-fatal error (line " + e.getLineNumber() + ", col " +
                           e.getColumnNumber() + "): " + e.getMessage());
    }

    public void fatalError(SAXParseException e) {
        System.err.println("Fatal error: " + e.getMessage());
    }

    public void warning(SAXParseException e) {
        System.err.println("Warning: " + e.getMessage());
    }

    private Document myDoc;
    private Node     myCurrentNode;
}
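
One thing to keep in mind with the characters() handler above: a SAX parser is allowed to deliver a single run of text in several characters() calls, so the resulting tree can contain adjacent Text nodes. A minimal sketch, assuming you want them merged, is to call the standard normalize() method on the finished document (or on any subtree). The class TextMergeDemo and its helper method are made up for this illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the code above.
class TextMergeDemo {

    // Builds <greeting> with two adjacent Text children, the way two
    // successive characters() calls would with the handler above.
    static Element buildSplitText() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element elem = doc.createElementNS(null, "greeting");
            doc.appendChild(elem);
            elem.appendChild(doc.createTextNode("Hello, "));
            elem.appendChild(doc.createTextNode("world"));
            return elem;
        } catch (ParserConfigurationException e) {
            // Replace with your usual error handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element elem = buildSplitText();
        // Number of child nodes before merging.
        System.out.println(elem.getChildNodes().getLength());
        // normalize() merges adjacent Text nodes in the subtree.
        elem.normalize();
        // Number of child nodes after merging.
        System.out.println(elem.getChildNodes().getLength());
        System.out.println(elem.getTextContent());
    }
}
```

In SaxToDom, the equivalent would be a call to doc.normalize() once parse() has returned.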


4 Comments:

Anonymous Anonymous said...

This seems to be what I have been trying to achieve. I have my SAX handler, but I am not able to turn its events into a DOM tree. I see you use a Document object, whereas I use a string for the source. Kindly guide me on how to get from SAX to DOM. Below is my code.
e-mail: bravo.aevi@gmail.com

package saxtodom;

import java.io.IOException;
import javax.xml.parsers.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.xml.sax.*;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxToDom
{
public static void main ( String args[] ) throws Exception {
new SaxToDomHandler("cs.xml");
}

}

class SaxToDomHandler
extends DefaultHandler
{

int state;

public SaxToDomHandler ( String file ) {
super();
state = 0;
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(false);
XMLReader xmlReader = factory.newSAXParser().getXMLReader();
xmlReader.setContentHandler(this);
xmlReader.parse(file);
} catch (Exception e) {
throw new Error(e);
}
}

public void startElement ( String uri, String name, String tag, Attributes atts ) {
if (tag.equals("lastname"))
state = 1;
else if (state == 2 && tag.equals("firstname"))
state = 3;
}

public void endElement ( String uri, String name, String tag ) {
if (tag.equals("firstname"))
state = 0;
}

public void characters ( char text[], int start, int length ) {
if (state == 1 && new String(text,start,length).equals("Finton"))
state = 2;
else if (state == 3)
System.out.println(new String(text,start,length));
}
}

23:51  
Blogger Florent Georges said...

Aevi,

I must admit I don't see what you are trying to achieve. You only change the value of the variable state and print text content to standard output. If you want to translate SAX events to a DOM tree, well, just use the code of this post. If you want to build a DOM tree from a file, use the standard class javax.xml.parsers.DocumentBuilder.
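
As a rough sketch of that second option, assuming the input is a plain file on disk: DocumentBuilder.parse(File) returns the DOM Document directly, with no SAX handler involved. The class name DomFromFile is invented for the example, and cs.xml is the file name used in the comment above:

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class DomFromFile {

    // Parses the given XML file straight into a DOM Document.
    static Document load(File file) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(file);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Replace with your usual error handling.
            throw new IllegalStateException("Cannot parse " + file, e);
        }
    }

    public static void main(String[] args) {
        // Falls back to cs.xml when no path is given on the command line.
        Document doc = load(new File(args.length > 0 ? args[0] : "cs.xml"));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```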

Regards, -- Florent

14:53  
Anonymous Anonymous said...

I realize this is a pretty old post, but i'm pretty sure a class like SaxToDom has never been needed. Even as far back as Java 1.4.2 (which is the oldest javadocs i have handy) you could build a SAXSource from an XMLReader and transform it to a DOMResult...

public static Document sax2dom(XMLReader reader, InputSource input) throws Exception {
Transformer t = TransformerFactory.newInstance().newTransformer();
Source s = new SAXSource(reader, input);
DOMResult r = new DOMResult();
t.transform(s, r);
return (Document) r.getNode();
}

One limitation of this is that it only works when your SAX events come from something that implements XMLReader. In theory there might be a library out there that doesn't implement XMLReader but instead expects you to pass it a ContentHandler, in which case the SaxToDomHandler posted here would probably do the trick. (Nothing in the SAXSource API exposes itself as a ContentHandler that I can think of off the top of my head.)
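
For that ContentHandler-only case, one standard-library option that seems to fit is javax.xml.transform.sax.TransformerHandler: it is itself a ContentHandler, and an identity handler obtained from a SAXTransformerFactory can write into a DOMResult. A minimal sketch, assuming the default JAXP TransformerFactory supports the SAX feature (the one bundled with the JDK does, as far as I know). The class and method names SaxEventsToDom and parse are made up for the example:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class, for illustration only.
class SaxEventsToDom {

    // Feeds the SAX events of an XML string into an identity
    // TransformerHandler and returns the DOM Document it built.
    static Document parse(String xml) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("This TransformerFactory has no SAX support");
        }
        try {
            // With no stylesheet, the handler copies its input events unchanged.
            TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
            DOMResult result = new DOMResult();
            handler.setResult(result);

            // Any producer that accepts a ContentHandler would do; an
            // XMLReader is used here only to have some events to send.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));

            return (Document) result.getNode();
        } catch (Exception e) {
            // Replace with your usual error handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<greeting>hello</greeting>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The hand-written SaxToDomHandler keeps the advantage of giving you full control over which events end up in the tree.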

23:59  
Blogger svaens said...

another new post to an old blog. But, just in case there is an interesting reply, i'll do it anyway;

I find this sort of thing interesting, because i'm trying to work out how to build an XMLSocketReader kind of functionality (a class which lets you read an XML document from a socket without having either EOF or non-xml end-transmission markers).

1. Reading directly into a DOM parser is no good, because it will just keep on reading past the end of the document, and block.

2. Reading only until there is nothing more available is no good, because sometimes only half of the XML document has made it to the receiving socket before 'ready' returns false, and you end up with an invalid document.

3. Using 'end-transmission' characters is dirty, because you are corrupting the XML just to get an application-layer transport wrapper. It will work, but it is not nice, and not generic.

So, what else can we do?

I'm hoping I can use a SAX parser to read the socket stream, and when I receive the endDocument event, simply stop reading the stream.
In order not to have to read the same data twice, I want to be building the Document as I wait for the end of the document (to be more efficient).

My main worry now then is that the sax parser reads beyond the end tag of the XML document before I get the endDocument event.
I want to be able to exit the SAX parse as soon as I have my endDocument event and have finished building my DOM Document object, with the stream pointer sitting directly after the last '>' character of the document element's end tag.

17:58  
