How To Use

The following examples show you how to use NQXML's parsers, writer, and context-sensitive callback mechanism. See also the files in the examples directory.

Parsers

There are two flavors of XML parser. Both check for the well-formedness of documents and create entities representing XML tags and text.

The first kind of parser returns entities one at a time. The best-known parsers of this type are SAX (Simple API for XML) parsers. The NQXML streaming parser isn't a SAX parser because it doesn't use callbacks to return entities. Instead, your code iterates over the entities by calling NQXML::StreamingParser.each.
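To illustrate the difference between callbacks and iteration, here is a toy, self-contained sketch of a pull-style tokenizer. It is not NQXML code: the ToyStream class and its regular expression are hypothetical and do no well-formedness checking. The caller drives the loop through each, and because the class includes Enumerable, helpers such as select and map come for free.

```ruby
# A toy pull-style tokenizer (hypothetical; not part of NQXML).
# It splits markup into tags and text and hands them out one at
# a time through #each instead of invoking callbacks.
class ToyStream
  include Enumerable

  Token = Struct.new(:kind, :value)

  def initialize(xml)
    @xml = xml
  end

  # Yield one Token per tag or run of text, in document order.
  def each
    @xml.scan(/<[^>]*>|[^<]+/) do |piece|
      kind =
        if piece.start_with?('</') then :end_tag
        elsif piece.start_with?('<') then :start_tag
        else :text
        end
      yield Token.new(kind, piece)
    end
  end
end

stream = ToyStream.new('<book><title>NQXML</title></book>')
stream.each { |tok| puts "#{tok.kind}: #{tok.value}" }

# Because the caller controls iteration, Enumerable helpers work too.
texts = stream.select { |tok| tok.kind == :text }.map(&:value)
p texts   # ["NQXML"]
```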

The second kind of parser creates a tree of entity objects in memory and returns a document object containing the document prolog, object tree, and epilogue. The Document Object Model (DOM) is often used by these parsers. The NQXML tree parser isn't a DOM parser because it doesn't use exactly the same class names or hierarchy for the elements contained in the NQXML::Document object.
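The essential step a tree parser adds is turning a flat sequence of entities into nested objects. The toy sketch below (the TreeNode struct and build_tree method are hypothetical, not NQXML's classes) does this with a stack: a start tag opens a node, text becomes a leaf, and an end tag closes the current node.

```ruby
# Hypothetical stand-in for a tree node (not an NQXML class).
TreeNode = Struct.new(:entity, :children, :parent)

# Build a tree from [kind, value] events: a start tag opens a new
# node under the current one, text becomes a leaf of the current
# node, and an end tag closes the current node.
def build_tree(events)
  root = nil
  stack = []
  events.each do |kind, value|
    case kind
    when :start_tag
      node = TreeNode.new(value, [], stack.last)
      if stack.empty?
        root = node
      else
        stack.last.children << node
      end
      stack.push(node)
    when :text
      stack.last.children << TreeNode.new(value, [], stack.last)
    when :end_tag
      stack.pop
    end
  end
  root
end

events = [[:start_tag, 'book'], [:start_tag, 'title'], [:text, 'NQXML'],
          [:end_tag, 'title'], [:end_tag, 'book']]
root = build_tree(events)
title = root.children.first
puts root.entity                  # book
puts title.entity                 # title
puts title.children.first.entity  # NQXML
```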

Creating XML Output

An XML writer can be used to build well-formed XML output. The NQXML::Writer has two ways of doing this. First, there are methods that output tags, attributes, and text bit-by-bit. For this purpose, the writer class has an interface similar to James Clark's com.jclark.xml.output.XMLWriter Java class.

Additionally, the writeDocument method accepts an NQXML::Document and prints out the entire document's XML.

Examples

Checking an XML document for well-formedness

The code in Example 1 shows an NQXML::StreamingParser being used to check the well-formedness of an XML document. First we create the parser and hand it either an XML string or a readable object (for example, IO, File, or Tempfile). Next, we iterate over all of the entities in the document. We ignore them because we are only interested in finding any errors. If an NQXML::ParserError exception is raised, the document is not well-formed.

Example 1. Checking a document for well-formedness

require 'nqxml/streamingparser'
begin
    parser = NQXML::StreamingParser.new(string_or_readable)
    parser.each { | entity | }
rescue NQXML::ParserError => ex
    puts "parser error on line #{ex.line()}," +
        " col #{ex.column()}: #{$!}"
end
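Example 1 relies on the parser raising an exception that carries a line and a column. The toy sketch below shows the same reporting pattern for just one well-formedness rule, that end tags must match start tags; a real parser checks much more. The ToyParseError class and the check_balance and position helpers are hypothetical, not NQXML code.

```ruby
# Hypothetical error class carrying the same kind of line/column
# information as NQXML::ParserError (this is not NQXML's class).
class ToyParseError < StandardError
  attr_reader :line, :column

  def initialize(message, line, column)
    super(message)
    @line = line
    @column = column
  end
end

# 1-based line and column of a character offset in the input.
def position(xml, offset)
  prefix = xml[0...offset]
  [prefix.count("\n") + 1, offset - (prefix.rindex("\n") || -1)]
end

# Check only that start and end tags balance, a tiny subset of
# well-formedness. Comments, PIs, and attributes are not validated.
def check_balance(xml)
  stack = []
  xml.scan(%r{<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>}) do
    m = Regexp.last_match
    closing, name, empty = m.captures
    next if empty == '/'               # <tag/> opens and closes itself
    if closing.empty?
      stack.push(name)
    else
      top = stack.pop
      next if top == name
      line, column = position(xml, m.begin(0))
      msg = top ? "expected </#{top}>, found </#{name}>" : "unexpected </#{name}>"
      raise ToyParseError.new(msg, line, column)
    end
  end
  unless stack.empty?
    line, column = position(xml, xml.length)
    raise ToyParseError.new("unclosed <#{stack.last}>", line, column)
  end
  true
end

begin
  check_balance("<book>\n  <chapter>\n  </book>\n")
rescue ToyParseError => ex
  # prints "parser error on line 3, col 3: expected </chapter>, found </book>"
  puts "parser error on line #{ex.line}, col #{ex.column}: #{ex.message}"
end
```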

Using the streaming parser

The code in Example 2 uses an NQXML::StreamingParser to visit each entity in an XML stream and print its class name.

The first step is to override the to_s method of NQXML::Entity, and of every entity subclass that defines its own to_s (such as NQXML::Text), with a version that prints the class name.

See the file printEntityClassNames.rb in the examples directory for a complete version of this script.

Example 2. Using the streaming parser

require 'nqxml/streamingparser'

# Override the "to_s" method of all Entity classes that
# have one.
module NQXML
    class Entity
        def to_s
            return "I'm an #{self.class}."
        end
    end

    class Text
        def to_s
            return "I'm an #{self.class}."
        end
    end
end

# Here's where the fun begins.
begin
    # Create a parser.
    parser = NQXML::StreamingParser.new(string_or_readable)
    # Start parsing and returning entities.
    parser.each { | entity | puts entity.to_s }
rescue NQXML::ParserError => ex
    puts "parser error on line #{ex.line()}," +
        " col #{ex.column()}: #{$!}"
end

Using the tree parser

Using the NQXML::TreeParser class is a bit different from using NQXML::StreamingParser. Calling the tree parser's constructor parses the XML. You may then request an NQXML::Document from the parser and either walk the document's object tree or iterate over the entities stored in the document's prolog and epilogue collections.

See the file reverseTags.rb in the examples directory for a more complete example of using NQXML::TreeParser.

Example 3. Using the tree parser

require 'nqxml/treeparser'
begin
    # Creating a TreeParser parses the input. We immediately
    # ask for the Document object.
    doc = NQXML::TreeParser.new(string_or_readable).document

    # Print the entities in the document's prolog.
    doc.prolog.each { | entity | puts entity.to_s }

    # Do something with the nodes in the document.
    rootNode = doc.rootNode
    puts "The root entity is #{rootNode.entity}"
    # ...
rescue NQXML::ParserError => ex
    puts "parser error on line #{ex.line()}," +
        " col #{ex.column()}: #{$!}"
end

Traversing a document object

Document nodes offer many ways to traverse themselves and their children. A node has the attributes :children (an Array), :parent (an NQXML::Node), and :entity (an instance of some subclass of NQXML::Entity). Other useful NQXML::Node methods include addChild, firstChild, nextSibling, and previousSibling.

Perhaps the most useful thing to remember is that :children is an Array. That means you can iterate over it with each and call any of the Enumerable module's methods, such as collect and detect.
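The node attributes and methods listed above can be illustrated without NQXML installed. This sketch uses a hypothetical ToyNode class and ToyTag struct whose method names mirror the documented ones; they are not NQXML's classes, and the real NQXML::Node may differ in details. Siblings are found through the parent's children Array, and detect locates a node the same way Example 4 does.

```ruby
# Hypothetical stand-in for NQXML::Node, written only to show the
# navigation ideas; NQXML's real class may differ in details.
class ToyNode
  attr_reader :entity, :parent, :children

  def initialize(entity, parent = nil)
    @entity = entity
    @parent = parent
    @children = []
  end

  # Wrap an entity in a new child node and return that node.
  def addChild(entity)
    child = ToyNode.new(entity, self)
    @children << child
    child
  end

  def firstChild
    @children.first
  end

  # Siblings are looked up in the parent's children Array.
  def nextSibling
    return nil if @parent.nil?
    siblings = @parent.children
    siblings[siblings.index(self) + 1]
  end

  def previousSibling
    return nil if @parent.nil?
    siblings = @parent.children
    i = siblings.index(self)
    i.zero? ? nil : siblings[i - 1]
  end
end

# Hypothetical stand-in for a tag entity.
ToyTag = Struct.new(:name, :attrs)

root = ToyNode.new(ToyTag.new('root', {}))
%w[1 2 3].each { |id| root.addChild(ToyTag.new('thing', { 'id' => id })) }

# children is a plain Array, so Enumerable methods apply directly.
found = root.children.detect { |n| n.entity.attrs['id'] == '2' }
puts found.previousSibling.entity.attrs['id']                     # 1
puts found.nextSibling.entity.attrs['id']                         # 3
puts root.children.collect { |n| n.entity.attrs['id'] }.join(',') # 1,2,3
```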

Example 4. Finding a node

desiredNode = document.rootNode.children.detect { | node |
    entity = node.entity
    entity.instance_of?(NQXML::Tag) &&
        entity.name == 'repeatedTagName' &&
        entity.attrs['uniqueIdentifier'] == '12345'
}

Using the Dispatcher

The NQXML::Dispatcher class by David Alan Black allows you to register handlers (callbacks) for entering and/or exiting a given context. This section comes from the RDTool documentation found in the source code for NQXML::Dispatcher.

Create a New Dispatcher

nd = NQXML::Dispatcher.new(args)

The arguments are the same as for NQXML::StreamingParser.

Register Handlers For Various Events

The streaming parser provides a stream of four types of entity: (1) element start-tags, (2) element end-tags, (3) text segments, and (4) comments. You can register handlers for any or all of these. To do so, you supply a code block to be executed every time an entity of that type is encountered in the stream within a given context.

"Context," in this context, means nesting of elements -- for instance, (book(chapter(paragraph))). See the examples, below, for more on this.

Each time a handler fires, the entity that triggered it is passed to the block, so the block should take it as a parameter. (See the documentation for NQXML::StreamingParser and the other components of NQXML for more information on entities.)
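Tracking the nesting that makes up a context needs nothing more than a stack of element names. This toy sketch (the event list format is hypothetical, and this is not the Dispatcher's code) pushes a name on each start-tag, pops it on each end-tag, and records the current context whenever text arrives.

```ruby
# Hypothetical event list standing in for a streaming parser.
events = [
  [:start_element, 'book'],
  [:start_element, 'chapter'],
  [:start_element, 'title'],
  [:text, 'Getting Started'],
  [:end_element, 'title'],
  [:text, 'Body text'],
  [:end_element, 'chapter'],
  [:end_element, 'book'],
]

context = []   # open element names, outermost first
seen = []      # [context, text] pairs, one per text event

events.each do |event, value|
  case event
  when :start_element then context.push(value)
  when :end_element   then context.pop
  when :text          then seen << [context.dup, value]
  end
end

seen.each { |ctx, text| puts "(#{ctx.join(' ')}) #{text}" }
# (book chapter title) Getting Started
# (book chapter) Body text
```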

Note: when you register a handler, you must specify an event, a context, and an action (block). The event must be a symbol. The context may be a list of strings, a list of symbols, an array of strings, or an array of symbols.
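Because several spellings of a context are allowed, a dispatcher has to reduce them to one lookup key. The following is a minimal sketch of that idea, not NQXML::Dispatcher's implementation: ToyDispatcher, handler_for, and normalize are hypothetical, and treating an omitted context as "*" is an assumption based on the comment-handler example in this section.

```ruby
# Hypothetical handler registry showing how the different context
# spellings can reduce to one key; not NQXML::Dispatcher's code.
class ToyDispatcher
  def initialize
    @handlers = {}
  end

  # event: a Symbol; context: Strings or Symbols, given as a list
  # or as an Array.
  def handle(event, *context, &block)
    raise ArgumentError, 'event must be a Symbol' unless event.is_a?(Symbol)
    @handlers[[event, normalize(context)]] = block
  end

  def handler_for(event, *context)
    @handlers[[event, normalize(context)]]
  end

  private

  # Flatten an Array argument, convert Symbols to Strings, and
  # treat a missing context as the "*" wildcard.
  def normalize(context)
    names = context.flatten.map(&:to_s)
    names.empty? ? ['*'] : names
  end
end

nd = ToyDispatcher.new
nd.handle(:start_element, [:book, :chapter]) { |e| puts 'Chapter starting' }
nd.handle(:comment) { |c| puts "Comment: #{c}" }

# Each of these lookups finds a handler registered above.
[['book', 'chapter'], [%w{book chapter}], [:book, :chapter]].each do |ctx|
  puts nd.handler_for(:start_element, *ctx) ? 'found' : 'missing'
end
puts nd.handler_for(:comment, '*') ? 'found' : 'missing'
```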

Examples:

  1. Register a handler for starting an element. The arguments are the event, a context, and a block, where the context is an array of element names in order of desired nesting.

    # For every new <chapter> element inside a <book> element:
    nd.handle(:start_element, [ :book, :chapter ] ) { |e|
      puts "Chapter starting"
    }
  2. Register a handler for dealing with text inside an element:

    # Print book chapter titles in bold (LaTeX):
    nd.handle(:text, "book", "chapter", "title" ) { |e|
      puts "\\textbf{#{e.text}}"
    }
  3. Register a handler for the end of an element:

    nd.handle(:end_element, %w{book chapter} ) { |e|
      puts "Chapter over"
    }
  4. Register a handler for all XML comments:

    # Note that this can be done in one of two ways:
    nd.handle(:comment) { |c| puts "Comment: #{c}" }
    nd.handle(:comment, "*") { |c| puts "Comment: #{c}" }

Begin the Parse

nd.start()

Wildcards

NQXML::Dispatcher offers a lightweight wildcard facility. The single wildcard character "*" may be used as the last item in specifying a context. This is a "one-or-more" wildcard. See below for further explanation of its use.

How NQXML::Dispatcher matches contexts

When matching the current event and context against its registered event/context handlers, the Dispatcher first looks for an exact match. Then it starts peeling off context from the left (e.g., if it doesn't find a match for book/chapter/paragraph, it looks next for chapter/paragraph). If no match can be found that way, it reverts to the full context specification and starts replacing right-most items with "*", working leftward through the items and looking for a match.

Some examples:

If you define callbacks for these:

  1. [book chapter paragraph bold]

  2. [paragraph bold]

  3. [book chapter *]

  4. [chapter *]

then the following matches will hold:

  • [book intro paragraph bold] matches 2

  • [bold] no match

  • [book chapter paragraph] matches 3

  • [chapter paragraph] matches 4

  • [book appendix chapter figure] matches 4
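The lookup order can be reconstructed as a short sketch. This is a hypothetical reading of the description above, not NQXML::Dispatcher's source. In particular, the text does not say whether a wildcarded context is also peeled from the left; this sketch assumes it is, because that is what lets [book appendix chapter figure] match 4. With the four registered contexts from the list, it reproduces all five documented results.

```ruby
# Hypothetical reconstruction of the lookup order described above;
# not NQXML::Dispatcher's source. Contexts are Arrays of Strings.
def candidates(context)
  # Peel items off the left: [a b c], [b c], [c].
  peel = ->(ctx) { (0...ctx.length).map { |i| ctx[i..-1] } }
  result = peel.call(context)
  # Replace the k right-most items with one "*", then peel again.
  (1..context.length).each do |k|
    result.concat(peel.call(context[0...(context.length - k)] + ['*']))
  end
  result
end

def find_match(registered, context)
  candidates(context).find { |c| registered.include?(c) }
end

registered = [
  %w[book chapter paragraph bold],
  %w[paragraph bold],
  %w[book chapter *],
  %w[chapter *],
]

[%w[book intro paragraph bold], %w[bold], %w[book chapter paragraph],
 %w[chapter paragraph], %w[book appendix chapter figure]].each do |ctx|
  m = find_match(registered, ctx)
  label = m ? "[#{m.join(' ')}]" : 'no match'
  puts "[#{ctx.join(' ')}] => #{label}"
end
```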

Writing XML

The NQXML::Writer class creates and outputs well-formed XML. There are two ways to use a writer: call methods that create the XML a bit at a time or create an NQXML::Document object and hand it to the writer.

For writing XML a bit at a time, NQXML::Writer has an interface similar to James Clark's com.jclark.xml.output.XMLWriter Java class. For printing entire document trees, there is NQXML::Writer.writeDocument.

A writer's constructor has two arguments. The first is the object to which the XML is written. This argument can be any object that responds to the << method, including IO, File, Tempfile, String, and Array objects.
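Because the writer only needs <<, any sink with that method works. This self-contained illustration uses a hypothetical emit helper (not part of NQXML) to send the same pieces to a String, an Array, a StringIO, and $stdout.

```ruby
require 'stringio'

# Hypothetical helper: append each piece to any object that
# responds to <<, the same duck typing the writer relies on.
def emit(sink, *pieces)
  pieces.each { |piece| sink << piece }
  sink
end

str = emit(''.dup, '<a>', 'text', '</a>')
arr = emit([], '<a>', 'text', '</a>')
io  = emit(StringIO.new, '<a>', 'text', '</a>')

puts str          # <a>text</a>
p arr             # ["<a>", "text", "</a>"]
puts io.string    # <a>text</a>
emit($stdout, "<a>text</a>\n")
```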

The second, optional boolean argument to the constructor activates some simple "prettifying" code that inserts newlines after tags' closing brackets, indents opening tags, and minimizes empty tags. This behavior is turned off by default. The prettifying behavior can be turned on or off at any time by modifying the writer's prettify attribute.

Writers check to make sure that tags are nested properly. If there is an error, an NQXML::WriterError exception is raised.
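A nesting check like this needs only a stack of open element names. The following sketch uses hypothetical ToyWriter and ToyWriterError classes as stand-ins for NQXML::Writer and NQXML::WriterError; it raises as soon as an end tag does not match the innermost open element, and does no escaping or prettifying.

```ruby
# Hypothetical miniature writer that only tracks nesting; the
# names ToyWriter and ToyWriterError are not part of NQXML.
class ToyWriterError < StandardError; end

class ToyWriter
  def initialize(out)
    @out = out
    @open = []   # names of elements not yet closed
  end

  def startElement(name)
    @open.push(name)
    @out << "<#{name}>"
  end

  def endElement(name)
    unless @open.last == name
      raise ToyWriterError,
            "cannot close <#{name}>; innermost open tag is #{@open.last.inspect}"
    end
    @open.pop
    @out << "</#{name}>"
  end
end

out = ''.dup
w = ToyWriter.new(out)
w.startElement('a')
w.startElement('b')
begin
  w.endElement('a')   # wrong order: <b> is still open
rescue ToyWriterError => ex
  puts "writer error: #{ex.message}"
end
w.endElement('b')
w.endElement('a')
puts out              # <a><b></b></a>
```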

When prettifying is turned off, a writer that outputs an empty tag such as <foo attr="x"/> normalizes the tag by printing <foo attr="x"></foo>.

Example 5. Writing XML a tag at a time

require 'nqxml/writer'

# Create a writer and hand it an IO object. Tell the writer to
# insert newlines to make the output a bit easier to read.
writer = NQXML::Writer.new($stdout, true)

# Write a processing instruction.
writer.processingInstruction('xml', 'version="1.0"')

# Write an open tag and a close tag. This will produce
# <tag1 attr1="foo"/>.
writer.startElement('tag1')
writer.attribute('attr1', 'foo')
writer.endElement('tag1')

# Open a second element to hold some text.
writer.startElement('tag2')

# Write text. This automatically closes the open start tag
# first. All '&', '<', '>', and single- and double-quote
# characters are replaced with their markup equivalents.
#
# (Note that the newline inserted after the previous tag will
# be part of this text. To avoid this, call
# "writer.prettify = false" to turn off this behavior.)
writer.write("  data with <funky & groovy chars>\n")

writer.endElement('tag2')
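The text-escaping rule described in the comments of Example 5 can be sketched with String#gsub and a lookup Hash. This is an illustration, not NQXML's code; which entity NQXML emits for each character (for example &apos; versus &#39; for a single quote) is an assumption here.

```ruby
# Map each special character to one of the five predefined XML
# entities. NQXML's exact choices may differ.
ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;',
}.freeze

def escape_text(text)
  # One pass over the input, so the '&' inside a replacement is
  # never escaped a second time.
  text.gsub(/[&<>"']/, ESCAPES)
end

puts escape_text('  data with <funky & groovy chars>')
# prints "  data with &lt;funky &amp; groovy chars&gt;"
```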

Example 6. Writing a document created by a tree parser

require 'nqxml/treeparser'
require 'nqxml/writer'

# Open a file, create a tree parser and give it the file, and
# retrieve the parsed document. Gee, this comment is longer
# than the code.
doc = NQXML::TreeParser.new(File.open('doc.xml')).document

# Create a writer and point it to stdout
writer = NQXML::Writer.new($stdout)
writer.writeDocument(doc)

# The last newline, since it is outside the root entity of
# the document and is purely whitespace, is ignored by the
# NQXML::TreeParser. Let's add a newline to our output.
writer.write("\n")

Example 7. Writing a lovingly hand-crafted document

require 'nqxml/document'
require 'nqxml/writer'

# Create a document object.
doc = NQXML::Document.new()

# Create a processing instruction and add it to the
# document prolog.
pi = NQXML::ProcessingInstruction.new('xml',
    {'version' => '1.0'})
doc.addToProlog(pi)
doc.setRoot(NQXML::Tag.new('root', {'type' => 'manual'}))

# Add two sub-nodes of the root node. We save one node but
# will find the other later.
tag = NQXML::Tag.new('thing', {'id' => '1'})
thing1Node = doc.rootNode.addChild(tag)

# All on one line:
doc.rootNode.addChild(NQXML::Tag.new('thing', {'id' => '2'}))

# Add a child to thing1.
thing1Node.addChild(NQXML::Text.new('this is some text'))

# Find thing2 under root.
thing2Node = doc.rootNode.children.detect { | n |
    n.entity.instance_of?(NQXML::Tag) &&
        n.entity.name == 'thing' &&
        n.entity.attrs['id'] == '2'
}

# Add a child to thing2.
thing2Node.addChild(NQXML::Text.new('thing 2 node text'))

# Create a writer and point it to a string.
str = ''
writer = NQXML::Writer.new(str)
writer.prettify = true

# Write the document to the string.
writer.writeDocument(doc)

# Output the XML string.
print str

More Example Scripts

Here are short descriptions of each of the examples found in the examples directory.

There are also a few XML data files in the examples directory.