Developer's Guide

1. Writing Java Applications with the XML:DB API

The preferred way to work with eXist when developing Java applications is to use the XML:DB API. This API provides a common interface to native or XML-enabled databases and supports the development of portable, reusable applications. eXist's implementation of the XML:DB standards follows the Xindice implementation, and conforms to the latest working drafts put forth by the XML:DB Initiative. For more information, refer to the Javadocs for this API.

The basic components employed by the XML:DB API are drivers, collections, resources and services.

Drivers are implementations of the database interface that encapsulate the database access logic for specific XML database products. They are provided by the product vendor and must be registered with the database manager.

A collection is a hierarchical container for resources and further sub-collections. Currently two different resources are defined by the API: XMLResource and BinaryResource. An XMLResource represents an XML document or a document fragment, selected by a previously executed XPath query.

Finally, services are requested for special tasks such as querying a collection with XPath, or managing a collection.

Note

There are several XML:DB examples provided in eXist's samples directory. To start an example, use start.jar and pass the name of the example class as the first parameter, for instance:

java -jar start.jar org.exist.examples.xmldb.Retrieve [- other options]

Programming with the XML:DB API is straightforward. You will find some code examples in the samples/org/exist/examples/xmldb directory. In the following simple example, a document can be retrieved from the eXist server and printed to standard output.

Example: Retrieving a Document with XML:DB

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
import javax.xml.transform.OutputKeys;

public class RetrieveExample {
    protected static String URI = "xmldb:exist://localhost:8080/exist/xmlrpc";

    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        
        // initialize database driver
        Class cl = Class.forName(driver);
        Database database = (Database) cl.newInstance();
        DatabaseManager.registerDatabase(database);

        // get the collection
        Collection col = DatabaseManager.getCollection(URI + args[0]);
        col.setProperty(OutputKeys.INDENT, "no");
        XMLResource res = (XMLResource)col.getResource(args[1]);
        if(res == null)
            System.out.println("document not found!");
        else
            System.out.println(res.getContent());
    }
}
        

In this example, the database driver class for eXist (org.exist.xmldb.DatabaseImpl) is first registered with the DatabaseManager. Next we obtain a Collection object from the database manager by calling the static method DatabaseManager.getCollection(). The method expects a fully qualified URI as its parameter, which identifies the desired collection. The format of this URI should look like the following:

xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection

Because more than one database driver can be registered with the database manager, the first part of the URI (xmldb:exist) is required to determine which driver class to use. The database-id is used by the database manager to select the correct driver from its list of available drivers. To use eXist, this ID should always be "exist" (unless you have set up multiple database instances; additional instances may have other names).

The final part of the URI identifies the collection path, and optionally the host address of the database server on the network. Internally, eXist uses two different driver implementations: The first talks to a remote database engine using XML-RPC calls, the second has direct access to a local instance of eXist. The root collection is always identified by /db. For example, the URI

xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays

references the Shakespeare collection on a remote server running the XML-RPC interface as a servlet at localhost:8080/exist/xmlrpc. If we leave out the host address, the XML:DB driver will try to connect to a locally attached database instance, e.g.:

xmldb:exist:///db/shakespeare/plays

In this case, we have to tell the XML:DB driver that it should create a new database instance if none has been started. This is done by setting the create-database property of class Database to "true" (more information on embedded use of eXist can be found in the deployment guide).
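The URI anatomy described above can be illustrated with a small parser. This is only a sketch: the class XmldbUriParts and its regular expression are hypothetical helpers written for illustration, not part of the XML:DB API or of eXist, and the sketch assumes that the collection path begins at the first /db path segment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the XML:DB API or eXist.
final class XmldbUriParts {
    // Mirrors the format xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection.
    // Assumption: the collection path starts at the first "/db" path segment.
    private static final Pattern FORMAT =
        Pattern.compile("^xmldb:([^:/]+)://(.*?)(/db(?:/.*)?)$");

    final String databaseId;
    final String hostAddress;    // empty for a locally attached instance
    final String collectionPath;

    private XmldbUriParts(String databaseId, String hostAddress, String collectionPath) {
        this.databaseId = databaseId;
        this.hostAddress = hostAddress;
        this.collectionPath = collectionPath;
    }

    // Splits a URI into its three parts or rejects it if it does not fit the format.
    static XmldbUriParts parse(String uri) {
        Matcher m = FORMAT.matcher(uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an XML:DB collection URI: " + uri);
        }
        return new XmldbUriParts(m.group(1), m.group(2), m.group(3));
    }

    // A URI without a host address refers to a local (embedded) instance.
    boolean isLocal() {
        return hostAddress.isEmpty();
    }

    public static void main(String[] args) {
        String[] uris = {
            "xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays",
            "xmldb:exist:///db/shakespeare/plays"
        };
        for (String uri : uris) {
            XmldbUriParts p = parse(uri);
            System.out.println(p.databaseId + " | " + p.hostAddress + " | "
                + p.collectionPath + " | local=" + p.isLocal());
        }
    }
}
```

The empty host address in the second URI is what distinguishes the embedded case from the remote one; the real driver makes this decision internally.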

The setProperty calls are used to set database-specific parameters. In this case, indentation (pretty-printing) of XML output is turned off for the collection by setting OutputKeys.INDENT to "no"; pass "yes" to turn it on. eXist uses the property keys defined in the standard Java package javax.xml.transform. Thus, in Java you can simply use class OutputKeys to get the correct keys.
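The OutputKeys constants are ordinary strings, which is why the query examples further down can pass the literal "indent" instead of OutputKeys.INDENT. The following sketch uses only the JDK; the class OutputKeySketch and its method are hypothetical, and which of the standard keys eXist actually honors is an assumption to verify against your eXist version.

```java
import java.util.Properties;
import javax.xml.transform.OutputKeys;

// Illustrative sketch; class and method names are hypothetical.
final class OutputKeySketch {
    // Collects serialization settings under the standard JAXP key names.
    static Properties serialization(boolean indent, String encoding) {
        Properties props = new Properties();
        props.setProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        props.setProperty(OutputKeys.ENCODING, encoding);
        return props;
    }

    public static void main(String[] args) {
        // Prints the key name behind the constant, then the collected settings.
        System.out.println(OutputKeys.INDENT);
        System.out.println(serialization(true, "UTF-8"));
    }
}
```

Each entry could then be handed to setProperty(key, value) on a collection or service, in the same way as the single call in the example above.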

Calling col.getResource() finally retrieves the document, which is returned as an XMLResource. All resources have a method getContent(), which returns the resource's content, depending on its type. In this case we retrieve the content as type String.

To query the repository, we may use either the standard XPathQueryService or eXist's XQueryService class. The XML:DB API defines different kinds of services, which may or may not be provided by the database. The getService method of class Collection returns a service if it is available. The method expects the service name as its first parameter and the version (as a string) as its second; the version distinguishes between different versions of the service defined by the XML:DB API.

The following is an example of using the XML:DB API to execute a database query:

Example: Querying the Database (XML:DB API)

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

public class QueryExample {
    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);			
        Database database = (Database)cl.newInstance();
        DatabaseManager.registerDatabase(database);
        
        Collection col = 
            DatabaseManager.getCollection(
                "xmldb:exist://localhost:8080/exist/xmlrpc/db"
            );
        XPathQueryService service =
            (XPathQueryService) col.getService("XPathQueryService", "1.0");
        service.setProperty("indent", "yes");
                
        ResourceSet result = service.query(args[0]);
        ResourceIterator i = result.getIterator();
        while(i.hasMoreResources()) {
            Resource r = i.nextResource();
            System.out.println((String)r.getContent());
        }
    }
}
        

To execute the query, method service.query(xpath) is called. This method returns a ResourceSet, containing the Resources found by the query. ResourceSet.getIterator() gives us an iterator over these resources. Every Resource contains a single document fragment or value selected by the XPath expression.

Internally, eXist does not distinguish between XPath and XQuery expressions. XQueryService thus maps to the same implementation class as XPathQueryService. However, it provides a few additional methods. Most important, when talking to an embedded database, XQueryService allows for the XQuery expression to be compiled as an internal representation, which can then be reused. With compilation, the previous example code would look as follows:

Example: Compiling a Query (XML:DB API)

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
import org.exist.xmldb.XQueryService;

public class QueryExample {
    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);			
        Database database = (Database)cl.newInstance();
        database.setProperty("create-database", "true");
        DatabaseManager.registerDatabase(database);
        
        Collection col = 
            DatabaseManager.getCollection("xmldb:exist:///db");
        XQueryService service =
            (XQueryService) col.getService("XQueryService", "1.0");
        service.setProperty("indent", "yes");
        
        CompiledExpression compiled = service.compile(args[0]);
        ResourceSet result = service.execute(compiled);
        ResourceIterator i = result.getIterator();
        while(i.hasMoreResources()) {
            Resource r = i.nextResource();
            System.out.println((String)r.getContent());
        }
    }
}
        

The XML-RPC server automatically caches compiled expressions, and so calling compile through the remote driver produces no effect if the expression is already cached.
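The benefit of compiling once and executing many times is not specific to eXist. As an analogy only, the sketch below uses the JDK's javax.xml.xpath package (not eXist's XQueryService) to cache compiled expressions by their source text, which is roughly the behavior attributed to the XML-RPC server above. The class CompiledExpressionCache is hypothetical; how eXist implements its own cache is not shown in this guide.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical analogy built on the JDK's XPath API; not eXist code.
// Not thread-safe: a plain HashMap and a single XPath object are used.
final class CompiledExpressionCache {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> cache = new HashMap<>();

    // Compiles the expression on first use and returns the cached form afterwards.
    XPathExpression compile(String expression) {
        XPathExpression compiled = cache.get(expression);
        if (compiled == null) {
            try {
                compiled = xpath.compile(expression);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Cannot compile: " + expression, e);
            }
            cache.put(expression, compiled);
        }
        return compiled;
    }

    // Evaluates the (possibly cached) expression against an XML string
    // and returns the string value of the result.
    String evaluate(String expression, String xml) {
        try {
            return compile(expression).evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    // Number of distinct expressions compiled so far.
    int size() {
        return cache.size();
    }
}
```

Calling evaluate repeatedly with the same expression text reuses one compiled object, so the parsing cost is paid only once.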

Next, we would like to store a new document in the repository. This is done by creating a new XMLResource, assigning it the content of the new document, and calling the storeResource method of class Collection. A new Resource is created by calling Collection.createResource(), which expects two parameters: the id and the type of the resource being created. If the id parameter is null, a unique resource id is generated automatically.

In some cases, the collection may not yet exist, and so we must create it. To create a new collection, call the createCollection method of the CollectionManagementService service. In the following example, we simply start at the root-collection object to get the CollectionManagementService service.

Example: Adding a File (XML:DB API)

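import java.io.File;
import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

public class StoreExample {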
    public final static String URI = "xmldb:exist://localhost:8080/exist/xmlrpc";

    public static void main(String args[]) throws Exception {
        if(args.length < 2) {
            System.out.println("usage: StoreExample collection-path document");
            System.exit(1);
        }

        String collection = args[0], file = args[1];

        // initialize driver
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);
        Database database = (Database)cl.newInstance();
        DatabaseManager.registerDatabase(database);

        // try to get collection
        Collection col =
            DatabaseManager.getCollection(URI + collection);
        if(col == null) {
            // collection does not exist: get root collection and create
            // for simplicity, we assume that the new collection is a
            // direct child of the root collection, e.g. /db/test.
            // the example will fail otherwise.
            Collection root = DatabaseManager.getCollection(URI + "/db");
            CollectionManagementService mgtService = (CollectionManagementService)
                root.getService("CollectionManagementService", "1.0");
            col = mgtService.createCollection(collection.substring("/db".length()));
        }
        // create new XMLResource; an id will be assigned to the new resource
        XMLResource document = (XMLResource)col.createResource(null, "XMLResource");
        File f = new File(file);
        if(!f.canRead()) {
            System.out.println("cannot read file " + file);
            return;
        }
        document.setContent(f);
        System.out.print("storing document " + document.getId() + "...");
        col.storeResource(document);
        System.out.println("ok.");
    }
}

Please note that the XMLResource.setContent() method takes a Java object as its parameter. The eXist driver checks if the object is a File. Otherwise, the object is transformed into a String by calling the object's toString() method. Passing a File has one big advantage: If the database is running in the embedded mode, the file will be directly passed to the indexer. Thus, the file's content does not have to be loaded into the main memory. This is handy if your files are very large.
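The StoreExample above only works when the new collection is a direct child of /db. One way to lift that restriction would be to walk the path level by level and create each missing collection in turn. The sketch below covers only the path arithmetic and makes no database calls; CollectionPathSketch and pathsToCreate are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it performs no database access.
final class CollectionPathSketch {
    // Lists every collection path between the root and the target, top-down,
    // e.g. "/db/a/b" yields ["/db/a", "/db/a/b"]. The root "/db" itself is omitted.
    static List<String> pathsToCreate(String collectionPath) {
        if (!collectionPath.equals("/db") && !collectionPath.startsWith("/db/")) {
            throw new IllegalArgumentException("Path must start at /db: " + collectionPath);
        }
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder("/db");
        for (String segment : collectionPath.substring("/db".length()).split("/")) {
            if (segment.isEmpty()) {
                continue; // skips the leading empty segment and doubled slashes
            }
            current.append('/').append(segment);
            paths.add(current.toString());
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(pathsToCreate("/db/test/sub"));
    }
}
```

Each returned path could then be looked up with DatabaseManager.getCollection() and, where the result is null, created through the CollectionManagementService of its parent collection.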

2. Extensions to XML:DB

2.1. Additional Services

eXist provides several services in addition to those defined by the XML:DB specification:

The UserManagementService service contains methods to manage users and handle permissions. These methods resemble common Unix commands such as chown or chmod. As with other services, UserManagementService can be retrieved from a collection object, as in:

UserManagementService service =
(UserManagementService)collection.getService("UserManagementService", "1.0");

Another service, DatabaseInstanceManager, provides a single method to shut down the database instance accessed by the driver. You have to be a member of the dba user group to use this method; otherwise an exception will be thrown. See the Deployment Guide for an example.

Finally, interface IndexQueryService supports access to the terms and elements contained in eXist's internal index. Method getIndexedElements() returns a list of element occurrences for the current collection. For each occurring element, the element's name and a frequency count are returned.

Method scanIndexTerms() allows for a retrieval of the list of occurring words for the current collection. This might be useful, for example, to provide users a list of searchable terms together with their frequency.

2.2. Multiple Database Instances

As explained above, passing a local XML:DB URI to the DatabaseManager means that the driver will try to start or access an embedded database instance. You can configure more than one database instance by setting the location of the central configuration file. The configuration file is set through the configuration property of the DatabaseImpl driver class. If you would like to use different drivers for different database instances, specify a name for the created instance through the database-id property. You may later use this name in the URI to refer to a database instance. The following fragment sets up two instances:

Example: Multiple Database Instances

// initialize driver
String driver = "org.exist.xmldb.DatabaseImpl";
Class cl = Class.forName(driver);			
Database database1 = (Database)cl.newInstance();
database1.setProperty("create-database", "true");
database1.setProperty("configuration", "/home/exist/test/conf.xml");
database1.setProperty("database-id", "test");
DatabaseManager.registerDatabase(database1);

Database database2 = (Database)cl.newInstance();
database2.setProperty("create-database", "true");
database2.setProperty("configuration", "/home/exist/production/conf.xml");
database2.setProperty("database-id", "exist");
DatabaseManager.registerDatabase(database2);

With the above example, the URI

xmldb:test:///db

selects the test database instance. Both instances should have their own data and log directory as specified in the configuration files.

3. XMLDBTransformer for Cocoon (Deprecated)

Important

The XMLDBTransformer is no longer actively developed since all its functionality can be replaced with simple XQuery.

eXist offers several ways to access the database from Cocoon-based applications. This includes access via the XMLDB pseudo-protocol, through XSP pages, and through the XMLDBTransformer. The XMLDBTransformer provides a simple way to query the database, and works in a similar way to other transformers supplied with Cocoon. Consult the Cocoon documentation for more on using Transformers and about their basic concepts.

As with other transformers, the XMLDBTransformer listens for a limited set of tags that belong to the namespace http://exist-db.org/transformer/1.0. These are <collection>, <for-each>, <select-node> and <current-node>. To examine how they are used, consider the following example (the complete version can be found at webapp/examples/simple2.xml):

Example: XMLDBTransformer Example

<xdb:collection xmlns:xdb="http://exist-db.org/transformer/1.0"
	uri="xdb:exist:///db">
	<!-- iterate through all rdf:Description elements containing the
	     term "computer" -->
	<xdb:for-each query="//rdf:Description[dc:title &amp;= 'computer']"
		from="0" to="9" sort-by="/dc:title">
		<!-- output a book element for each entry -->
		<book>
			<!-- extract the title. There's only one title, so we use
			     select-node -->
			<title><xdb:select-node query="dc:title/text()"/></title>
			<!-- extract the creators. There's probably more than one,
			     so we use a nested for-each -->
			<xdb:for-each query="dc:creator/text()">
				<creator><xdb:current-node/></creator>
			</xdb:for-each>
		</book>
	</xdb:for-each>
</xdb:collection>

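As the example shows, before you can query the database you must specify a collection in the <collection> element, which accepts a standard XMLDB URI in its uri attribute. To process a query, you may use either the <for-each> or the <select-node> tag. As the comments in the example indicate, <select-node> is used where a single result is expected (the title), while <for-each> iterates over a result set that may contain several nodes (the creators).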

The <current-node> element is used to return the current node being processed in a for-each iteration to the output document. You can restrict the number of for-each iterations by specifying the bounds set by the from and to attributes. The sort-by attribute is still experimental: the query results will be sorted by an XPath expression. For each of the results, the XPath expression is evaluated and the resulting string value is used to sort the query results in ascending order.
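The combined effect of sort-by, from and to can be pictured with plain Java lists. This is a sketch under two assumptions that the text above does not confirm: that sorting happens before the bounds are applied, and that the to bound is inclusive (so from="0" to="9" yields at most ten results). The class ForEachWindowSketch is hypothetical and unrelated to the transformer's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of sort-by plus from/to; not the transformer's code.
final class ForEachWindowSketch {
    // sortKeys holds the string value computed for each query result.
    static List<String> window(List<String> sortKeys, int from, int to) {
        List<String> sorted = new ArrayList<>(sortKeys);
        Collections.sort(sorted);                    // ascending by string value
        int start = Math.max(0, from);
        int end = Math.min(sorted.size(), to + 1);   // assumption: 'to' is inclusive
        if (start >= end) {
            return new ArrayList<>();                // bounds select nothing
        }
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        Collections.addAll(titles, "Computer Networks", "Algorithms", "Compilers");
        System.out.println(window(titles, 0, 1));
    }
}
```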

As shown above, it is possible to nest multiple for-each or select-node tags. The nested tag will be evaluated relative to the current result node. In the example above, the main for-each statement selects all <rdf:Description> fragments whose title contains the term "computer". During each iteration, we further process the current result fragment by using nested <for-each> and <select-node> tags to select the title and creators.

Notice that the same result could be achieved by an XSLT stylesheet. However, if the selected fragments are rather large, post-processing with XSLT can be much slower, since each fragment has to be serialized and then parsed by the XSLT processor.

The results of the XMLDBTransformer query are enclosed in the element <result-set>. Attributes for this tag include the number of hits for the query, the XPath query processed, the query time (in milliseconds), and the start and end position of the retrieved records in the result set. The output of the XMLDBTransformer for the above fragment is shown below:

Example: XMLDBTransformer Output

<xdb:result-set count="72" xpath="//rdf:Description[dc:title &amp;= 'computer']"
	query-time="370" from="0" to="9">
	<book xdb:document-id="zit.rdf" xdb:collection="/db/library"> 
		<title> A Centennial History of the American Society of Mechanical Engineers 1880-1980 </title> 
		<creator xdb:document-id="zit.rdf" xdb:collection="/db/library"> Sinclair, Bruce </creator>
	</book>
	<!-- more books here ... -->
</xdb:result-set>

4. XML:DB Logicsheet for Cocoon

Important

The XMLDB logicsheet is no longer actively developed. The XQueryGenerator provides a much easier way to generate web page contents.

Cocoon offers a powerful mechanism called XSP (eXtensible Server Pages) to write dynamic XML-based web pages. Similar to JSP, XSP embeds Java code in the XML pages. However, embedding large sections of Java code in an XML document is usually considered poor programming form. To support the separation of content and programming logic, XSP allows us to put reusable code into "logicsheets", which correspond to the tag libraries found in JSP. A logicsheet helps to minimize the amount of Java code used inside an XSP page.

Version 0.8 of eXist includes a logicsheet based on the XML:DB API, which defines tags for all important tasks. While it is possible to write all of the XML:DB-related code by hand, these predefined tags make the XML file more readable and help users without Java experience to understand the process involved.

An overview of the available XSP tags is available with the stylesheet documentation (generated using xsldoc). In the following simple XSP example, a document is retrieved and displayed:

Example: Simple XSP Page (example1.xsp)

<xsp:page xmlns:xsp="http://apache.org/xsp"
          xmlns:xdb="http://exist-db.org/xmldb/1.0"
>
<document>
    <body>
        <section title="View document">
            
        <p>Retrieving document <xsp:expr>request.getParameter("doc")</xsp:expr></p>
        
        <xdb:collection uri="xdb:exist:///db/shakespeare/plays">
            <xml-source>
                <xdb:get-document encoding="ISO-8859-1" as="xml">
                     <xdb:name>request.getParameter("doc")</xdb:name>
                </xdb:get-document>
            </xml-source>
        </xdb:collection>
        </section>
    </body>
</document>
</xsp:page>

The Cocoon version included with eXist is already configured to recognize the xmldb namespace and associate it with the XML:DB logicsheet. The logicsheet is defined in src/org/exist/xmldb.xsl. To use the logicsheet from our page we just declare the xmldb namespace (i.e. xmlns:xdb="http://exist-db.org/xmldb/1.0").

The above sample code retrieves a document from the collection /db/shakespeare/plays. The name of the document is passed in the HTTP request parameter doc.

To post-process the retrieved XML data, we set the attribute as to "xml". This indicates that the resource should be fed into the current Cocoon processing stream. To include the data as a string value, you may specify as="string". As a result, all XML markup characters will be escaped.
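What "escaped" means for as="string" can be shown in a few lines of Java. The sketch below only illustrates the effect on the markup characters; MarkupEscapeSketch is a hypothetical class, and how the logicsheet performs the escaping internally is not described in this guide.

```java
// Hypothetical illustration of markup escaping; not the logicsheet's code.
final class MarkupEscapeSketch {
    // Replaces the characters that would otherwise be read as XML markup.
    static String escape(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 16);
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<SPEAKER>HAMLET</SPEAKER>"));
    }
}
```

With as="xml" no such step applies: the elements enter the processing stream as elements and remain available to later transformation steps.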

Please note that the parameters of the logicsheet tags may be specified either as an attribute of an element or as a child element. If you specify a parameter as a child element, its content will be interpreted as a Java expression. Literal values should be set via an attribute. For example, the xpath parameter is specified as a Java expression and is therefore embedded in an <xdb:xpath> element.

Finally, in order to tell Cocoon how to process this page, we have to add a new <map:match> pattern to the sitemap - for example:

Example: Cocoon Sitemap Snippet (XSP)

<map:match pattern="test.xsp">
    <map:generate type="serverpages" src="test.xsp"/>
    <map:transform src="stylesheets/doc2html-2.xsl"/>
    <map:serialize type="xhtml"/>
</map:match>

The next example shows how to query the database:

Example: Querying the Database (example2.xsp)

<xsp:page xmlns:xsp="http://apache.org/xsp"
          xmlns:xdb="http://exist-db.org/xmldb/1.0"
>
    <html>
        <body>
            <h1>Find books by title</h1>
            <xdb:collection uri="xdb:exist:///db">
                <xdb:execute>
                    <xdb:xpath>
                        "xmldb:document()//rdf:Description[dc:title" +
                        "&amp;='" + request.getParameter("title") + "']"
                    </xdb:xpath>
                    <p>Found <xdb:get-hit-count/> hits.</p>
                    
                    <xdb:results>
                        <pre>
                            <xdb:get-xml as="string"/>
                        </pre>
                    </xdb:results>
                </xdb:execute>
            </xdb:collection>
        </body>
    </html>
</xsp:page>

This XSP example page takes the HTTP request parameter title as its input and creates an XPath expression that finds all <rdf:Description> elements having a <dc:title> element containing the keywords entered by the user. As required by the XML:DB API, any action has to be enclosed in an <xdb:collection> element. The query is specified in the <xdb:xpath> element using a Java expression, which inserts the value of the request parameter title into the XPath query string.
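The Java expression inside <xdb:xpath> is ordinary string concatenation, so it can be tried outside Cocoon. In the sketch below, TitleQuerySketch is a hypothetical class. Doubling apostrophes in the user input follows the XQuery rule for string literals; treating the query string this way is an assumption that goes beyond the example above, which inserts the parameter unchanged.

```java
// Hypothetical helper that mirrors the concatenation in example2.xsp.
final class TitleQuerySketch {
    static String build(String title) {
        // The XSP page writes the operator as &amp;= because the page is XML;
        // the Java string that reaches the database contains &=.
        // Assumption: a doubled apostrophe stands for one literal apostrophe,
        // as in XQuery string literals, so input cannot end the literal early.
        String safe = title.replace("'", "''");
        return "xmldb:document()//rdf:Description[dc:title" + "&='" + safe + "']";
    }

    public static void main(String[] args) {
        System.out.println(build("computer"));
    }
}
```

Without the replacement, a title containing an apostrophe would terminate the string literal and produce a malformed query.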

The <xdb:results> element iterates through the generated result set, inserting each resource into the page by calling <xdb:get-xml>. In this case, <xdb:get-xml> inserts the resource contents as a string, which means that all XML markup is escaped.

September 2009
Wolfgang M. Meier
wolfgang at exist-db.org