XProc Introduction

1. XProc Overview

W3C XProc is a specification for an XML pipeline language: it defines how a processor chains together operations on XML documents. eXist's XProc implementation is called xprocxq and is developed mostly in XQuery.

Using XProc's core, standard, optional and extension steps, one defines XML pipelines which can model a wide range of processes.

Steps accept input XML and produce output XML. It's in this manner (somewhat analogous to Unix pipes) that you can orchestrate and create sophisticated XML workflows.

2. Quick Start & Examples

Since eXist v1.3/1.4, xprocxq is built and configured by default and should be enabled and ready to use.

Check out some simple examples to verify that your eXist installation is working.

3. Using XProc in eXist

The following XQuery file is an example of how to run xprocxq from within eXist.

Example: Running XProc from XQuery

				

xquery version "1.0" encoding "UTF-8";

(: for now you need to import these modules :)
import module namespace const = "http://xproc.net/xproc/const";
import module namespace xproc = "http://xproc.net/xproc";
import module namespace u = "http://xproc.net/xproc/util";

(: define standard input source binding :)
let $stdin := document { <test>Hello World</test> }

(: the xproc pipeline :)
let $pipeline := document {
    <p:pipeline name="pipeline"
                xmlns:p="http://www.w3.org/ns/xproc">
        <p:identity/>
    </p:pipeline>
}

(: the xproc entry function :)
return
    xproc:run($pipeline, $stdin)


			

list and define all xproc entry functions

The result of running this xquery should resemble:

Example: Result

				
<test>Hello World</test>

			

4. Learning XProc

At their simplest, XProc pipelines contain steps, each of which accepts zero or more XML documents as its input and produces zero or more XML documents as output.

The XProc code in the following listing consists of a <p:declare-step> top-level element, a <p:xslt> step, and not much else.

Example: Simple Pipeline - XSLT transformation

				
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="simple-pipeline">
    <p:input port="source" primary="true" sequence="false"/>
    <p:output port="result" primary="true" sequence="false"/>
    <p:xslt name="step1">
        <p:input port="source">
            <p:pipe step="simple-pipeline" port="source"/>
        </p:input>
        <p:input port="stylesheet">
            <p:document href="/db/xproc/examples/stylesheet.xml"/>
        </p:input>
    </p:xslt>
</p:declare-step>
			
			
run simple-pipeline.xproc?stdin=/db/xproc/test.xml

An XML document is brought in as standard input using the stdin URL parameter. The XProc processor uses this XML document as the input to the first step, the <p:xslt> step, which applies an XSLT transformation using stylesheet.xml.

As the pipeline contains only a single step, results of XSLT processing are placed onto the result port for the pipeline, providing the XML document to standard output. The following figure illustrates this process, outlining where the XML document flows from source and result ports.
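The same run can also be sketched from XQuery rather than through the stdin URL parameter. This is a minimal sketch, assuming that xproc:run binds its second argument to the pipeline's source port in the same way as in the earlier xproc:run example, and that /db/xproc/test.xml and /db/xproc/examples/stylesheet.xml exist in your database.

```xquery
xquery version "1.0" encoding "UTF-8";

(: same module imports as in the earlier xproc:run example :)
import module namespace const = "http://xproc.net/xproc/const";
import module namespace xproc = "http://xproc.net/xproc";
import module namespace u = "http://xproc.net/xproc/util";

(: standard input: the document the stdin URL parameter pointed at
   (assumed to be stored in the database) :)
let $stdin := doc("/db/xproc/test.xml")

(: the simple pipeline from the listing above, inlined :)
let $pipeline := document {
    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="simple-pipeline">
        <p:input port="source" primary="true" sequence="false"/>
        <p:output port="result" primary="true" sequence="false"/>
        <p:xslt name="step1">
            <p:input port="source">
                <p:pipe step="simple-pipeline" port="source"/>
            </p:input>
            <p:input port="stylesheet">
                <p:document href="/db/xproc/examples/stylesheet.xml"/>
            </p:input>
        </p:xslt>
    </p:declare-step>
}

(: run the pipeline; the transformed document is returned :)
return
    xproc:run($pipeline, $stdin)
```

Inlining the pipeline keeps the sketch self-contained. Loading it from a stored .xproc document with doc() should work equally well if the pipeline lives in the database.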

A <p:pipeline> element implicitly declares a source input port and a result output port. The Simple Pipeline listing uses <p:declare-step> instead, which means that I have to define these ports explicitly and declare the step bindings between sequential sibling steps. These bindings and ports are summarized below:

With a one-step pipeline, it's difficult to illustrate bindings between steps, so I have created a nontrivial example which contains several steps. The next code listing presents a more representative XProc example containing multiple steps along with some conditional logic.

Example: Complex Pipeline - p:choose and p:for-each

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" name="pipeline">

    <!-- compare test step -->
    <p:compare name="compare">
        <p:input port="alternate">
            <!-- example of using p:document -->
            <p:document href="/db/xproc/test.xml"/>
        </p:input>
    </p:compare>

    <p:choose name="mychoosestep">
        <!-- note the eXist specific path convention with root -->
        <p:when test=".//c:result[.='false']">
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline failed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:when>
        <!-- success -->
        <p:when test=".//c:result[.='true']">
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline successfully processed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:when>
        <p:otherwise>
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline failed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:otherwise>
    </p:choose>

    <!-- currently need to define p:step to get multi container step output -->
    <p:identity>
        <p:input port="source">
            <p:step port="result" step="mychoosestep"/>
        </p:input>
    </p:identity>

</p:pipeline>
					
	
run complex-pipeline.xproc

This pipeline roughly translates to the following:
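As a rough sketch only, the logic of the pipeline can be written in plain XQuery. It assumes that <p:compare> behaves like fn:deep-equal on the two documents, and it uses a hypothetical sample document in place of the pipeline's standard input. It is not how xprocxq evaluates the pipeline internally.

```xquery
xquery version "1.0";

(: hypothetical stand-in for the document arriving on the source port :)
let $source := document { <test>Hello World</test> }

(: p:compare: compare the source with the document on the alternate port :)
let $alternate := doc("/db/xproc/test.xml")
let $same := deep-equal($source, $alternate)

(: p:choose: the 'false' branch and p:otherwise emit the same failure
   message, so the three branches collapse into a single if/else :)
return
    if ($same)
    then <p>This pipeline successfully processed.</p>
    else <p>This pipeline failed.</p>
```

The final <p:identity> step in the pipeline only passes the <p:choose> output through to the result port, so it has no counterpart in the sketch.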

Steps can have other input or output ports defined that work with non-XML documents, but only XML documents (as in XML infoset) can flow between primary input and output ports.

4.1. Core Steps

4.2. XProc Components

4.3. Standard Steps

4.4. Optional Steps

4.5. Extension Steps

Implementation specific steps

4.6. XProc Functions

5. Reuse and Extending XProc

eXist's XProc implementation provides a range of extension mechanisms for creating new steps.

5.1. Defining XProc libraries using XQuery

5.2. Defining XProc libraries using XSLT

5.3. Defining XProc libraries using Java

6. W3C XProc Unit Test

The following links run xprocxq against the W3C XProc Unit Test suite.

Important

Please note that you will need to enable the File extension module and download the W3C XProc test suite to run these tests.

7. Additional XProc Resources

7.1. Mailing Lists

7.2. XProc articles

7.3. XProc Specifications

7.4. Useful

7.5. EXProc, EXpath and EXQuery Community Specifications

7.6. XQuery

7.7. XProc links

8. xprocxq FAQ

8.1. Why can't I access files from the filesystem?

By default, XProc is set to read files only from the XML database. If you want to access files from the hard drive, you will need to enable eXist's File extension module and make sure to use the file:// prefix in your file paths.
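A minimal sketch of what such a file path might look like in a pipeline. The path file:///tmp/test.xml is hypothetical, and the sketch assumes that the File extension module has been enabled and that <p:document> resolves file:// URIs once it is.

```xquery
xquery version "1.0" encoding "UTF-8";

import module namespace const = "http://xproc.net/xproc/const";
import module namespace xproc = "http://xproc.net/xproc";
import module namespace u = "http://xproc.net/xproc/util";

(: standard input; not used by the step below, which reads from disk :)
let $stdin := document { <test>Hello World</test> }

let $pipeline := document {
    <p:pipeline name="pipeline"
                xmlns:p="http://www.w3.org/ns/xproc">
        <p:identity>
            <p:input port="source">
                <!-- hypothetical filesystem path; note the file:// prefix -->
                <p:document href="file:///tmp/test.xml"/>
            </p:input>
        </p:identity>
    </p:pipeline>
}

return
    xproc:run($pipeline, $stdin)
```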

8.2. What is xprocxq?

Initially, eXist's XProc processor was developed as a standalone project called xprocxq. The i

8.3. xprocxq compliance and limitations

xprocxq, being implemented in XQuery, currently has several limitations and is nowhere near compliant with the existing XProc draft specification. The best way to understand what currently works and what doesn't is to check out the step examples included in the release.

Here is a list of the more severe limitations:

8.4. Why is xprocxq implemented in XQuery?

XQuery's somewhat functional approach appealed to me and, having been a long-time XSLT user, I wanted to gain some understanding of the nuances between XSLT and XQuery.

In building xprocxq, my primary goals were:

I think that most people will find that XProc with XQuery is a powerful combination which can be used to implement a wide range of server-side applications.

November 2009