XProc Introduction

1. XProc Overview

XProc is a W3C specification for an XML pipeline language: it defines how a processor chains operations on XML documents. eXist's XProc implementation is called xprocxq and is written mostly in XQuery.

Using XProc's core, standard, optional and extension steps one defines XML pipelines which can model a wide range of processes.

Steps accept XML as input and produce XML as output. It is in this manner (somewhat analogous to Unix pipes) that you can orchestrate sophisticated XML workflows.
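As a rough illustration of the pipe analogy, the sketch below chains two standard steps. It is written against XProc 1.0, follows the attribute conventions of the listings in this document, and has not been verified against xprocxq. Because no bindings are given, the result of the first step flows into the source of the second, much as output flows between two commands joined by a shell pipe.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="chain">
    <!-- step 1: reads the pipeline's source and expands XInclude references -->
    <p:xinclude/>
    <!-- step 2: reads step 1's result and emits a c:result element
         holding the number of documents it received -->
    <p:count/>
</p:pipeline>
```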

2. Quick Start & Examples

Since eXist v1.3/1.4, xprocxq is built and configured by default and should be enabled and ready to use.

Try some of the simple examples to check that XProc is working in your eXist installation.

3. Using XProc in eXist

The following XQuery file is an example of how to run xprocxq from within eXist.

Example: Running XProc from XQuery

				

xquery version "1.0" encoding "UTF-8";

(: for now you need to declare these namespaces :)
import module namespace const = "http://xproc.net/xproc/const";
import module namespace xproc = "http://xproc.net/xproc";
import module namespace u = "http://xproc.net/xproc/util";

(: define standard input source binding :)
let $stdin := document { <test>Hello World</test> }

(: the xproc pipeline :)
let $pipeline := document {
    <p:pipeline name="pipeline"
                xmlns:p="http://www.w3.org/ns/xproc">
        <p:identity/>
    </p:pipeline>
}

(: the xproc entry function :)
return xproc:run($pipeline, $stdin)

Running this XQuery should produce a result resembling:

Example: Result

				
<test>Hello World</test>

4. Learning XProc

At their simplest, XProc pipelines contain steps, each of which accepts zero or more XML documents as its input and produces zero or more XML documents as output.

The XProc code in the following listing consists of a <p:declare-step> top-level element, a <p:xslt> step, and not much else.

Example: Simple Pipeline - XSLT transformation

				
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="simple-pipeline">
    <p:input port="source" primary="true" sequence="false"/>
    <p:output port="result" primary="true" sequence="false"/>
    <p:xslt name="step1">
        <p:input port="source">
            <p:pipe step="simple-pipeline" port="source"/>
        </p:input>
        <p:input port="stylesheet">
            <p:document href="/db/xproc/examples/stylesheet.xml"/>
        </p:input>
    </p:xslt>
</p:declare-step>

run simple-pipeline.xproc?stdin=/db/xproc/test.xml

An XML document is brought in as standard input using the stdin URL parameter. The XProc processor uses this document as the input to the first step, <p:xslt>, which applies an XSLT transformation using stylesheet.xml.

As the pipeline contains only a single step, results of XSLT processing are placed onto the result port for the pipeline, providing the XML document to standard output. The following figure illustrates this process, outlining where the XML document flows from source and result ports.

Had I used <p:pipeline> in the Simple Pipeline listing, it would have implicitly declared a source input port and a result output port. Using <p:declare-step> means that I have to define these ports explicitly, as well as declare step bindings between sequential sibling steps. These bindings and ports are summarized below:

Top-level source input port will receive any standard input.
Top-level result output port will receive the results of the step1 result port and place them on the standard output.
step1 source input is bound to the source input port.
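For comparison, here is a sketch of the same transformation written with <p:pipeline>. It relies on the XProc 1.0 defaults: the source and result ports are declared implicitly, and the first step reads the pipeline's source when no binding is given. Whether xprocxq resolves these defaults in the same way is an assumption, and its FAQ recommends explicit bindings when in doubt.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="simple-pipeline">
    <!-- no p:input or p:output declarations: p:pipeline supplies
         the source and result ports implicitly -->
    <p:xslt name="step1">
        <!-- the source port is left unbound, so it defaults to the
             pipeline's own source port -->
        <p:input port="stylesheet">
            <p:document href="/db/xproc/examples/stylesheet.xml"/>
        </p:input>
    </p:xslt>
    <!-- the result of the last step becomes the pipeline result -->
</p:pipeline>
```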

With a one-step pipeline it is difficult to illustrate bindings between steps, so I have created a nontrivial example that contains several steps. The next code listing presents a more representative XProc example with multiple steps and some conditional logic.

Example: Complex Pipeline - p:compare and p:choose

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" name="pipeline">

    <!-- compare test step -->
    <p:compare name="compare">
        <p:input port="alternate">
            <!-- example of using p:document -->
            <p:document href="/db/xproc/test.xml"/>
        </p:input>
    </p:compare>

    <p:choose name="mychoosestep">
        <!-- note the eXist specific path convention with root -->
        <p:when test=".//c:result[.='false']">
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline failed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:when>
        <!-- success -->
        <p:when test=".//c:result[.='true']">
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline successfully processed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:when>
        <p:otherwise>
            <p:identity>
                <p:input port="source">
                    <p:inline>
                        <p>This pipeline failed.</p>
                    </p:inline>
                </p:input>
            </p:identity>
        </p:otherwise>
    </p:choose>

    <!-- currently need to bind explicitly with p:pipe to get multi container step output -->
    <p:identity>
        <p:input port="source">
            <p:pipe port="result" step="mychoosestep"/>
        </p:input>
    </p:identity>


</p:pipeline>

run complex-pipeline.xproc

The listing above covers only the comparison and the choice. A fuller pipeline, whose step names are given in parentheses, roughly translates to the following:

1. Input stdin XML documents.
2. Apply <xinclude> processing (step1).
3. Choose (step2) between using a newer (step2a1) or older (step2b1) schema and validate.
4. Extract each (step3) HTML <div> element, applying a string replace operation (step3a1).
5. Wrap up (step4) the final sequence of <div> elements with a <document> element.
6. Output the XML documents to stdout.
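The outline above can be sketched as an XProc 1.0 pipeline that reuses its step names (step1 to step4). This is only an illustration under stated assumptions: the version attribute tested in p:when, the two schema paths and the replacement string are hypothetical, and the div elements are assumed to be in no namespace. It has not been run against xprocxq, whose FAQ lists iteration and p:choose xpath-context among its current limitations.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="outline">

    <!-- step1: expand XInclude references in the stdin document -->
    <p:xinclude name="step1"/>

    <!-- step2: choose a schema from a (hypothetical) version attribute -->
    <p:choose name="step2">
        <p:when test="/*/@version = '2'">
            <!-- step2a1: validate against the newer schema (hypothetical path) -->
            <p:validate-with-xml-schema name="step2a1">
                <p:input port="schema">
                    <p:document href="/db/xproc/examples/schema-new.xsd"/>
                </p:input>
            </p:validate-with-xml-schema>
        </p:when>
        <p:otherwise>
            <!-- step2b1: validate against the older schema (hypothetical path) -->
            <p:validate-with-xml-schema name="step2b1">
                <p:input port="schema">
                    <p:document href="/db/xproc/examples/schema-old.xsd"/>
                </p:input>
            </p:validate-with-xml-schema>
        </p:otherwise>
    </p:choose>

    <!-- step3: treat every div as its own document -->
    <p:for-each name="step3">
        <p:iteration-source select="//div"/>
        <!-- step3a1: replace the text children of the div;
             the replace option is an XPath expression, hence the inner quotes -->
        <p:string-replace name="step3a1" match="div/text()" replace="'replaced'"/>
    </p:for-each>

    <!-- step4: wrap the sequence of div documents in one document element -->
    <p:wrap-sequence name="step4" wrapper="document"/>

</p:pipeline>
```

Every step here reads the result of the step before it by default, so no explicit p:pipe bindings are needed in standard XProc.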

Steps can have other input or output ports defined that work with non-XML documents, but only XML documents (as in XML infoset) can flow between primary input and output ports.

4.1. Core Steps

4.2. XProc Components

4.3. Standard Steps

p:add-attribute [spec]: Add an attribute to a set of matching elements.
p:add-xml-base [spec]: Add or correct xml:base attributes on elements.
p:compare [spec]: Compare two documents for equivalence.
p:count [spec]: Count the number of documents in source input.
p:delete [spec]: Delete items specified by a match pattern from the source input.
p:directory-list [spec]: Enumerate the directory listing into the result output.
p:error [spec]: Generate a dynamic error.
p:escape-markup [spec]: Escape source input.
p:http-request [spec]: Interact with resources identified by Internationalized Resource Identifiers (IRIs) over HTTP.
p:identity [spec]: Make an exact copy of an input source to the result output.
p:insert [spec]: Insert an XML selection into the source input.
p:label-elements [spec]: Create a label for each matched element, and store the value of the label in an attribute.
p:load [spec]: Load an XML resource that an IRI specifies and provide it as result output.
p:make-absolute-uris [spec]: Make the value of an element or attribute in the source input an absolute IRI value in the result output.
p:namespace-rename [spec]: Rename the namespace declarations.
p:pack [spec]: Merge two document sequences.
p:parameters [spec]: Make available a set of parameters as a c:param-set XML document in the result output.
p:rename [spec]: Rename elements, attributes, or processing instructions.
p:replace [spec]: Replace matching elements.
p:set-attributes [spec]: Set attributes on matching elements.
p:sink [spec]: Accept source input and generate no result output.
p:split-sequence [spec]: Divide a single sequence into two.
p:store [spec]: Store a serialized version of its source input to a URI.
p:string-replace [spec]: Perform string replacement on the source input.
p:unescape-markup [spec]: Unescape the source input.
p:unwrap [spec]: Replace matched elements with their children.
p:wrap [spec]: Wrap matching nodes in the source document with a new parent element.
p:wrap-sequence [spec]: Produce a new sequence of documents.
p:xinclude [spec]: Apply XInclude processing to the input source.
p:xslt [spec]: Apply an XSLT version 1.0 or XSLT version 2.0 stylesheet to the input source.
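To give a feel for how these steps combine, the following sketch chains three of them. The element names (section, note, para) and the attribute value are hypothetical, and the option names follow the XProc 1.0 step library. xprocxq's support for each step should be checked against the step examples shipped with the release.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="standard-steps">
    <!-- add status="draft" to every section element -->
    <p:add-attribute match="section" attribute-name="status" attribute-value="draft"/>
    <!-- remove every note element together with its content -->
    <p:delete match="note"/>
    <!-- rename para elements to p -->
    <p:rename match="para" new-name="p"/>
</p:pipeline>
```

Given a source such as <doc><section><para>Hi</para><note>x</note></section></doc>, a conforming processor would be expected to produce <doc><section status="draft"><p>Hi</p></section></doc>.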

4.4. Optional Steps

p:exec [spec]: Apply an external command to the input source.
p:hash [spec]: Generate a message digest or a digital fingerprint for some value.
p:uuid [spec]: Generate a Universally Unique Identifier (UUID).
p:validate-with-relax-ng [spec]: Validate the input XML with a RELAX NG schema.
p:validate-with-schematron [spec]: Validate the input XML with a Schematron schema.
p:validate-with-xml-schema [spec]: Validate the input XML with an XML Schema.
p:www-form-urldecode [spec]: Decode the x-www-form-urlencoded string into a set of XProc parameters.
p:www-form-urlencode [spec]: Encode a set of XProc parameter values as an x-www-form-urlencoded string.
p:xquery [spec]: Apply an XQuery version 1.0 query.
p:xsl-formatter [spec]: Render an XSL version 1.1 document (as in XSL-FO).
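As an example of an optional step, the sketch below runs a small query with p:xquery in the form the XProc 1.0 specification describes: the query is supplied as the text of a c:query element, wrapped in CDATA because it contains markup. The item element is hypothetical. According to the FAQ in this document, xprocxq currently wraps the output in a c:result element and offers an xproc:escape attribute as an alternative to CDATA, so the actual result there may differ.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:c="http://www.w3.org/ns/xproc-step"
            name="count-items">
    <p:xquery>
        <!-- the source port defaults to the pipeline's source document,
             which becomes the context item of the query -->
        <p:input port="query">
            <p:inline>
                <c:query><![CDATA[
                    <total>{ count(//item) }</total>
                ]]></c:query>
            </p:inline>
        </p:input>
    </p:xquery>
</p:pipeline>
```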

4.5. Extension Steps

Implementation specific steps

4.6. XProc Functions

p:system-property(string property)
p:step-available(string step-type)
p:iteration-position()
p:iteration-size()
p:base-uri()
p:resolve-uri(string relative)
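These functions are called from XPath expressions inside a pipeline. As a sketch, p:step-available can guard an optional step so that the pipeline falls back to p:identity when the step is missing. The id attribute is hypothetical, and whether xprocxq implements this function is not stated in this document.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="guarded">
    <p:choose>
        <!-- use the optional p:uuid step only if the processor provides it -->
        <p:when test="p:step-available('p:uuid')">
            <!-- replace the value of the root element's id attribute with a UUID -->
            <p:uuid match="/*/@id"/>
        </p:when>
        <p:otherwise>
            <!-- otherwise pass the document through unchanged -->
            <p:identity/>
        </p:otherwise>
    </p:choose>
</p:pipeline>
```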

5. Reuse and Extending XProc

eXist's XProc implementation provides a range of extension mechanisms for creating new steps.

5.1. Defining XProc libraries using XQuery

5.2. Defining XProc libraries using XSLT

5.3. Defining XProc libraries using Java

6. W3C XProc Unit Test

The following links run xprocxq against the W3C XProc unit test suite.

Important

Please note that you will need to enable the File extension module and download the W3C XProc test suite to run these tests.

7. Additional XProc Resources

7.1. Mailing Lists

W3C XProc dev mailing list: xproc-dev@w3.org

7.2. XProc articles

7.3. XProc Specifications

XProc: An XML Pipeline Language: Explore this W3C XProc Editors Draft (the W3C working draft dated 01 May 2008).
XProc: Section E. Guidance on Namespace Fixup: Review the non-normative list of suggestions for implementors to follow to reduce the need to fix up namespaces.
XProc W3C unit test suite: Repository where draft unit test suite for XProc is being developed.

7.4. Useful

7.5. EXProc, EXpath and EXQuery Community Specifications

EXProc: Intended to be a place to discuss and define extension steps to XProc.
EXPath: Community-defined extensions to XPath. Currently, xprocxq uses the EXPath http-client module.
EXQuery: Community-defined extensions to XQuery. xprocxq is expected to use any output from this effort.

7.6. XQuery

Jim Fuller's Advancing with XQuery: IBM developerWorks article on implementing application idioms in XQuery.
XQuery WikiBooks: Excellent XQuery reference led by Chris Wallace et al.
Chris Wallace Unit Testing: Nice article on the right way to do unit testing in XQuery.
Jim Fuller 'poor man's' XQuery unit testing: Simple XQuery module for facilitating tests that I developed while building xprocxq.

7.7. XProc links

XML Pipeline Definition Language Version 1.0: W3C Note 28 February 2002: Read the note submitted by Sun Microsystems, Alis Technologies, Arbortext, Cisco Systems, Fujitsu, Markup Technology, and Oracle.
XML Processing Model Requirements: W3C Working Group Note 05 April 2004: Peruse the W3C WG Note from 05 April 2004.
XML Pipeline Language (XPL) Version 1.0 (Draft): Review the W3C Member Submission submitted by Orbeon on 11 March 2005 and published on 11 April 2005.
XProc: An XML Pipeline Language (with revision marks): Peruse the W3C Working Draft with differences dated 8 May 2008.
XML Processing Model Requirements and Use Cases: Read the W3C XProc requirements and use cases document dated 11 April 2006.
XML Pipeline Language (XPL) Version 1.0 (Draft): Check out this draft of the early W3C member submission of an XML pipeline language.

8. xprocxq FAQ

8.1. Why can't I access files from the filesystem?

By default, XProc is set to read files only from the XML database. If you want to access files on the hard drive, you will need to enable eXist's File extension module and use the file:// prefix in your file paths.
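As a minimal sketch, a file on disk can then be bound with p:document in the same way as a database resource. The path /tmp/test.xml is hypothetical, and the example assumes the File extension module has already been enabled.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="load-from-disk">
    <p:identity>
        <p:input port="source">
            <!-- the file:// prefix tells the processor to read from the
                 filesystem rather than from the XML database -->
            <p:document href="file:///tmp/test.xml"/>
        </p:input>
    </p:identity>
</p:pipeline>
```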

8.2. What is xprocxq?

Initially, eXist's XProc processor was developed as a standalone project called xprocxq. The i

8.3. xprocxq compliance and limitations

xprocxq, being implemented in XQuery, currently has several limitations and is nowhere near compliant with the existing XProc draft specification. The best way to understand what currently works and what does not is to check the step examples included in the release.

Here is a list of the more severe limitations:

when selecting elements in a namespace, you will have to select them generically
due to my use of weak typing in XQuery there are several interrelated issues, but this will change as I sanitize the code
namespace management, which applies the XProc namespace fixup rules, is disabled for the time being
defining reusable pipelines using p:library, p:import and p:declare-step is currently disabled
errors are thrown as XQuery errors, which makes it difficult to report correct line numbers, and the errors themselves look quite ugly
p:choose xpath-context, iteration, etc. are not implemented yet
some step sorting issues remain (specifically when intermixing steps that use p:inline); it is best to use explicit port bindings to avoid them completely
p:xquery currently uses a c:result element for its output; I also added an xproc:escape attribute to c:query to avoid having to wrap the XQuery in CDATA (which is an XProc spec requirement)
declare base-uri affects module imports (need to investigate across all XQuery processors)
cannot pass the required/add-attribute--002.xml test, which depends on the namespace handling module being finished
errors and/or p:error do not write to the error port (I am also discussing the need for a generic 'implementor specific error code' with the XProc WG)
the current preparsing routine is naive; I have a more rigorous solution in a source control branch to merge
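The advice above about explicit port binding can be sketched as follows: each step is named, and the second step states exactly which port it reads instead of relying on document order. The step names and the greeting and envelope elements are hypothetical; the syntax is standard XProc 1.0.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="explicit">
    <p:identity name="first">
        <p:input port="source">
            <p:inline>
                <greeting>Hello World</greeting>
            </p:inline>
        </p:input>
    </p:identity>
    <p:wrap-sequence name="second" wrapper="envelope">
        <p:input port="source">
            <!-- explicit binding: read the result port of the step named "first" -->
            <p:pipe step="first" port="result"/>
        </p:input>
    </p:wrap-sequence>
</p:pipeline>
```

In a conforming processor the result would be the greeting element wrapped in an envelope element.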

8.4. Why is xprocxq implemented in XQuery?

XQuery's somewhat functional approach appealed to me and, having been a long-time XSLT user, I wanted to gain some understanding of the nuances between XSLT and XQuery.

In building xprocxq, my primary goals were:

To create an implementation in XQuery, so that the XProc processor is as performant as the underlying XQuery implementation and can naturally process XML data stored in the XML database.
To make it easier to build extension steps using XQuery (and XSLT).
To exercise fundamental FP principles by building a nontrivial XQuery application. I think I have achieved this goal, as the xprocxq 'main engine' is the u:step-fold function (in util.xqm), which operates on simple pipelines.
To understand the variability between XQuery processors. I think there is a surprising number of differences between XQuery processors, but thankfully mostly in the form of extension functions, implicit type casting and, to a smaller extent, interpretations of the spec (where the spec allows implementations to do things as they see fit). The real impact, though, is that we need things like EXPath and EXQuery to allow for even a remote chance of compatibility between XQuery processors.

I think that most people will find using XProc with XQuery is a powerful combination which can be used to implement a wide range of server side applications.

November 2009

James Fuller
jim.fuller at webcomposite.com
