eXist: Current State, Features and Roadmap

Module	Status	Priority	Test Coverage	Progress	Who
1. Document Storage
1.1. File size/complexity limits	Stable	x	Tested	100%	wolf
The numbering scheme at the core of eXist 1.0 limited the maximum size of a document that could be stored within the database. This is fixed in eXist releases 1.1 and later.
1.2. Collection storage	Stable, but subject to redesign	High	No tests	0%	wolf
The current organization of collections and resources causes a number of problems with respect to (a) locking, (b) query performance, (c) update performance. Right now, documents are tightly bound to the collection in which they are contained. Any operation on a document has to go through its parent collection. As a result, locking and access control become quite complex, as we need to take care of both the document and the collection. There's also a direct dependency between the size of a collection (in terms of the number of documents stored in it) and document update speed: if a collection has a large number of documents, removing a single document becomes very slow. This problem can be solved by physically decoupling documents and collections: (1) a collection should be modelled as a logical unit, not a physical one; (2) the current integer document ids should be replaced by hierarchical ids, which model the entire collection/document path; (3) the link between a resource and its collection should be established only through the hierarchical document id; (4) locking and access control should be delegated to a central lock manager, which just contains a map DocumentProxy -> Lock.
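The proposal above can be illustrated with a minimal sketch. It is not eXist code: the names DocId, make_doc_id, in_collection and LockManager are hypothetical, and a plain tuple of path segments stands in for whatever hierarchical id encoding would actually be chosen.

```python
import threading
from typing import Tuple

# Hypothetical hierarchical id: the full collection/document path as a tuple.
DocId = Tuple[str, ...]


def make_doc_id(path: str) -> DocId:
    """Turn '/db/plays/hamlet.xml' into ('db', 'plays', 'hamlet.xml')."""
    return tuple(part for part in path.split("/") if part)


def in_collection(doc_id: DocId, collection: DocId) -> bool:
    """A document belongs to a collection if the collection id is a proper prefix."""
    return len(doc_id) > len(collection) and doc_id[: len(collection)] == collection


class LockManager:
    """Central map from document id to lock; collections hold no locks themselves."""

    def __init__(self) -> None:
        self._locks = {}                 # DocId -> lock object
        self._guard = threading.Lock()   # protects the map itself

    def lock_for(self, doc_id: DocId):
        """Return the one lock associated with doc_id, creating it on first use."""
        with self._guard:
            if doc_id not in self._locks:
                self._locks[doc_id] = threading.RLock()
            return self._locks[doc_id]
```

In this sketch, removing a document only touches its own map entry, so the cost no longer depends on how many siblings the collection has; collection membership is derived from the id prefix alone.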
1.3. DOM Redesign	Stable	x	Tested	100%	wolf
eXist currently uses two DOM (document object model) implementations: one for nodes stored in the db, and a different one for in-memory nodes constructed during an XQuery. The two models cannot be mixed. To process an XQuery expression on an in-memory node, the query engine needs to create a temporary persistent copy. This costs performance and has caused stability issues in the past. To solve those problems, we are currently redesigning the in-memory DOM to implement the same interfaces as the persistent DOM. The query engine should be able to mix nodes of both models. The current SVN trunk can directly evaluate XQuery expressions on the in-memory DOM, which solves all previous issues and contributes to the overall stability of the db. Refer to this article for details.
1.4. Allow metadata to be associated with a document	Open	Avg	N/A	0%
Metadata could include system properties like last-modification date or user-defined metadata. Preferably, metadata records should be ordinary XML documents. The format should not be restricted.
2. Indexing
Since version 1.2, alternative index configuration methods are available which support the optimizer in rewriting a query for best performance. A new modularized indexing architecture allows arbitrary new indexes to be plugged into the indexing pipeline. An N-gram and a spatial index module were added as prototypes to test the new architecture. Other index types will be added in the future, for example: (a) combined path indexes to speed up frequently used XPath expressions; (b) indexes to support operations on atomic values, e.g. in order-by expressions or the distinct-values function; (c) integration of specialized, external index types like the spatial index module mentioned above.
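The idea of plugging index modules into a common pipeline can be sketched as follows. This is only an illustration of the pattern, not eXist's actual interfaces: IndexWorker, WordIndex, PrefixIndex and IndexPipeline are hypothetical names, and real index modules would receive far richer events than a node id and a string.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class IndexWorker(ABC):
    """Hypothetical plug-in interface: one implementation per index type."""

    @abstractmethod
    def store(self, node_id: int, text: str) -> None:
        """Called by the pipeline for every text node that gets stored."""


class WordIndex(IndexWorker):
    """Token-based index: whitespace-separated word -> node ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = {}

    def store(self, node_id: int, text: str) -> None:
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(node_id)


class PrefixIndex(IndexWorker):
    """A second, unrelated index type: first character -> node ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = {}

    def store(self, node_id: int, text: str) -> None:
        if text:
            self.postings.setdefault(text[0], set()).add(node_id)


class IndexPipeline:
    """Forwards each stored node to every registered index worker."""

    def __init__(self) -> None:
        self._workers: List[IndexWorker] = []

    def register(self, worker: IndexWorker) -> None:
        self._workers.append(worker)

    def store(self, node_id: int, text: str) -> None:
        for worker in self._workers:
            worker.store(node_id, text)
```

The storage layer only knows the pipeline; adding a new index type means registering one more worker, without touching the code that stores documents.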
2.1. Full text indexing	Beta	Avg	Tested	75%	wolf
As of Oct 2008, a new full text index is available. It is based on Lucene and eXist's new modularized indexing architecture.
2.2. XQuery and XPath Full Text 1.0	Open	Avg	N/A	15%
Implement the W3C's full text extensions to XQuery, probably based on the new Lucene index module (see above). Many features could be easily implemented on top of Lucene. The grammar for the full text extensions has already been merged into the XQuery parser as part of a Google Summer of Code project and is available in an SVN branch. It just needs to be filled with life.
2.3. Range indexing	Stable	x	Tested	100%
No remarks available.
2.4. Indexes on xml:id	Stable	x	Tested	100%
Originally, xml:id values were stored in the structural index, with the plan to move them to the range index. As of versions 1.2.4 and 1.3, xml:id values are stored in the range index.
2.5. N-gram	Stable	x	Tested	100%	wolf
When dealing with texts in many non-European languages, the token-based full-text index produces insufficient results. Tokenization is currently based on Unicode code points. Most Chinese characters, for example, are thus stored as single tokens. Users have to abuse the near() or phrase() function to search for character sequences consisting of more than one character, which is quite slow. It also means that real proximity searches are not available. An N-gram based index would be much more suitable for these languages. It would also allow additional functionality to be implemented, e.g. to deal with varying spellings. The main question is how the N-gram index would integrate conceptually with the existing full-text functions. An N-gram index based on the new modularized indexing architecture is in SVN trunk as of July 2007 and in eXist 1.2 and later releases.
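To make the N-gram idea concrete, here is a small sketch of how such an index can answer substring searches over text that has no word boundaries. It does not reflect the storage format or API of eXist's N-gram module; ngrams and NGramIndex are hypothetical names, and the index is kept in memory for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Set


def ngrams(text: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams of text."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]


class NGramIndex:
    """Maps each character n-gram to the ids of the nodes containing it."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self._postings: Dict[str, Set[int]] = defaultdict(set)
        self._texts: Dict[int, str] = {}

    def add(self, node_id: int, text: str) -> None:
        self._texts[node_id] = text
        for gram in ngrams(text, self.n):
            self._postings[gram].add(node_id)

    def contains(self, query: str) -> List[int]:
        """Ids of nodes whose text contains query as a substring."""
        grams = ngrams(query, self.n)
        if not grams:
            # Query shorter than n: no n-gram to look up, fall back to a scan.
            candidates = set(self._texts)
        else:
            candidates = set.intersection(
                *(self._postings.get(gram, set()) for gram in grams)
            )
        # Sharing all n-grams does not guarantee adjacency, so verify each hit.
        return sorted(i for i in candidates if query in self._texts[i])
```

A multi-character search then becomes a few posting-list intersections followed by a cheap verification step, instead of a near() or phrase() query over single-character tokens.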
2.6. Integration of other index types (e.g. Spatial indexes, external indexes)	Beta	Avg	N/A	75%
eXist now offers spatial indexes in SVN trunk as of July 2007 and in version 1.2 and later releases.
2.7. Index-support for order-by, distinct-values	Open	Avg	N/A	0%
Order-by expressions and other functions that need to access atomized nodes are not supported by indexes.
2.8. Collation-driven indexing	Open	Avg	N/A	0%
Maybe part of FT index redesign.
3. Transactions and Recovery
The journal log and the recovery manager should be stable and are covered by extensive tests. However, recovery failures cannot be excluded entirely, because the tests can't reproduce every possible real-world scenario. Some steps also remain for eXist to become a fully transactional database system. Transaction support is currently limited to the functionality needed for crash recovery. Though we maintain transactions internally, they are currently not exposed to applications. Also, read operations are not transactional right now. In order to allow user-defined ACID transactions with support for rollback, all index files would need to be protected by the journaling log. The required functionality is basically available, but the feature is currently not regarded as high-priority.
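As a rough illustration of what "transactions limited to crash recovery" means, the sketch below replays a journal after a crash and redoes only the writes of transactions that reached their commit record. It is a generic write-ahead-log example under simplifying assumptions, not eXist's journal format: the Entry tuple layout and the operation names "store", "remove" and "commit" are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical journal entry: (transaction id, operation, key, value)
Entry = Tuple[int, str, str, str]


def recover(journal: List[Entry]) -> Dict[str, str]:
    """Rebuild state by redoing only the writes of committed transactions."""
    # First pass: find every transaction that wrote a commit record.
    committed = {txn for txn, op, _, _ in journal if op == "commit"}

    # Second pass: redo the operations of committed transactions in log order.
    state: Dict[str, str] = {}
    for txn, op, key, value in journal:
        if txn not in committed:
            continue  # the crash hit before commit: ignore this transaction
        if op == "store":
            state[key] = value
        elif op == "remove":
            state.pop(key, None)
    return state
```

Note that nothing here lets an application roll back a transaction on request; the journal only guarantees that half-finished work disappears after a crash, which is the limitation described above.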
3.1. Journal log	Stable	x	Tested	100%
No remarks available.
3.2. Recovery	Stable	x	Tested	100%
No remarks available.
3.3. Internal transaction management	Stable	x	Tested	100%
Transactions are maintained internally, but they are not exposed to applications. eXist does not yet support full ACID transactions. Read-only operations bypass the transaction system.
3.4. User-definable transactions	Open	Low	N/A	0%
Journal logs are limited to critical data required for recovery. No transaction rollbacks.
4. Backup / Restore
4.1. Backup / Restore Tool	Stable	x	No tests	100%
No remarks available.
4.2. Store configuration into backup	Open	x	No tests	0%
The backup utility should also include a copy of the relevant server configuration files (mainly conf.xml) into the created backup. Settings like page size, additional XQuery or index modules etc. are important.
4.3. DB repair tool	Stable	x	No tests	100%	wolf
Create a DB repair tool which can handle and resolve inconsistencies in the database structure. It should be possible to recreate the db if at least dom.dbx, collections.dbx and symbols.dbx are more or less intact. If a single document is damaged, it could be filtered out. As of version 1.2.4, eXist provides a consistency check and repair tool.
5. Configuration
5.1. Dynamic configuration of the database via Java Management Extensions (JMX)	Open	x	No tests	0%
Main problem: access control and security.
6. Node-level updates
6.1. XUpdate	Stable	x	Tested	100%
No remarks available.
6.2. XQuery Update Extensions	Stable, but subject to redesign	x	Tested	75%
eXist's XQuery Update Extensions were implemented before the first W3C drafts were published. The differences between the W3C recommendation and our implementation are not that big, though there are some subtle differences concerning the processing model. In fact, the recommendation simplifies a few issues. See http://www.w3.org/TR/xquery-update-10/
7. Access-Control
The currently implemented Unix-like access control scheme is sufficient to protect resources and collections in a multi-user environment. However, it might be too coarse-grained for some types of applications. A more dynamic ACL implementation could help here. Right now, security management forms part of the database core. This is unnecessary. A more modular architecture would allow different security managers to be plugged in. It would be the responsibility of the security manager implementation to handle ACL lists. Since version 1.1, eXist supports the XACML standard for fine-grained access control to stored XQueries, Java classes etc.
7.1. User management	Stable	x	No tests	100%
No remarks available.
7.2. Access control on resources and collections	Stable, but subject to redesign	Avg	No tests	100%
Need more dynamic ACL structures that can adapt to varying requirements.
7.3. Access control on stored XQueries, XQuery functions and modules	Stable	x	incomplete	100%
No remarks available.
7.4. Java binding	Stable	x	N/A	100%
No remarks available.
8. Schema Validation
8.1. Validate document against schema when indexing	Stable	x	No tests	100%
No remarks available.
8.2. Validate document after node-level updates	Open	Avg	N/A	0%
No remarks available.
8.3. Locate schemas and DTDs stored in database	Beta	High	x	90%
No remarks available.
8.4. Support for catalog files in database	Beta	High	x	90%
No remarks available.
8.5. Manual validation against schema	Beta	High	Tested	75%
No remarks available.
8.6. XQuery validation features	Open	Avg	N/A	0%
No remarks available.
8.7. Store PSVI with the node tree in the database	Open	Low	N/A	0%	dizzzz
No remarks available.
8.8. Static typing based on PSVI	Open	Low	N/A	0%	dizzzz
No remarks available.
8.9. Support for RelaxNG and Schematron	Open	Low	No tests	0%	dizzzz
No remarks available.
9. XQuery
The XQuery engine as well as the standard function libraries should be updated to align with the XQuery 1.0 recommendation. Basically, almost all core language features are implemented, excluding schema related features, which are currently beyond eXist's scope. XQuery support in eXist is covered by the official W3C XQuery Test Suite (XQTS) 1.02. Implementing the official XQTS XQuery test suite was a top priority in order to guarantee standard conformance and avoid future regressions.
9.1. Core XPath and XQuery	Stable	x	tested	100%
Updated to the XPath 2.0 and XQuery 1.0 recommendations. Stable, excluding schema-related features.
9.2. XPath and XQuery atomic value types	Stable	Avg	tested	99.4%
9.3. XPath and XQuery function libraries	Stable	High	tested	99.4%
Updated to XPath 2.0 and XQuery 1.0 recommendations. Stable, excluding schema-related features.
9.4. XPath and XQuery function libraries	Stable	High	tested	99.4%
Updated to XPath 2.0 and XQuery 1.0 recommendations.
9.5. XQuery serialization	Stable, but subject to redesign	Avg	tested	80%
Though we implement most of the serialization options specified in the XQuery and XSLT serialization spec, some options need to be reworked and should be covered by tests.
9.6. XQuery test suite – XQTS	Stable	x	N/A	100%
9.7. XQuery Optimizer	Stable	High	tested	90%	wolf, perig
With the 1.2 release, eXist features a new query-rewriting optimizer. It analyzes the query at compile time and searches for optimizable subexpressions within the query tree. If it finds an optimizable expression, the optimizer will modify the query and wrap some special instructions around the optimizable code block. Together with the new indexing features (see blog article), the optimizer can achieve dramatic improvements. However, the optimizer is currently limited to predicate expressions. It does not optimize, for example, "where" clauses in a FLWOR expression. The query rewriting should thus be extended to recognize other types of expressions beyond predicates. In short, we need a better static analysis of the query. Based on eXist's current indexes, it is also often difficult to decide whether a certain optimization path leads to performance improvements or not. Better index statistics could help here. Also, there's a wide range of performance optimizations which could be applied if we had appropriate statistics on node distribution and frequency. As of Oct 2008, the current trunk has a statistics module, but it is not yet used for real optimizations.
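The rewriting step can be pictured with a toy query tree. The sketch below is not how eXist represents or rewrites queries: Step, Compare, IndexLookup and optimize are hypothetical names, and "an index is defined on this element" is reduced to membership in a set.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class Compare:
    """A predicate such as [speaker = 'HAMLET'], evaluated node by node."""
    element: str
    value: str


@dataclass(frozen=True)
class IndexLookup:
    """Rewritten predicate: answer the comparison directly from an index."""
    element: str
    value: str


@dataclass(frozen=True)
class Step:
    """One path step with an optional predicate."""
    name: str
    predicate: Union[Compare, IndexLookup, None] = None


def optimize(steps: List[Step], indexed: Set[str]) -> List[Step]:
    """Wrap predicates on indexed elements; leave everything else unchanged."""
    rewritten = []
    for step in steps:
        pred = step.predicate
        if isinstance(pred, Compare) and pred.element in indexed:
            step = Step(step.name, IndexLookup(pred.element, pred.value))
        rewritten.append(step)
    return rewritten
```

Only predicates are inspected here, which mirrors the limitation described above: an equivalent condition written as a "where" clause would pass through untouched unless the analysis is extended to recognize it.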
9.8. Error reporting	Stable, but subject to redesign	Avg	N/A	75%
Error reports by the XQuery parser and compiler need to be improved.
9.9. Make function calls tail-recursive	Stable	x	70	100%
Recursive functions could trigger a StackOverflowError, so tail recursion needed to be handled. No issues with recursive functions have been reported during the past year; the tail-recursion handling is reliable.
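How eXist handles tail calls internally is not described here. One common technique for running tail-recursive functions without growing the call stack is a trampoline, sketched below; TailCall, trampoline and sum_to are hypothetical names used only for this illustration.

```python
class TailCall:
    """Marker object: 'call func(*args) next' instead of calling it right now."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline(func, *args):
    """Run func, looping over TailCall markers instead of nesting calls."""
    result = func(*args)
    while isinstance(result, TailCall):
        result = result.func(*result.args)
    return result


def sum_to(n, acc=0):
    """Tail-recursive sum of 0..n, written in trampoline style."""
    if n == 0:
        return acc
    # Return a description of the next call rather than performing it.
    return TailCall(sum_to, n - 1, acc + n)
```

Calling trampoline(sum_to, 100000) runs in constant stack depth, whereas the same function written with a direct recursive call would exceed Python's default recursion limit.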
9.10. Better Try-Catch	Stable	Low	0	0%
eXist already provides a util:catch() function which basically corresponds to a Java try-catch. The XQuery 1.1 draft also defines try-catch, but more closely integrated with XQuery. It should not be too difficult to implement this in eXist. See http://www.zorba-xquery.com/doc/zorba-0.9.2/zorba/html/trycatch.html for some examples.
9.11. XQuery Debugger	Open	High	N/A	75%	ljo, dmitriy
Remote debugging protocol: DBGp. Command-line debugger similar to jdb as a prototype. Decide which functionality to expose.
9.12. Drop and deprecate xmldb:collection()	Stable	x	tested	100%	delirium
9.13. Move fn:document() to xmldb:document() and deprecate	Stable	x	tested	100%	perig
10. XInclude	Stable	Low	No tests	80%
XInclude expansion happens at serialization time. Queries across the included document fragments are not possible. Stable, but limited.
11. Interfaces
11.1. XML:DB API	Stable	x	Tested	100%
No remarks available.
11.2. XML-RPC	Stable	x	Partially tested	100%
Exposes the entire database functionality.
11.3. REST	Stable, but subject to redesign	Low	Partially tested	90%	delirium
Does not cover administrative functions, e.g. user-management and permissions. Stable, but further functionality could be exposed. The XQuery API for Java (XQJ) implementation makes heavy use of REST and improves the interface in some aspects. However, the XQJ branch has not yet been merged into trunk.
11.4. SOAP	Stable	Low	No tests	90%
11.5. Cocoon Integration	Stable	x	No tests	100%
General functionality tests required
11.6. XQJ XQuery API for Java (JSR-225)	Beta	Low	N/A	70%	allad, perig, ljo, dizzzz
A nearly finished implementation of the XQuery API for Java (XQJ) is available in SVN. It was started as a Google Summer of Code project in 2007 and provides two XQJ drivers, one for embedded database access, one for remote access. I recently invested some time to bring the branch into sync with the current trunk. The remote database driver, which is based on the REST-style HTTP interface of eXist, should already be pretty usable. The embedded driver needs some redesign though (it breaks the test suite). However, it should not be too much work to fix this and finally merge it into trunk. Open tasks: (a) integrate the official XQJ test suite; (b) test for standard conformance; (c) fix the embedded database driver; (d) merge into trunk; (e) performance tests.
11.7. XForms filter	Beta	Low	N/A	80%	delirium
No remarks available.
12. Documentation
12.1. XQuery stored modules	Stable	x	N/A	100%
Calling XQuery scripts stored in the DB; importing stored modules into a query passed to the DB.
12.2. WebDAV	Beta	Avg	N/A	90%
No remarks available.
12.3. Deployment	Stable	x	N/A	100%
Integration with a servlet engine, Cocoon, stand-alone server, embedded use.
12.4. Index creation, index configuration and query rewriting	Stable	x	N/A	100%
No remarks available.
12.5. Validation	Stable	x	N/A	100%	dizzzz
No remarks available.
12.6. Trigger	Beta	Low	N/A	0%	delirium
No remarks available.
12.7. Searchable Documentation	Stable	x	N/A	100%	wolf
The documentation that comes with eXist should be made searchable. I started working on a search interface which is based on eXist's own full text search features. As of Oct. 2008, a search function is provided. It requires features only available in SVN trunk though.
12.8. XQDoc integration	Alpha	High	Partially tested	60%	ljo, wolf
Migrate the function documentation to XQDoc. Use XQDoc to better document all XQuery examples.
13. Releases
13.1. 1.2	Stable	High	N/A	99%
Imperative release with new features from 2006–2007 to replace the 1.0 line. The last release based on Java 1.4.
13.2. 1.4	Open	Low	N/A	0%
Based on Java 5.
14. Other Tasks
14.1. Complete New Sandbox Application	Beta	Avg	N/A	70%	wolf
Some time ago we did a major rewrite of the sandbox application (http://demo.exist-db.org/sandbox2/sandbox.xql). It is nearly usable, but the syntax-highlighting editor component is limited. In particular, it has no support for Safari. Also, the sandbox still uses util:eval to execute the entered XQuery code. This works well for simple queries, but can have side effects if the query imports external modules or tries to access the HTTP context. For the purpose of the sandbox, it would be better to post the user-supplied query to a dedicated servlet, which executes the code and sends back a result.
14.2. I18n	Open	Low	N/A	20%
Provide translations for error messages, console outputs etc. At least, resource bundles should be used, so others can translate them if they want.
14.3. Clean up/upgrade libraries	Beta	Low	N/A	60%
All libraries included with eXist need to be checked.
14.4. Move to Java 5	Open	Low	N/A	0%
Change the build target for the 1.3 development branch to Java 5 after eXist version 1.2 is released. This will make a number of things easier, especially since the JMX monitors and JUnit 4 tests already require Java 5.
14.5. Move to ANTLR3 parser	Unstable	High	N/A	60%	ljo
Change the parser to ANTLR3, which performs better, supports LL(*) lookahead, and handles whitespace better than ANTLR2, which we currently use. Maybe use gUnit for testing?
14.6. Move to AtomicWiki	Stable	High	N/A	100%	wolf
Change from the current spam-ridden and unmaintained wiki to our own Atom-based AtomicWiki.

Legend

Percentage	Description
0	work not started
20	1-20 Percentage of completion
40	21-40 Percentage of completion
60	41-60 Percentage of completion
80	61-80 Percentage of completion
99	81-99 Percentage of completion
Done	100 Percentage of completion

Priority	Description
1. Highest	Very important
2. High	Important
3. Avg	Nice to have
4. Low	Not very important
5. x	Not yet decided
