A RESTful browser for eXist Java-Based Function Modules


http://exist-db.org/xquery/text
A module for text searching extension functions.

text:filter($text as xs:string, $regularexpression as xs:string) xs:string*
Filter substrings that match the regular expression in the text.
$text: The text to filter
$regularexpression: The regular expression to apply to the text

Returns the substrings
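
As a rough illustration of the behaviour described above, the following Python sketch collects every substring of a text that matches a pattern. It uses Python's re module as a stand-in; eXist evaluates the expression in Java, so regular expression syntax and edge cases may differ. The name filter_substrings is hypothetical and not part of the module.

```python
import re

def filter_substrings(text, regular_expression):
    """Return every non-overlapping substring of text that matches the pattern."""
    # group(0) is the whole match, even when the pattern contains groups.
    return [m.group(0) for m in re.finditer(regular_expression, text)]

# Hypothetical sample input, chosen only for illustration.
matches = filter_substrings("order 66, room 101", r"[0-9]+")
print(matches)
```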

text:filter-nested($node-set as node()*) node()*
Filters out all nodes in the node set that have descendant nodes in the same node set. This is useful if you do a combined query like //(a|b)[. &= $terms] and some 'b' nodes are nested within 'a' nodes, but you only want to see the innermost matches, i.e. the 'b' nodes, not the 'a' nodes containing 'b' nodes.
$node-set: The node set

Returns a node set containing only those nodes that have no descendant nodes in the same node set.
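
The innermost-match rule can be sketched in Python with xml.etree.ElementTree. This is only a model of the semantics described above, not eXist's implementation; filter_nested and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def filter_nested(node_set):
    """Keep only the nodes that have no descendant in the same node set."""
    selected = {id(node) for node in node_set}
    result = []
    for node in node_set:
        # Element.iter() yields the node itself first, then its descendants.
        descendants = list(node.iter())[1:]
        if not any(id(d) in selected for d in descendants):
            result.append(node)
    return result

# Hypothetical document: the first <a> contains a nested <b>.
doc = ET.fromstring("<root><a>x <b>inner</b></a><b>alone</b><a>plain</a></root>")
node_set = [n for n in doc.iter() if n.tag in ("a", "b")]
innermost = filter_nested(node_set)
print([(n.tag, n.text) for n in innermost])
```

The first 'a' element is dropped because it contains a 'b' element that is also in the node set; the remaining nodes are kept.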

text:fuzzy-index-terms($term as xs:string?) xs:string*
Compares the specified argument against the contents of the fulltext index. Returns a sequence of strings which are similar to the argument. Similarity is based on Levenshtein distance. This function may not be useful in its current form and is subject to change.
$term: The term

Returns a sequence of strings which are similar to the argument $term

text:fuzzy-match-all($source as node()*, $keyword as xs:string, ...) node()*
Fuzzy keyword search, which compares strings based on the Levenshtein distance (or edit distance). The function tries to match each of the keywords specified in the keyword string against the string value of each item in the sequence $source.
$source: The source
$keyword: The keyword string

Returns the sequence of nodes that match the keywords

text:fuzzy-match-any($source as node()*, $keyword as xs:string, ...) node()*
Fuzzy keyword search, which compares strings based on the Levenshtein distance (or edit distance). The function tries to match any of the keywords specified in the keyword string against the string value of each item in the sequence $source.
$source: The source
$keyword: The keyword string

Returns the sequence of nodes that match the keywords
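
To make the difference between the two fuzzy functions concrete, here is a minimal Python sketch of Levenshtein distance and of "all keywords" versus "any keyword" matching. It is a model only: the whitespace tokenization, the lower-casing, and the max_distance threshold are assumptions made for this example, and the names levenshtein and fuzzy_match are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_match(text, keywords, max_distance, require_all):
    """Assumed model: a keyword hits if some word of text is within max_distance."""
    words = text.lower().split()
    hits = [any(levenshtein(k, w) <= max_distance for w in words)
            for k in keywords.lower().split()]
    if not hits:
        return False
    return all(hits) if require_all else any(hits)

sentence = "the quick brown fox"
print(fuzzy_match(sentence, "quikc zebra", 2, require_all=True))
print(fuzzy_match(sentence, "quikc zebra", 2, require_all=False))
```

With require_all set, every keyword must find a close word (the fuzzy-match-all behaviour); without it, one close keyword is enough (the fuzzy-match-any behaviour).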

text:groups($text as xs:string, $regularexpression as xs:string) xs:string*
Tries to match the string in $text to the regular expression. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.
$text: The text to match
$regularexpression: The regular expression to apply to the text

Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.

text:groups($text as xs:string, $regularexpression as xs:string, $flags as xs:string) xs:string*
Tries to match the string in $text to the regular expression, using the flags specified. Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.
$text: The text to match
$regularexpression: The regular expression to apply to the text
$flags: The flags

Returns an empty sequence if the string does not match, or a sequence whose first item is the entire string, and whose following items are the matched groups.
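
The shape of the result can be sketched in Python. This is an illustration under stated assumptions: Python's re module stands in for the Java regular expression engine, the flag letters i, m, s and x are assumed to carry their usual XQuery meanings, and the function name groups is simply reused for the sketch.

```python
import re

# Assumed mapping from XQuery-style flag letters to Python flags.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}

def groups(text, regular_expression, flags=""):
    """Return [] on no match, else the whole match followed by each group."""
    compiled_flags = 0
    for letter in flags:
        compiled_flags |= FLAG_MAP[letter]
    match = re.search(regular_expression, text, compiled_flags)
    if match is None:
        return []
    return [match.group(0), *match.groups()]

print(groups("2009-06-15", r"(\d{4})-(\d{2})-(\d{2})"))
print(groups("no date here", r"(\d{4})-(\d{2})-(\d{2})"))
print(groups("HELLO world", r"(hello) (\w+)", "i"))
```

In this sketch a group that did not take part in the match appears as None; how eXist represents such a group is not stated in the text above.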

text:highlight-matches($source as text()*, $callback-function-ref as function, $parameters as item()*) node()*
Highlights matching strings within text nodes that resulted from a fulltext search. When searching with one of the fulltext operators or functions, eXist keeps track of the fulltext matches within the text. Usually, the serializer marks those matches by enclosing them in an 'exist:match' element. One can then use an XSLT stylesheet to replace those match elements and highlight matches to the user. However, this is not always possible, so instead of using XSLT to post-process the serialized output, the highlight-matches function provides direct access to the matching portions of the text within XQuery. The function takes a sequence of text nodes as its first argument $source and a callback function (defined with util:function) as its second argument. $parameters may contain a sequence of additional values that will be passed to the callback function's third parameter. Text nodes without matches are returned as they are. However, if the text contains a match marker, the matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>.
$source: The sequence of text nodes
$callback-function-ref: The callback function (defined with util:function)
$parameters: The sequence of additional values that will be passed to the callback function's third parameter.

Returns the source with the added highlights
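
The callback protocol can be modelled in a few lines of Python. In this sketch a text node is a plain string and the match markers that eXist tracks internally are represented as (start, end) offsets; both are assumptions for illustration, as are the names highlight_matches and to_span. Spans are assumed not to overlap.

```python
from html import escape

def highlight_matches(text, spans, callback, parameters):
    """Replace each matched span with the callback's result; keep other text as is."""
    result = []
    position = 0
    for start, end in sorted(spans):
        if start > position:
            result.append(text[position:start])
        # Callback receives: the matched string, the text node, extra parameters.
        result.append(callback(text[start:end], text, parameters))
        position = end
    if position < len(text):
        result.append(text[position:])
    return result

def to_span(matched, node, parameters):
    """Hypothetical callback that wraps the match in a span element."""
    css_class = parameters[0]
    return f'<span class="{css_class}">{escape(matched)}</span>'

print(highlight_matches("the cat sat", [(4, 7)], to_span, ["highlight"]))
print(highlight_matches("no match here", [], to_span, ["highlight"]))
```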

text:index-terms($nodes as node()*, $qnames as xs:QName+, $start as xs:string?, $function as function, $returnMax as xs:int) item()*
This version of the index-terms function is to be used with indexes that were defined on a specific element or attribute QName. The second argument lists the QNames of the elements or attributes for which occurrences should be returned. Otherwise, the function behaves like the 4-argument version.
$nodes: The set of nodes in which the returned tokens occur
$qnames: One or more element or attribute names for which index terms are returned
$start: The optional start string
$function: The callback function reference
$returnMax: The maximum number of terms to report

Returns the results from the evaluation of the function reference

text:index-terms($nodes as node()*, $start as xs:string?, $function as function, $returnMax as xs:int) item()*
This function can be used to collect some information on the distribution of index terms within a set of nodes. The set of nodes is specified in the first argument $nodes. The function returns term frequencies for all terms in the index found in descendants of the nodes in $nodes. The second argument $start specifies a start string. Only terms starting with the specified character sequence are returned. If $nodes is the empty sequence, all terms in the index will be selected. $function is a function reference, which points to a callback function that will be called for every term occurrence. $returnMax defines the maximum number of terms that should be reported. The function reference for $function can be created with the util:function function. It can be an arbitrary user-defined function, but it should take exactly 2 arguments: 1) the current term as found in the index as xs:string, 2) a sequence containing four int values: a) the overall frequency of the term within the node set, b) the number of distinct documents in the node set the term occurs in, c) the current position of the term in the whole list of terms returned, d) the rank of the current term in the whole list of terms returned.
$nodes: The set of nodes in which the returned tokens occur
$start: The optional start string
$function: The callback function reference
$returnMax: The maximum number of terms to report

Returns the results from the evaluation of the function reference
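
The callback contract described above can be sketched in Python. This is a loose model, not the eXist index: documents are lists of already tokenized terms, terms are assumed to be reported in alphabetical order, and because the description does not distinguish "position" from "rank", the sketch passes the same 1-based index for both. All names are hypothetical.

```python
from collections import Counter

def index_terms(documents, start, callback, return_max):
    """Call callback(term, stats) for each term beginning with the start string."""
    frequency = Counter()   # overall occurrences of each term
    doc_count = Counter()   # number of distinct documents containing each term
    for tokens in documents:
        frequency.update(tokens)
        doc_count.update(set(tokens))
    terms = sorted(t for t in frequency if start is None or t.startswith(start))
    results = []
    for position, term in enumerate(terms[:return_max], start=1):
        # Assumption: position and rank are modelled as the same value.
        stats = (frequency[term], doc_count[term], position, position)
        results.append(callback(term, stats))
    return results

# Hypothetical tokenized documents.
documents = [["exist", "xml", "xquery"], ["xml", "xml", "index"]]
report = index_terms(documents, "x", lambda term, stats: (term, stats[0], stats[1]), 10)
print(report)
```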

text:kwic-display($text as text()*, $width as xs:positiveInteger, $callback-function as function, $parameters as item()*) node()*
Deprecated: kwic functionality is now provided by an XQuery module, see http://exist-db.org/kwic.html. This function takes a sequence of text nodes in $text, containing matches from a fulltext search. It highlights matching strings within those text nodes in the same way as the text:highlight-matches function. However, only a defined portion of the text surrounding the first match (and maybe following matches) is returned. If the text preceding the first match is larger than the width specified in the second argument $width, it will be truncated to fill no more than (width - keyword-length) / 2 characters. Likewise, the text following the match will be truncated in such a way that the whole string sequence fits into width characters. The third parameter $callback-function is a callback function (defined with util:function). $parameters may contain an additional sequence of values that will be passed to the last parameter of the callback function. Any matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>. The callback function should take 3 or 4 arguments: 1) the text sequence corresponding to the match as xs:string, 2) the text node to which this match belongs, 3) the sequence passed as last argument to kwic-display. If the callback function accepts 4 arguments, the last argument will contain additional information on the match as a sequence of 4 integers: a) the number of the match if there's more than one match in a text node - the first match will be numbered 1; b) the offset of the match into the original text node string; c) the length of the match as reported by the index.
$text: The text nodes
$width: The width
$callback-function: The callback function
$parameters: The parameters passed into the last argument of the callback function

Returns the results

Deprecated: Improved kwic functionality is now provided by a separate XQuery module, see http://exist-db.org/kwic.html. This function could be removed at any time during the 1.5 development and will be removed in the 1.6 release.

text:kwic-display($text as text()*, $width as xs:positiveInteger, $callback-function as function, $result-callback as function, $parameters as item()*) node()*
This function takes a sequence of text nodes in $text, containing matches from a fulltext search. It highlights matching strings within those text nodes in the same way as the text:highlight-matches function. However, only a defined portion of the text surrounding the first match (and maybe following matches) is returned. If the text preceding the first match is larger than the width specified in the second argument $width, it will be truncated to fill no more than (width - keyword-length) / 2 characters. Likewise, the text following the match will be truncated in such a way that the whole string sequence fits into width characters. The third parameter $callback-function is a callback function (defined with util:function). $parameters may contain an additional sequence of values that will be passed to the last parameter of the callback function. Any matching character sequence is reported to the callback function, and the result of the function call is inserted into the resulting node set where the matching sequence occurred. For example, you can use this to mark all matching terms with a <span class="highlight">abc</span>. The callback function should take 3 or 4 arguments: 1) the text sequence corresponding to the match as xs:string, 2) the text node to which this match belongs, 3) the sequence passed as last argument to kwic-display. If the callback function accepts 4 arguments, the last argument will contain additional information on the match as a sequence of 4 integers: a) the number of the match if there's more than one match in a text node - the first match will be numbered 1; b) the offset of the match into the original text node string; c) the length of the match as reported by the index.
$text: The text nodes
$width: The width
$callback-function: The callback function
$result-callback: The result callback function
$parameters: The parameters passed into the last argument of the callback function

Returns the results

Deprecated: Improved kwic functionality is now provided by a separate XQuery module, see http://exist-db.org/kwic.html. This function could be removed at any time during the 1.5 development and will be removed in the 1.6 release.
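
The truncation rule for kwic-display can be worked through in a short Python sketch. It follows the wording of the description literally for a single match and ignores the callbacks; the name kwic and the character offsets used to mark the match are assumptions for illustration.

```python
def kwic(text, start, end, width):
    """Return (before, keyword, after) trimmed according to the described rule."""
    keyword = text[start:end]
    before = text[:start]
    after = text[end:]
    if len(before) > width:
        # Preceding text may fill at most (width - keyword-length) / 2 characters.
        keep = max((width - len(keyword)) // 2, 0)
        before = before[len(before) - keep:]
    # Following text fills whatever is left of the width.
    remaining = max(width - len(before) - len(keyword), 0)
    after = after[:remaining]
    return before, keyword, after

# Hypothetical text: 30 characters, a 5-character match, 30 more characters.
sample = "a" * 30 + "MATCH" + "b" * 30
before, keyword, after = kwic(sample, 30, 35, 20)
print(len(before), keyword, len(after))
```

With a width of 20 and a 5-character keyword, the preceding text is cut to (20 - 5) / 2 = 7 characters (rounded down), which leaves 8 characters for the text that follows.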

text:make-token($text as xs:string) xs:string*
Split a string into tokens
$text: The string to tokenize

Returns a sequence of tokens

text:match-all($source as node()*, $regular-expression as xs:string+) node()*
Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ALL of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, use the 3-argument version of the function and specify flag 'w'. With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.
$source: The node set that is to be searched for the keyword set
$regular-expression: The regular expressions to be matched against the fulltext index

Returns the sequence of all of the matching nodes

text:match-all($source as node()*, $regular-expression as xs:string+, $flag as xs:string) node()*
Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ALL of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, specify flag 'w' in $flag. With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.
$source: The node set that is to be searched for the keyword set
$regular-expression: The regular expressions to be matched against the fulltext index
$flag: With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.

Returns the sequence of all of the matching nodes

text:match-any($source as node()*, $regular-expression as xs:string+) node()*
Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ANY of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, use the 3-argument version of the function and specify flag 'w'. With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.
$source: The node set that is to be searched for the keyword set
$regular-expression: The regular expressions to be matched against the fulltext index

Returns the sequence of all of the matching nodes

text:match-any($source as node()*, $regular-expression as xs:string+, $flag as xs:string) node()*
Tries to match each of the regular expression strings against the keywords contained in the fulltext index. The keywords found are then compared to the node set in $source. Every node containing ANY of the keywords is copied to the result sequence. By default, a keyword is considered to match the pattern if any substring of the keyword matches. To change this behaviour, specify flag 'w' in $flag. With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.
$source: The node set that is to be searched for the keyword set
$regular-expression: The regular expressions to be matched against the fulltext index
$flag: With 'w' specified, the regular expression is matched against the entire keyword, e.g. 'explain.*' will match 'explained', but not 'unexplained'.

Returns the sequence of all of the matching nodes
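
The effect of the 'w' flag, and the difference between match-all and match-any, can be modelled in Python. This is a sketch under assumptions: a node is reduced to the list of keywords it contains, Python's re module stands in for the Java regular expression engine, and the names keyword_matches and node_matches are hypothetical.

```python
import re

def keyword_matches(pattern, keyword, whole_word):
    """Substring match by default; with whole_word the entire keyword must match."""
    if whole_word:
        return re.fullmatch(pattern, keyword) is not None
    return re.search(pattern, keyword) is not None

def node_matches(node_keywords, patterns, flag="", require_all=True):
    """True if the node contains keywords for all (or any) of the patterns."""
    whole_word = "w" in flag
    hits = [any(keyword_matches(p, k, whole_word) for k in node_keywords)
            for p in patterns]
    return all(hits) if require_all else any(hits)

print(keyword_matches("explain.*", "explained", whole_word=True))
print(keyword_matches("explain.*", "unexplained", whole_word=True))
print(keyword_matches("explain.*", "unexplained", whole_word=False))
```

Passing require_all=True models match-all, and require_all=False models match-any.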

text:match-count($source as node()?) xs:integer
Counts the number of fulltext matches within the nodes and subnodes in $source.
$source: The node and subnodes to do the fulltext match on

Returns the count

text:text-rank($text as node()?) xs:double
This is just a skeleton for a possible ranking function. Don't use this.
$text: The text to rank

Returns the ranking of the text