XQuery

Priscilla Walmsley (pwalmsley@datypic.com)

ISBN: 0596006349

1st edition, O'Reilly Media, Inc.

Chapter 18: Regular expressions

Table 18-2. Quantifier examples
Regular expression   Strings that match       Strings that do not match
fo                   fo                       f, foo
fo?                  f, fo                    foo
fo*                  f, fo, foo, fooo, …      fx
fo+                  fo, foo, fooo, …         f
fo{2}                foo                      fo, fooo
fo{2,}               foo, fooo, foooo, …      f, fo
fo{2,3}              foo, fooo                f, fo, foooo
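Because matches() reports a match anywhere in the string, reproducing the whole-string reading of this table takes ^ and $ anchors. A minimal XQuery sketch, assuming the "do not match" column refers to the entire string:

```xquery
(: Unanchored: true, because the substring "fooo" matches fo{2,3} :)
matches("foooo", "fo{2,3}"),
(: Anchored with ^ and $: the whole string must match fo{2,3} :)
for $s in ("f", "fo", "foo", "fooo", "foooo")
return concat($s, ": ", matches($s, "^fo{2,3}$"))
```

The query returns true, followed by "f: false", "fo: false", "foo: true", "fooo: true" and "foooo: false".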
Table 18-3. Examples of parentheses in regular expressions
Regular expression   Strings that match       Strings that do not match
(fo)+z               foz, fofoz               z, fz, fooz, ffooz
(fo|xy)z             foz, xyz                 z
(fo|xy)+z            fofoz, foxyz, xyfoz      z
(f+o)+z              foz, ffoz, foffoz        z, fz, fooz
yes|no               yes, no
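Parentheses matter most when alternation is combined with anchors, because | has the lowest precedence in a regular expression. A short sketch with illustrative strings:

```xquery
(: ^yes|no$ means "^yes" OR "no$", so any string starting with "yes" matches :)
matches("yesterday", "^yes|no$"),    (: true :)
(: Grouping makes both anchors apply to both alternatives :)
matches("yesterday", "^(yes|no)$"),  (: false :)
matches("no", "^(yes|no)$")          (: true :)
```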
Table 18-5. Representing individual characters
Regular expression   Strings that match       Strings that do not match
d                    d                        g
d+efg+               defg, ddefgg             defgefg, deffgg
defg                 defg                     d, efg
d|e|f                d, e, f                  g
f*o                  fo, ffo, fffo            f*o
f\*o                 f*o                      fo, ffo, fffo
déf                  déf                      def, df
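When a pattern is built from text that should be matched literally, every metacharacter needs a backslash, as in f\*o. A minimal sketch; local:escape-for-regex is a hypothetical helper written for this example, not a built-in function:

```xquery
(: Hypothetical helper: backslash-escapes each regex metacharacter
   so that $arg is matched literally :)
declare function local:escape-for-regex($arg as xs:string?) as xs:string {
  replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')
};

let $pattern := concat("^", local:escape-for-regex("f*o"), "$")
return (
  $pattern,                  (: ^f\*o$ :)
  matches("f*o", $pattern),  (: true :)
  matches("ffo", $pattern),  (: false :)
  matches("ffo", "^f*o$")    (: true: unescaped, * is a quantifier :)
)
```

In the replacement string, \\ produces a literal backslash and $1 inserts the captured metacharacter.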
Table 18-6. The wildcard escape character
Regular expression   Strings that match       Strings that do not match
f.o                  fao, fbo, f2o            fo, fbbo
f..o                 faao, fbco, f12o         fo, fao
f.*o                 fo, fao, fbcde23o        f[LF]o
f\.o                 f.o                      fao
In the third row, [LF] stands for a line feed character between f and o. That string does not match unless dot-all mode (the s flag) is in effect.
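The effect of dot-all mode can be checked by building the f, line feed, o string with codepoints-to-string(10). A short sketch:

```xquery
let $s := concat("f", codepoints-to-string(10), "o")
return (
  matches($s, "^f.*o$"),        (: false: "." does not match a line feed :)
  matches($s, "^f.*o$", "s"),   (: true: in dot-all mode it does :)
  matches("f.o", "^f\.o$"),     (: true: \. is a literal period :)
  matches("fao", "^f\.o$")      (: false :)
)
```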
Table 18-10. Representing groups of characters
Regular expression   Strings that match   Strings that do not match   Comment
f\d                  f0, f1               f, f01                      multi-character escape
f\d*                 f, f0, f012          ff                          multi-character escape
f\s*o                fo, f o              foo                         multi-character escape
\p{Ll}               a, b                 A, B, 1, 2                  category escape
\P{Ll}               A, B, 1, 2           a, b                        category escape
\p{L}                a, b, A, B           1, 2                        category escape
\P{L}                1, 2                 a, b, A, B                  category escape
\p{IsBasicLatin}     a, b                 â, ß                        block escape
\P{IsBasicLatin}     â, ß                 a, b                        block escape
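Category and block escapes can disagree for the same character: â and ß are lowercase letters (Ll) but lie outside the Basic Latin block. A sketch that tests both for a few sample characters:

```xquery
(: For each sample character, report the Ll category and BasicLatin block tests :)
for $c in ("a", "A", "1", "â", "ß")
return concat($c, "  Ll: ", matches($c, "^\p{Ll}$"),
              "  BasicLatin: ", matches($c, "^\p{IsBasicLatin}$"))
```

For â and ß the query reports Ll: true and BasicLatin: false; for A and 1 it reports Ll: false and BasicLatin: true.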
Table 18-11. Character class expression examples
Regular expression   Strings that match   Strings that do not match   Comment
[def]                d, e, f              def                         Single characters
[def]*               d, eee, dfed         a, b                        Single characters, repeating
[\p{Ll}\d]           a, b, 1              A, B                        Single characters with escapes
[d-f]                d, e, f              a, D                        Range of characters
[0-9d-fD-F]          3, d, F              a, 3dF                      Multiple ranges
[0-9stu]             4, 9, t              a, 4t                       Range plus single characters
[s-u\d]              4, 9, t              a, t4                       Range plus multi-character escape
[a-x-[f]]            a, d, x              f, 2                        Subtracting from a range
[a-x-[fg]]           a, d, x              f, g, 2                     Subtracting from a range
[a-x-[e-g]]          a, d, x              e, g, 2                     Subtracting a range from a range
[^def]               a, g, 2              d, e, f                     Negating single characters
[^\[]                a, b, c              [                           Negating a single-character escape
[^\d]                d, E                 1, 2, 3                     Negating a multi-character escape
[^a-cj-l]            d, 4                 b, j, l                     Negating ranges
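Character classes are often used to filter values or to strip unwanted characters. A short sketch using class subtraction and a negated class on illustrative inputs:

```xquery
(: Subtraction: keep strings that are one letter from a to x, excluding e to g :)
("a", "d", "e", "g", "x", "2")[matches(., "^[a-x-[e-g]]$")],
(: Negation: remove every character that is not a digit :)
replace("2006-10-18", "[^\d]", "")
```

The query returns "a", "d", "x" and "20061018".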
Table 18-12. Reluctant versus nonreluctant quantifiers
Example                              Return value
replace("reluctant", "r.*t", "X")    X
replace("reluctant", "r.*?t", "X")   Xant
replace("aaah", "a{2,3}", "X")       Xh
replace("aaah", "a{2,3}?", "X")      Xah
replace("aaaah", "a{2,3}", "X")      Xah
replace("aaaah", "a{2,3}?", "X")     XXh
Table 18-13. Anchors
Regular expression   Strings that match       Strings that do not match
str                  str, str5, 5str, 5str5   st, sttr
^str$                str                      5str5, str5, 5str
^str                 str, str5                5str5, 5str
str$                 str, 5str                5str5, str5
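The four rows can be reproduced by filtering one sequence of sample strings with each pattern. A short sketch:

```xquery
let $strings := ("str", "str5", "5str", "5str5", "st")
return (
  string-join($strings[matches(., "str")], " "),    (: str str5 5str 5str5 :)
  string-join($strings[matches(., "^str$")], " "),  (: str :)
  string-join($strings[matches(., "^str")], " "),   (: str str5 :)
  string-join($strings[matches(., "str$")], " ")    (: str 5str :)
)
```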
Table 18-14. Anchors in multi-line mode
Regular expression   Strings that match       Strings that do not match
str                  str                      st
^str$                str, 555[LF]str[LF]555   555str[LF]555
^str                 str555, 555[LF]str555    555str[LF]555
str$                 555str, 555str[LF]555    555[LF]str555
[LF] stands for a line feed character, so some examples span several lines; individual examples are separated by commas.
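The difference between the default mode and multi-line mode (the m flag) shows up as soon as the input contains line feeds. A sketch that builds multi-line strings with codepoints-to-string(10):

```xquery
let $lf := codepoints-to-string(10)
let $text := string-join(("555", "str", "555"), $lf)
return (
  matches($text, "^str$"),                            (: false: anchors refer to the whole string :)
  matches($text, "^str$", "m"),                       (: true: anchors refer to each line :)
  matches(concat("555str", $lf, "555"), "^str", "m")  (: false: no line starts with str :)
)
```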
Table 18-15. Examples of the $flags argument
Example                                    Return value
matches($address, "Street.*City")          false
matches($address, "Street.*City", "s")     true
matches($address, "Street$")               false
matches($address, "Street$", "m")          true
matches($address, "street")                false
matches($address, "street", "i")           true
matches($address, "Main Street")           true
matches($address, "Main Street", "x")      false
matches($address, "Main \s Street", "x")   true
matches($address, "street$", "im")         true
These results assume that $address contains a line break immediately after "Main Street" and that "City" appears on the following line.
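To run the flag examples, $address needs a concrete value. A sketch that binds it to a hypothetical two-line address; the value is an assumption chosen only to fit the return values in the table:

```xquery
(: Hypothetical value: a line feed follows "Main Street" :)
let $address := concat("123 Main Street", codepoints-to-string(10),
                       "Traverse City, MI 49684")
return (
  matches($address, "Street.*City"),         (: false: "." stops at the line feed :)
  matches($address, "Street.*City", "s"),    (: true :)
  matches($address, "Street$", "m"),         (: true: end of the first line :)
  matches($address, "Main Street", "x"),     (: false: x removes the space from the pattern :)
  matches($address, "Main \s Street", "x"),  (: true :)
  matches($address, "street$", "im")         (: true :)
)
```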
Useful function: get-matches-and-non-matches (see also functx:get-matches-and-non-matches)
declare namespace functx = "http://www.functx.com";

(: Splits $string into a sequence of <match> and <non-match> elements,
   in order, according to $regex :)
declare function functx:get-matches-and-non-matches
  ($string as xs:string?, $regex as xs:string) as element()* {
  let $iomf := functx:index-of-match-first($string, $regex)
  return
    if (empty($iomf))
    (: No match anywhere: the whole string is a non-match :)
    then <non-match>{$string}</non-match>
    else if ($iomf > 1)
    (: Text before the first match is a non-match; recurse on the rest :)
    then (<non-match>{substring($string, 1, $iomf - 1)}</non-match>,
          functx:get-matches-and-non-matches(
            substring($string, $iomf), $regex))
    else
      (: The string starts with a match; its length is the number of
         characters that replace-first removes :)
      let $length :=
        string-length($string) -
        string-length(functx:replace-first($string, $regex, ''))
      return (<match>{substring($string, 1, $length)}</match>,
              if (string-length($string) > $length)
              then functx:get-matches-and-non-matches(
                     substring($string, $length + 1), $regex)
              else ())
};

(: Returns the 1-based position of the first match of $pattern in $arg,
   or the empty sequence if there is no match :)
declare function functx:index-of-match-first
  ($arg as xs:string?, $pattern as xs:string) as xs:integer? {
  if (matches($arg, $pattern))
  then string-length(tokenize($arg, $pattern)[1]) + 1
  else ()
};

(: Replaces only the first match of $pattern in $arg :)
declare function functx:replace-first
  ($arg as xs:string?, $pattern as xs:string,
   $replacement as xs:string) as xs:string {
  replace($arg, concat('(^.*?)', $pattern),
          concat('$1', $replacement))
};
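Processors that support XQuery 3.0 or later also provide the built-in fn:analyze-string, which wraps each piece of the input in an fn:match or fn:non-match element, much like the functx function above. A sketch assuming a 3.0 processor; this is a swapped-in alternative, not part of the functx library:

```xquery
xquery version "3.0";
(: Each child of the analyze-string result is an fn:match or fn:non-match element :)
for $part in analyze-string("abc123def45", "\d+")/*
return concat(local-name($part), ": ", string($part))
```

The query returns "non-match: abc", "match: 123", "non-match: def" and "match: 45".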
      
Table 18-16. Examples of using replacement variables
Example                                                             Return value
replace("Chap 2…Chap 3…Chap 4…", "Chap (\d)", "Sec $1.0")           Sec 2.0…Sec 3.0…Sec 4.0…
replace("abc123", "([a-z])", "$1x")                                 axbxcx123
replace("2315551212", "(\d{3})(\d{3})(\d{4})", "($1) $2-$3")        (231) 555-1212
replace("2006-10-18", "\d{2}(\d{2})-(\d{2})-(\d{2})", "$2/$3/$1")   10/18/06
replace("25", "(\d+)", "\$$1.00")                                   $25.00
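Two further properties of replacement strings: $0 stands for the text matched by the whole pattern, and captured groups can be emitted in any order. A short sketch with illustrative inputs:

```xquery
(
  (: $0 is the text matched by the whole pattern :)
  replace("abc", "b", "[$0]"),                     (: a[b]c :)
  (: Groups can be reordered in the replacement :)
  replace("Doe, Jane", "(\w+), (\w+)", "$2 $1"),   (: Jane Doe :)
  (: A literal $ must be written as \$ :)
  replace("Total: 25", "(\d+)", "\$$1")            (: Total: $25 :)
)
```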