Balaji Vajjala's Blog

A DevOps Blog from Trenches

Groovy regular expressions

Because of the compact syntax, regular expressions in Groovy are more readable than in Java. Here is how Jeffrey Friedl’s URL-matching example looks in Groovy:

def subDomain  = '(?i:[a-z0-9]|[a-z0-9][-a-z0-9]*[a-z0-9])' // simple regex in single quotes
def topDomains = $/
    (?x-i : com         \b     # you can put whitespace and comments
          | edu         \b     # inside regex in eXtended mode
          | biz         \b
          | in(?:t|fo)  \b     # a backslash needs no escaping
          | mil         \b     # in dollar-slashy strings
          | net         \b
          | org         \b
          | [a-z][a-z]  \b
    )/$

def hostname = /(?:${subDomain}\.)+${topDomains}/  // variable substitution in slashy string

def NOT_IN   = /;\"'<>()\[\]{}\s\x7F-\xFF/     // a backslash needs no escaping in slashy strings
def NOT_END  = /!.,?/
def ANYWHERE = /[^${NOT_IN}${NOT_END}]/
def EMBEDDED = /[$NOT_END]/                        // you can omit {} around var name

def urlPath  = "/$ANYWHERE*($EMBEDDED+$ANYWHERE+)*"

def url =
    """(?x:
             # you have to escape backslash in multi-line double quotes
             \\b

             # match the hostname part
             (
               (?: ftp | http s? ): // [-\\w]+(\\.\\w[-\\w]*)+
             |
               $hostname
             )

             # allow optional port
             (?: :\\d+ )?

             # rest of url is optional, and begins with /
             (?: $urlPath )?
       )"""

assert 'http://www.google.com/search?rls=en&q=regex&ie=UTF-8&oe=UTF-8' ==~ url
assert 'pages.github.io' ==~ url

As you can see, Groovy offers several string notations (single-quoted, slashy, dollar-slashy, and double-quoted), and for every subexpression you can choose the one that reads best.
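
To make the trade-off concrete, here is a minimal sketch that writes one small pattern in each notation and checks that they all produce the same regex. The pattern and the variable names (single, doubleQuote, slashy, dollarSlash, digits, version) are made up for illustration and are not part of the example above.

```groovy
// One pattern (digits, a literal dot, digits) written in four notations.
def single      = '\\d+\\.\\d+'     // single quotes: backslashes must be doubled
def doubleQuote = "\\d+\\.\\d+"     // double quotes: doubled too, and $ would interpolate
def slashy      = /\d+\.\d+/        // slashy: backslashes are taken literally
def dollarSlash = $/\d+\.\d+/$      // dollar-slashy: same, and / needs no escaping

// All four notations yield the same pattern text.
assert [doubleQuote, slashy, dollarSlash].every { it == single }

// ==~ succeeds only if the whole string matches the pattern.
assert '3.14' ==~ slashy
assert !('pi is 3.14' ==~ slashy)

// =~ creates a java.util.regex.Matcher; find() looks for a match anywhere.
assert ('pi is 3.14' =~ slashy).find()

// Slashy strings also support interpolation, which lets you build
// a larger pattern from named pieces.
def digits  = /\d+/
def version = /${digits}\.${digits}/
assert '10.2' ==~ version
```

The single-quoted and double-quoted forms are the noisiest as soon as a backslash is involved, which is why the example above keeps them for pieces that contain few or no backslashes.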

Resources