A regular expression is a string representing a pattern used to match some portion(s) of a target string. Regular expressions are very general and, as a consequence, quite complex: many different operations are represented by special characters, or metacharacters. The metacharacters are:

`^ . [ $ ( ) | * + ? { \`
Any non-alphanumeric character, whether special or not, can be made to match itself literally by escaping it, that is, by putting `\` in front of it. For example, `\.` matches a period and `\*` matches an asterisk.
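A short illustration of escaping. It assumes Python's `re` module as the engine (the text above is engine-neutral); `re.escape` is Python's helper for quoting a whole string, not part of the regular expression syntax itself.

```python
import re

# Unescaped, "." is a metacharacter that matches any single character.
print(bool(re.fullmatch(r"3.14", "3x14")))   # True
# Escaped as "\.", it matches only a literal period.
print(bool(re.fullmatch(r"3\.14", "3x14")))  # False
print(bool(re.fullmatch(r"3\.14", "3.14")))  # True

# The same applies to other metacharacters such as "+" and "?".
print(bool(re.fullmatch(r"1\+1=2\?", "1+1=2?")))  # True

# re.escape() inserts the backslashes for you, which is handy when
# arbitrary text must be matched literally.
literal = "price: $5.00 (approx.)"
print(bool(re.fullmatch(re.escape(literal), literal)))  # True
```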
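A bracket expression is a list of characters enclosed in `[ ]`; it matches any single character from the list. If the list begins with `^`, it matches any single character not in the rest of the list. If two characters in the list are separated by `-`, this is shorthand for the inclusive range of characters between them; for example, in ASCII `[0-9]` matches any decimal digit. It is illegal for two ranges to share an endpoint, as in `a-c-e`. Ranges depend on the collating sequence, so portable patterns should avoid relying on them. Most special characters lose their special meaning inside brackets and become literals. Additionally, `^` is literal anywhere except in first position, `-` is literal when it appears first or last in the list, and `]` is literal when it appears first (after any leading `^`).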
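The bracket rules can be tried out directly. This sketch assumes Python's `re` module as the engine; note that Python compares ranges by Unicode code point rather than by collating sequence. `matches` is a hypothetical helper defined here for brevity.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# [abc] matches exactly one character from the list.
print(matches(r"[abc]", "b"), matches(r"[abc]", "d"))   # True False

# A leading ^ negates the list: any single character NOT listed.
print(matches(r"[^bc]", "a"), matches(r"[^bc]", "c"))   # True False

# a-z is shorthand for the inclusive range a through z.
print(matches(r"[a-z]", "q"), matches(r"[a-z]", "Q"))   # True False

# Inside brackets "." and "*" are ordinary characters.
print(matches(r"[.*]", "*"), matches(r"[.*]", "x"))     # True False

# The second ^ is literal (not first); the - is literal (last).
print(matches(r"[^^-]", "x"))                           # True
print(matches(r"[^^-]", "^"), matches(r"[^^-]", "-"))   # False False
```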
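Regular expressions are built up in layers:

- An atom is a single ordinary or escaped character, `.`, `^`, `$`, a bracket expression, or a parenthesized regular expression: `( regular_expression )`.
- A piece is an atom, optionally followed by `*` (zero or more), `+` (one or more), `?` (zero or one), or a bound `{m}`, `{m,}`, or `{m,n}`.
- A branch is one or more pieces concatenated.
- A regular expression is one or more branches separated by `|`.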
For example, these are atoms:

`a` `b` `[abc]` `(a|b+c)` `[^bc]`

These, in addition to all of the above, are pieces:

`a*` `c+` `(ca*b)*` `(a|b+c){3}`

These, in addition to all of the above, are branches:

`aba` `a*b*` `[ab]*c*` `(a|b+c){3,10}(b|a+c){4}`

These, in addition to all of the above, are regular expressions:

`a*b|ab*` `a|b+|c*` `((ab|ba){3,10}[cd]|aa*bc)+`
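To see how the layers combine, the following sketch checks several of the example patterns against whole strings. It assumes Python's `re` module as the engine; `matches` is a hypothetical helper and the test strings are illustrative choices. The key point is that `|` binds most loosely, so `a*b|ab*` means `(a*b)|(ab*)`.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Piece: the bound {3} repeats the whole parenthesized atom, and each
# repetition may pick a different alternative ("a", "bc", "bbc").
print(matches(r"(a|b+c){3}", "abcbbc"))   # True
print(matches(r"(a|b+c){3}", "aa"))       # False: only two repetitions

# Branch: pieces are simply concatenated.
print(matches(r"[ab]*c*", "abbacc"))       # True
print(matches(r"[ab]*c*", "ca"))           # False: an a cannot follow the c's

# Regular expression: a*b|ab* means (a*b)|(ab*).
print(matches(r"a*b|ab*", "aaab"))         # True, via a*b
print(matches(r"a*b|ab*", "abbb"))         # True, via ab*
print(matches(r"a*b|ab*", "aabb"))         # False: neither branch fits
print(matches(r"a*(b|a)b*", "aabb"))       # True: explicit grouping changes the meaning

# A whole alternation can itself be repeated as a group.
print(matches(r"((ab|ba){3,10}[cd]|aa*bc)+", "ababbacaabc"))  # True
```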
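The following examples also use the shorthand classes `\d` (digit), `\w` (word character: letter, digit, or underscore), `\s` (whitespace), and `\S` (non-whitespace), which Java and Perl-style engines support but which are not part of the POSIX syntax described above. Anchoring a pattern with `^` and `$` forces it to match the entire string.

| pattern | description |
|---|---|
| `.\|\n` | matches any single character, including a newline |
| `[a-zA-Z]` | matches any ASCII letter |
| `[a-z]{4}` | matches four consecutive lower-case letters (e.g. a four-letter word) |
| `[^^-]` | matches any single character except `^` or `-` |
| `^[a-zA-Z]+$` | string consists only of letters (at least one) |
| `^[a-zA-Z_]\w*$` | string is an identifier: a letter or underscore followed by letters, digits, or underscores (a simplified Java identifier; Java also allows `$` and non-ASCII letters) |
| `^[\d.]+$` | string consists only of digits and dots |
| `^\s*$` | string is empty or all whitespace |
| `^\s+\|\s+$` | matches leading or trailing whitespace |
| `^(I\|you\|them)$` | string is exactly one of these three words |
| `^\S+$` | string is a non-empty run of non-whitespace characters |
| `^[+-]?([1-9]\d*)?\d$` | string is an optionally signed integer with no leading zeros |
| `^[+-]?\d*\.\d{1,6}$` | string is an optionally signed decimal number with 1 to 6 digits after the point (the integer part may be omitted) |
| `^([1-9]\d*)?\d(\.\d{2})?$` | string is an unsigned number with no leading zeros, optionally followed by exactly two decimal places |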
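One way to sanity-check the table is to run each pattern against strings it should and should not accept. This sketch assumes Python's `re` module as the engine; `TABLE_CASES` and `check_table` are hypothetical names, and the sample strings are illustrative choices, not part of the original table. Python's `\w`, `\d` and `\s` are Unicode-aware by default, so `re.ASCII` is passed to keep them closer to Java's ASCII defaults.

```python
import re

# (pattern from the table, strings it should accept, strings it should reject).
# re.search is used so the ^ and $ anchors in each pattern do the anchoring.
TABLE_CASES = [
    (r"^[a-zA-Z]+$",               ["Hello"],          ["Hello1", ""]),
    (r"^[a-zA-Z_]\w*$",            ["count", "_tmp1"], ["1st", "my-var"]),
    (r"^[\d.]+$",                  ["1.2.3", "42"],    ["1,2"]),
    (r"^\s*$",                     ["", "  \t"],       [" x "]),
    (r"^(I|you|them)$",            ["you"],            ["your"]),
    (r"^[+-]?([1-9]\d*)?\d$",      ["0", "-42", "+7"], ["007", "4.2"]),
    (r"^[+-]?\d*\.\d{1,6}$",       ["3.14", "-.5"],    ["3.", "1.1234567"]),
    (r"^([1-9]\d*)?\d(\.\d{2})?$", ["12", "12.50"],    ["12.5", "-12.50"]),
]

def check_table() -> bool:
    """Raise AssertionError on the first case that behaves unexpectedly."""
    for pattern, accept, reject in TABLE_CASES:
        for s in accept:
            assert re.search(pattern, s, re.ASCII), (pattern, s)
        for s in reject:
            assert not re.search(pattern, s, re.ASCII), (pattern, s)
    return True

print(check_table())  # True

# ^\s+|\s+$ finds leading OR trailing whitespace (| binds more loosely
# than ^ and $), so replacing every match with "" trims both ends.
print(repr(re.sub(r"^\s+|\s+$", "", "  hi there  ")))  # 'hi there'
```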
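Parenthesized subexpressions also capture the portion of the string they match, and the captured groups are numbered in the order of their left parentheses. For example, matching the pattern `((a(b+))+)(c*)` against the string `abbabccd` yields these groups:

`abbab`, `ab`, `b`, `cc`

The repeated subpattern `(a(b+))` matches both `abb` and `ab`, which can be confusing. When a group is repeated, the match from its last iteration is the one recorded, so group 2 is `ab` and group 3 is `b`.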
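The same example can be reproduced with Python's `re` module (assumed here as the engine), which numbers groups by left parenthesis and keeps only the last iteration of a repeated group:

```python
import re

m = re.search(r"((a(b+))+)(c*)", "abbabccd")

# group(0) is the whole match; the trailing "d" is not part of it.
print(m.group(0))             # abbabcc
# Groups 1-4, numbered by the position of their left parenthesis.
print(m.groups())             # ('abbab', 'ab', 'b', 'cc')
# Group 2 sits inside the repeated (...)+, so only its last
# iteration ("ab" at positions 3-5, not the earlier "abb") is kept.
print(m.group(2), m.span(2))  # ab (3, 5)
```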
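Quantifiers such as `+` and `*` are greedy: they match as much as possible while still letting the whole pattern succeed. Adding `?` after a quantifier makes it minimal (also called reluctant or lazy), so it matches as little as possible. Compare `^(\w+)(B+)$` (greedy) with `^(\w+?)(B+)$` (minimal) applied to the string `AABBBB`. In the first case, `\w+` matches `AABBB`, leaving a single `B` to match `B+`. In the second case, `\w+?` matches only `AA`, leaving `BBBB` to match `B+`.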
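The contrast is easy to observe in an engine that supports minimal quantifiers; these are a Perl/Java-style extension, not part of POSIX extended regular expressions. This sketch assumes Python's `re` module and the test string `AABBBB` inferred from the explanation above.

```python
import re

text = "AABBBB"

greedy = re.match(r"^(\w+)(B+)$", text)
minimal = re.match(r"^(\w+?)(B+)$", text)

# Greedy \w+ takes as much as it can and gives back only one B.
print(greedy.groups())   # ('AABBB', 'B')
# Minimal \w+? takes as little as it can, so B+ gets every B.
print(minimal.groups())  # ('AA', 'BBBB')
```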