The RegexProgs.zip archive has the sample scripts discussed in the handout.
Install RegexProgs in NetBeans as a Java Project with Existing Sources.
Run each of the programs individually.
A Java String can use the member function matches
to determine whether it matches a regular expression or not.
The match operation is complete in the sense that the entire
string must match the pattern.
Here are some simple examples using the pattern string,
patternStr, which
represents an optionally signed integer with no leading zeros,
optionally followed by a decimal point and two decimal digits.
Observe that the pattern needs a literal "\", which must itself
be escaped in a Java string literal, giving occurrences of "\\".
The match performed in this manner
is always a match of the complete string
(not a substring) as if the
anchor characters ^ and $ surrounded patternStr.
In this example all strings match except the last two.
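The pattern string and test strings are not reproduced above, so here is a minimal sketch under stated assumptions. The pattern below is one we assume fits the description and the substitution results shown later; it is not necessarily the handout's own. The class name MatchDemo, the helper isNumber, and the test strings are hypothetical.

```java
class MatchDemo {
    // Assumed pattern (not taken from the handout): optional sign,
    // then "0" or a digit run with a nonzero first digit,
    // then optionally a decimal point and exactly two digits.
    static final String PATTERN_STR = "[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?";

    // String.matches succeeds only if the ENTIRE string matches the pattern.
    static boolean isNumber(String s) {
        return s.matches(PATTERN_STR);
    }

    public static void main(String[] args) {
        // Hypothetical test strings; the last two are chosen so they do not match.
        String[] tests = { "0", "-235", "+17.50", "4.51", "0222", "3.5" };
        for (String s : tests) {
            System.out.println(s + ": " + isNumber(s));
        }
    }
}
```

Here "0222" fails because of the leading zero, and "3.5" fails because the fraction must have exactly two digits.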
Java also has two regular-expression-based substitution operations,
replaceFirst and replaceAll. For example, suppose that
target = "replace number(s) -235, 0222, -01.17 - in this string";
in which case
target.replaceFirst(patternStr, "Num")
becomes:
replace number(s) Num, 0222, -01.17 - in this string
and
target.replaceAll(patternStr, "Num")
becomes:
replace number(s) Num, NumNum, NumNum - in this string
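The two results above can be reproduced with a short sketch. The pattern is an assumption on our part (the handout's own pattern string is not shown here), chosen because it fits the description and yields exactly these substitutions. The class name ReplaceDemo and its helper names are hypothetical.

```java
class ReplaceDemo {
    // Assumed pattern: optional sign, integer without leading zeros,
    // optional decimal point followed by two digits.
    static final String PATTERN_STR = "[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?";

    static final String TARGET =
        "replace number(s) -235, 0222, -01.17 - in this string";

    // Replace only the first matching substring.
    static String first() {
        return TARGET.replaceFirst(PATTERN_STR, "Num");
    }

    // Replace every matching substring, scanning left to right.
    static String all() {
        return TARGET.replaceAll(PATTERN_STR, "Num");
    }

    public static void main(String[] args) {
        System.out.println(first());
        System.out.println(all());
    }
}
```

Under this pattern, "0222" is replaced twice because it splits into the matches "0" and "222", and "-01.17" splits into "-0" and "1.17". The lone "-" is untouched because a match needs at least one digit.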
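One of the limitations of this form of substitution is that the
replacement string is a fixed template. It can refer to matched text
only through group references such as "$1" (discussed below);
it cannot compute a replacement from the matched substring.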
The java.util.regex classes
For more sophisticated matching operations,
Java uses two classes Pattern and Matcher
in the java.util.regex package.
An alternate way of expressing validation
is via the static matches function
Pattern.matches(patternStr,testStr);
which
behaves exactly like "testStr.matches(patternStr)" used above.
More sophisticated regular expression operations use
the following statements:
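The statements themselves are not reproduced here. Judging from the keyword example later in this handout, they are probably of the following form; treat this as a sketch rather than the original listing. The class name FindDemo, the helper containsNumber, and the pattern are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Assumed pattern for an optionally signed number (not from the handout).
    static final String PATTERN_STR = "[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?";

    // True if testStr contains a match anywhere, unlike matches(),
    // which requires the whole string to match.
    static boolean containsNumber(String testStr) {
        Pattern pattern = Pattern.compile(PATTERN_STR);
        Matcher matcher = pattern.matcher(testStr);
        return matcher.find();
    }

    public static void main(String[] args) {
        String testStr = "total: +22 units";
        System.out.println(containsNumber(testStr));        // substring match
        System.out.println(testStr.matches(PATTERN_STR));   // whole-string match
    }
}
```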
The Pattern.compile operation can be used with a second parameter
to specify other features of the intended matching operation.
The most common
example is to ensure that matches are case-insensitive by defining:
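A minimal sketch of such a definition, using the Pattern.CASE_INSENSITIVE flag that also appears in the keyword example later on. The class name CaseInsensitiveDemo and the helper containsIgnoreCase are hypothetical, and the helper assumes that word contains no regular expression metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // True if text contains word, ignoring letter case.
    static boolean containsIgnoreCase(String text, String word) {
        // The second parameter of compile selects case-insensitive matching.
        Pattern pattern = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "WORLD"));
        // Without the flag, the same search is case-sensitive.
        System.out.println(Pattern.compile("WORLD").matcher("Hello World").find());
    }
}
```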
A call to matcher.find() looks for the next match and makes other operations available.
One useful feature, which is crucial to our
later example, is the ability to obtain the string positions which delimit
the matching substring with these member functions:
int start = matcher.start();
int end = matcher.end();
A result of true from matcher.find() signifies that testStr contains
a match of the pattern. We can obtain and show all matches by
repeatedly applying matcher.find() in a loop like this:
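The original loop and test string are not reproduced here, so the following is a sketch under assumptions: the pattern is the assumed signed-number pattern, and the test string is a hypothetical one chosen to contain four numbers. The class name FindAllDemo and the helper findAll are also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Assumed pattern (not from the handout).
    static final String PATTERN_STR = "[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?";

    // Collects every matched substring, in left-to-right order.
    static List<String> findAll(String testStr) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(PATTERN_STR).matcher(testStr);
        while (matcher.find()) {
            int start = matcher.start();  // index of first matched character
            int end = matcher.end();      // index just past the last one
            // matcher.group() is the same text as testStr.substring(start, end)
            matches.add(testStr.substring(start, end));
            System.out.println(matcher.group() + " [" + start + ", " + end + ")");
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical test string.
        findAll("values: +22, -4.51, 8 and 0");
    }
}
```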
This program segment illustrates that matcher.group() yields the
matched substring starting at position matcher.start() and ending
before matcher.end(). In this case there are four matched substrings:
+22, -4.51, 8, 0
Subpattern matches
In many circumstances we're interested in subpatterns of a matched
pattern. For example, consider the pattern and test string:
The pattern represents a lower case letter sequence
followed by a digit sequence.
In this case, the parenthesized subpatterns
separate the letter sequence from
the digit sequence. We can identify the substrings which match
the parenthesized subpatterns.
Consider this program segment:
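The segment itself is not reproduced here. A minimal sketch that yields the four lines of output shown next, assuming the pattern is ([a-z]+)([0-9]+) and that the replacement strings are "---", "===" and "$1:$2" (all inferred from the output, not taken from the handout). The class name SubstituteDemo and the helper results are hypothetical.

```java
class SubstituteDemo {
    // Assumed pattern: group 1 is a lower case letter run, group 2 a digit run.
    static final String PATTERN_STR = "([a-z]+)([0-9]+)";
    static final String TEST_STR = "Ab c55 24 Hello3 a.2 8a bbb00";

    // The test string followed by three substitution results.
    static String[] results() {
        return new String[] {
            TEST_STR,
            TEST_STR.replaceFirst(PATTERN_STR, "---"),
            TEST_STR.replaceAll(PATTERN_STR, "==="),
            // $1 and $2 refer to the text matched by groups 1 and 2
            TEST_STR.replaceAll(PATTERN_STR, "$1:$2")
        };
    }

    public static void main(String[] args) {
        for (String line : results()) {
            System.out.println(line);
        }
    }
}
```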
Ab c55 24 Hello3 a.2 8a bbb00
Ab --- 24 Hello3 a.2 8a bbb00
Ab === 24 H=== a.2 8a ===
Ab c:55 24 Hello:3 a.2 8a bbb:00
In particular, the "$1", "$2"
have special significance in
the replacement string: they represent the matched substrings
identified by the parenthesized subpatterns.
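The same subpattern matches are available programmatically through matcher.group(n). A brief sketch, again assuming the pattern ([a-z]+)([0-9]+); the class name GroupDemo and the helper split are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Assumed pattern: group 1 is the letter run, group 2 the digit run.
    static final String PATTERN_STR = "([a-z]+)([0-9]+)";

    // Returns "letters|digits" for the first match, or null if there is none.
    static String split(String testStr) {
        Matcher matcher = Pattern.compile(PATTERN_STR).matcher(testStr);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group(1) + "|" + matcher.group(2);
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile(PATTERN_STR)
                                 .matcher("Ab c55 24 Hello3 a.2 8a bbb00");
        while (matcher.find()) {
            // group() is the whole match; group(1) and group(2) are its parts
            System.out.println(matcher.group() + " -> "
                    + matcher.group(1) + " and " + matcher.group(2));
        }
    }
}
```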
Keyword search and highlight
When we search for a keyword in a text, it can either be considered
as a "standalone" word, or as part of a larger word. In this example
we will consider the former situation, i.e., that our keyword should not
be part of a larger "word". As is common in keyword searches,
we also want the search to be case-insensitive.
The Java regular expression match setup is as follows:
String keyword = // the keyword, assume only alphanumeric characters
String text = // the target text, possibly containing keywords
String patternStr = "\\b" + keyword + "\\b";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    int start = matcher.start(), end = matcher.end();
}
The word-boundary anchors "\b" require that any character
adjacent to the keyword is not a "word" character,
thus creating a pattern that identifies a standalone keyword.
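To see the effect of the anchors, the sketch below counts matches with and without "\b" on the sample text used in the full program at the end of this section. The class name KeywordDemo and the two counting helpers are hypothetical; as in the setup above, the keyword is assumed to be purely alphanumeric.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordDemo {
    static final String SAMPLE_TEXT =
        "Here, not there we are testing search and highlight.\n" +
        "Hereby we look for the word \"here\", where else, but here.";

    // Counts case-insensitive matches of patternStr in text.
    static int count(String text, String patternStr) {
        Matcher matcher =
            Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE).matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    // Occurrences of keyword as a standalone word.
    static int countStandalone(String text, String keyword) {
        return count(text, "\\b" + keyword + "\\b");
    }

    // Occurrences of keyword anywhere, including inside larger words.
    static int countAnywhere(String text, String keyword) {
        return count(text, keyword);
    }

    public static void main(String[] args) {
        System.out.println(countStandalone(SAMPLE_TEXT, "here"));
        System.out.println(countAnywhere(SAMPLE_TEXT, "here"));
    }
}
```

With the anchors, "there", "Hereby" and "where" are skipped, while "Here," and the two other standalone occurrences are counted.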
The start and end positions
of the matching substring (which is an occurrence
of the keyword) can then be used to highlight the text in a
JTextArea (or other Swing text component).
The interface javax.swing.text.Highlighter
is used to create a highlight effect around a portion of the
text area content.
Assuming that the variable ta
is the JTextArea which holds the text,
then we would use this code to create the desired effect:
Highlighter.HighlightPainter myPainter
= new DefaultHighlighter.DefaultHighlightPainter( Color.yellow );
ta.getHighlighter().addHighlight(start, end, myPainter);
Here is the full sample program which illustrates this usage:
SearchHighlight
import javax.swing.*;
import java.awt.*;
import java.util.regex.*;
import javax.swing.text.*;

public class SearchHighlight {
    public static void main(String[] args) {
        JTextArea ta = new JTextArea();

        // create a simple GUI frame with scrolled text area
        JFrame gui = new JFrame();
        gui.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        gui.setLayout(new BorderLayout());
        gui.setSize(new Dimension(500,300));
        gui.add(new JScrollPane(ta));
        gui.setVisible(true);

        // logical sans-serif font, bold, 14 point
        ta.setFont(new Font(Font.SANS_SERIF, Font.BOLD, 14));
        ta.setEditable(false);

        // sample keyword and search text
        String keyword = "here";
        String text =
            "Here, not there we are testing search and highlight.\n" +
            "Hereby we look for the word \"here\", where else, but here."
        ;
        ta.setText(text);

        String patternStr = "\\b" + keyword + "\\b";
        Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        Highlighter.HighlightPainter myPainter
            = new DefaultHighlighter.DefaultHighlightPainter( Color.yellow );

        // highlight each standalone occurrence of the keyword
        while (matcher.find()) {
            int start = matcher.start(), end = matcher.end();
            try {
                ta.getHighlighter().addHighlight(start, end, myPainter);
            }
            catch(BadLocationException x) {
                x.printStackTrace(); // we did something wrong!
            }
        }
    }
}