Java Regular Expression Programs
(last updated: Nov 14, 2009)

The RegexProgs.zip archive contains the sample programs discussed in this handout. Install RegexProgs in NetBeans as a "Java Project with Existing Sources", then run each of the programs individually:
StringBased.java: string-based matching and replacement
SubstringMatch.java: find substring matches
SubpatternMatch.java: identify subpattern matches
Replacement.java: create a string with replacements
SearchHighlight.java: keyword search and highlight

Java Regular Expression operations

String-based matching and replacement

A Java String can use the member function matches to determine whether it matches a regular expression. The match is complete in the sense that the entire string must match the pattern. Here are some simple examples using the pattern string patternStr, which represents an optionally signed integer with no leading zeros, optionally followed by a decimal point and exactly two decimal digits.
String patternStr = "[+-]?([1-9]\\d*)?\\d(\\.\\d{2})?"; 

String[] tests = { "12", "+12", "-33.44", "0", "+0.11", "02", "1.3" };
for (String testStr: tests)
  System.out.println( testStr.matches( patternStr ) );
Observe that the pattern contains backslashes (as in \d) and, because the backslash is also the escape character in Java string literals, each one must be written as "\\". The match performed in this manner always involves the complete string (not a substring), as if the anchor characters ^ and $ surrounded patternStr. In this example all strings match except the last two: "02" has a leading zero, and "1.3" has only one decimal digit.
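The claim that matches behaves as if ^ and $ surrounded the pattern can be checked directly. The sketch below compares String.matches with an explicitly anchored search and an unanchored search, using the Pattern and Matcher classes described in the next section. The class WholeVsSubstring and its method names are illustrative only.

```java
import java.util.regex.Pattern;

public class WholeVsSubstring {
    static final String PATTERN = "[+-]?([1-9]\\d*)?\\d(\\.\\d{2})?";

    // true only if the ENTIRE string is a number of the required form
    static boolean wholeMatch(String s) {
        return s.matches(PATTERN);
    }

    // explicit anchors plus find(); strictly, $ also matches just before a
    // final line terminator, so \z would be the exact equivalent of matches()
    static boolean anchoredFind(String s) {
        return Pattern.compile("^" + PATTERN + "$").matcher(s).find();
    }

    // true if SOME substring of s is a number of the required form
    static boolean containsMatch(String s) {
        return Pattern.compile(PATTERN).matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "-33.44", "1.3", "x12y" }) {
            System.out.println(s + ": whole=" + wholeMatch(s)
                + ", anchored=" + anchoredFind(s)
                + ", contains=" + containsMatch(s));
        }
    }
}
```

For "1.3" only containsMatch is true, because the substring "1" alone is a valid number.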

The String class also has two regular-expression-based substitution operations:
String new_string = target.replaceFirst( pattern_string, replacement_string);
String new_string = target.replaceAll( pattern_string, replacement_string);
Our sample program uses:
target = "replace number(s) -235, 0222, -01.17 - in this string";
with which target.replaceFirst(patternStr, "Num") returns:
replace number(s) Num, 0222, -01.17 - in this string
and target.replaceAll(patternStr, "Num") returns the following (the leading zeros split 0222 and -01.17 into two matches each):
replace number(s) Num, NumNum, NumNum - in this string
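Note that the replacement_string is not taken literally: it can refer to the matched substring. These methods behave exactly like the Matcher methods of the same name described below, so "$0" in the replacement stands for the entire matched substring and "$1", "$2", ... for the substrings matched by parenthesized subpatterns. As a consequence, a literal "$" or "\" in the replacement must be escaped with a backslash.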
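A minimal sketch of a group reference in String.replaceAll, reusing the sample target: it wraps every number in angle brackets via "$0". It also shows Matcher.quoteReplacement for a replacement that must contain a literal "$" (an unescaped "$" alone would make replaceAll throw an IllegalArgumentException). StringReplaceDemo and its method names are illustrative.

```java
import java.util.regex.Matcher;

public class StringReplaceDemo {
    static final String PATTERN = "[+-]?([1-9]\\d*)?\\d(\\.\\d{2})?";

    // "$0" in the replacement stands for the whole matched substring
    static String bracketNumbers(String target) {
        return target.replaceAll(PATTERN, "<$0>");
    }

    // quoteReplacement escapes "$" and "\" so the replacement is used literally
    static String literalReplace(String target, String replacement) {
        return target.replaceAll(PATTERN, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String target = "replace number(s) -235, 0222, -01.17 - in this string";
        System.out.println(bracketNumbers(target));
        // prints: replace number(s) <-235>, <0><222>, <-0><1.17> - in this string
        System.out.println(literalReplace(target, "$"));
    }
}
```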

The java.util.regex classes

For more sophisticated matching operations, Java provides the two classes Pattern and Matcher in the java.util.regex package. An alternate way of expressing validation is via the static function Pattern.matches:
Pattern.matches(patternStr,testStr);
which behaves exactly like testStr.matches(patternStr) used above. More sophisticated operations first compile the pattern and then create a matcher for a particular target string:
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(testStr);
Pattern.compile accepts an optional second parameter, a bit mask of flags that modify how matching is performed. The most common example makes matching case-insensitive:
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
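The call matcher.find() searches the target string for the next substring that matches the pattern and returns true if one is found; each further call continues the search where the previous match ended. Crucial to our later example, after a successful find() the matcher can report the string positions delimiting the matched substring with these member functions: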
int start = matcher.start();
int end = matcher.end();
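The sketch below compiles each pattern once and reuses it, which String.matches cannot do since it compiles its pattern anew on every call. It also combines two flags with the bitwise OR operator: CASE_INSENSITIVE, and MULTILINE, which lets ^ and $ match at the start and end of every line rather than only of the whole input. CompileFlagsDemo and its method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompileFlagsDemo {
    // Compiled once, reused for every test string
    private static final Pattern LETTERS_DIGITS =
        Pattern.compile("[a-z]+\\d+", Pattern.CASE_INSENSITIVE);

    // Flags are bit masks, so several can be combined with |
    private static final Pattern LINES_STARTING_WITH_HERE =
        Pattern.compile("^here.*$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Whole-string test, like String.matches but without recompiling
    static boolean isLettersDigits(String s) {
        return LETTERS_DIGITS.matcher(s).matches();
    }

    // Every line that begins with "here" in any letter case
    static List<String> linesStartingWithHere(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LINES_STARTING_WITH_HERE.matcher(text);
        while (m.find()) {
            result.add(m.group()); // ".*" stops at the line terminator
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "hello3", "Hello3", "3hello" }) {
            System.out.println(s + " -> " + isLettersDigits(s));
        }
        String text = "Here, not there.\nwe look here\nHereby we look.";
        System.out.println(linesStartingWithHere(text));
    }
}
```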

Substring matches

For example, consider the following program:
String patternStr = "[+-]?([1-9]\\d*)?\\d(\\.\\d{2})?"; 
String testStr = "AB +22 C -4.51 D 8.0";

Pattern pattern = Pattern.compile(patternStr);  
Matcher matcher = pattern.matcher(testStr);

System.out.println( "test matches pattern: " + matcher.find() );
This prints true, signifying that testStr contains a match of the pattern. That call to find() has already consumed the first match, so to show all matches we first reset the matcher and then apply matcher.find() repeatedly in a loop:
matcher.reset();  // restart the search at the beginning of testStr
while (matcher.find()) {
  System.out.println( matcher.group() 
     + "\tstart-end: " + matcher.start() + "-" + matcher.end() );
}
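This program segment illustrates that matcher.group() yields the matched substring starting at position matcher.start() and ending just before position matcher.end(). In this case there are four matched substrings: +22, -4.51, 8 and 0. The text 8.0 yields two matches because the optional fractional part requires exactly two digits.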
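A sketch that gathers every match together with its positions. It creates a fresh matcher, so no earlier find() call can swallow the first match. AllMatches and findAll are illustrative names. On Java 9 and later, matcher.results() returns the same matches as a Stream<MatchResult>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllMatches {
    static final Pattern NUMBER = Pattern.compile("[+-]?([1-9]\\d*)?\\d(\\.\\d{2})?");

    // Each entry is "text@start-end"; end is exclusive, as with matcher.end()
    static List<String> findAll(String testStr) {
        List<String> found = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(testStr); // fresh matcher: search starts at index 0
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AB +22 C -4.51 D 8.0"));
        // prints [+22@3-6, -4.51@9-14, 8@17-18, 0@19-20]
    }
}
```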

Subpattern matches

In many circumstances we're interested in subpatterns of a matched pattern. For example, consider the pattern and test string:
patternStr = "([a-z]+)(\\d+)"; 
testStr = "Ab c55 24 Hello3 a.2 8a bbb00";

pattern = Pattern.compile(patternStr);   
matcher = pattern.matcher(testStr);
The pattern represents a sequence of lowercase letters followed by a sequence of digits. Here the parenthesized subpatterns separate the letter sequence from the digit sequence, and we can identify the substrings that match each of them. Consider this program segment:
while (matcher.find()) {
  System.out.println( matcher.group() 
     + "\tfirst: " + matcher.group(1) + ", second:" + matcher.group(2) );
}
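The expression matcher.group(i) yields the substring matched by the i-th parenthesized subpattern, counting opening parentheses from the left; matcher.group(0) is the same as matcher.group(). For this test string the matches are c55, ello3 (the uppercase H is not in [a-z]) and bbb00.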
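Since Java 7 a subpattern can also be given a name with the (?<name>...) syntax and retrieved with matcher.group(name), which can be easier to read than counting parentheses. A sketch using the pattern above; NamedGroups and split are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Same as ([a-z]+)(\d+), but each group also has a name
    static final Pattern P = Pattern.compile("(?<letters>[a-z]+)(?<digits>\\d+)");

    static List<String> split(String testStr) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(testStr);
        while (m.find()) {
            // group("letters") is the same substring as group(1)
            parts.add(m.group("letters") + "/" + m.group("digits"));
        }
        return parts;
    }

    public static void main(String[] args) {
        // named groups are still numbered, so groupCount() is 2
        System.out.println("groups: " + P.matcher("").groupCount());
        System.out.println(split("Ab c55 24 Hello3 a.2 8a bbb00"));
        // prints [c/55, ello/3, bbb/00]
    }
}
```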

Replacement

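The Matcher class also provides replaceFirst and replaceAll member functions, which replace matched substrings; each first resets the matcher, so they work even after the find() loop above. Given the definition of the above matcher object, these calls: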
System.out.println(testStr);
System.out.println(matcher.replaceFirst("---"));
System.out.println(matcher.replaceAll("==="));
System.out.println(matcher.replaceAll("$1:$2"));
would have the following output:
Ab c55 24 Hello3 a.2 8a bbb00
Ab --- 24 Hello3 a.2 8a bbb00
Ab === 24 H=== a.2 8a ===
Ab c:55 24 Hello:3 a.2 8a bbb:00
In particular, "$1" and "$2" have special significance in the replacement string: they represent the substrings matched by the first and second parenthesized subpatterns ("$0" stands for the entire match).
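The replacement strings above are fixed templates. When a replacement must be computed from each match, Matcher's appendReplacement/appendTail loop can be used. This sketch upper-cases the letter part of every match; ComputedReplacement and upcaseLetters are illustrative names. On Java 9 and later, matcher.replaceAll also accepts a Function<MatchResult,String> that performs a similar loop internally.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComputedReplacement {
    static final Pattern P = Pattern.compile("([a-z]+)(\\d+)");

    // Replace each match with a string computed from its groups:
    // the letters are upper-cased, the digits are kept
    static String upcaseLetters(String testStr) {
        Matcher m = P.matcher(testStr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1).toUpperCase(Locale.ROOT) + m.group(2);
            // quoteReplacement stops "$" or "\" being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upcaseLetters("Ab c55 24 Hello3 a.2 8a bbb00"));
        // prints Ab C55 24 HELLO3 a.2 8a BBB00
    }
}
```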

Keyword search and highlight

When we search for a keyword in a text, an occurrence can be considered either as a standalone word or as part of a larger word. In this example we consider the former situation, i.e., our keyword should not be part of a larger "word". As is common in keyword searches, we also want the search to be case-insensitive. The Java regular expression setup is as follows:
String keyword = "here";  // the keyword, assumed to contain only alphanumeric characters
String text    = "Here, not there we are testing search and highlight.";  // the target text

String patternStr = "\\b" + keyword + "\\b";

Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
  int start = matcher.start(), end = matcher.end();
  // text.substring(start, end) is one occurrence of the keyword
}
The word-boundary anchors "\b" require that any character immediately before or after the keyword is not a "word" character (a letter, digit or underscore), so the pattern identifies only standalone occurrences of the keyword. The start and end positions of each matching substring can then be used to highlight the keyword in a JTextArea (or other Swing text component).
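The matching part of the search can be tried without a GUI. This sketch returns the start and end of each occurrence; KeywordPositions and find are illustrative names. Unlike the setup above, it passes the keyword through Pattern.quote, so a keyword such as v1.2 has its "." treated literally. The \b anchors still assume that the keyword begins and ends with word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordPositions {
    // Returns {start, end} pairs for each standalone, case-insensitive occurrence
    static List<int[]> find(String keyword, String text) {
        // Pattern.quote makes regex metacharacters in the keyword literal
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                                          Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        List<int[]> ranges = new ArrayList<>();
        while (matcher.find()) {
            ranges.add(new int[] { matcher.start(), matcher.end() });
        }
        return ranges;
    }

    public static void main(String[] args) {
        String text = "Here, not there we are testing search and highlight.\n"
                    + "Hereby we look for the word \"here\", where else, but here.";
        for (int[] r : find("here", text)) {
            System.out.println(r[0] + "-" + r[1] + ": " + text.substring(r[0], r[1]));
        }
    }
}
```

With the sample text, "there", "Hereby" and "where" are skipped, leaving three occurrences.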

The interface javax.swing.text.Highlighter is used to create a highlight effect around a portion of the text area's content. Assuming that the variable ta is the JTextArea holding the text, we would use this code to create the desired effect:
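Highlighter.HighlightPainter myPainter 
   = new DefaultHighlighter.DefaultHighlightPainter( Color.yellow );

try {
  ta.getHighlighter().addHighlight(start, end, myPainter);
} catch (BadLocationException x) {  // start or end lies outside the text
  x.printStackTrace();
}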
Here is the full sample program which illustrates this usage:

SearchHighlight
import javax.swing.*;
import java.awt.*;
import java.util.regex.*;
import javax.swing.text.*;

public class SearchHighlight {
  public static void main(String[] args) {
    // Swing components should be created and used on the event dispatch thread
    SwingUtilities.invokeLater(() -> {
      JTextArea ta = new JTextArea();

      // create a simple GUI frame with scrolled text area
      JFrame gui = new JFrame();
      gui.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      gui.setLayout(new BorderLayout());
      gui.setSize(new Dimension(500,300));
      gui.add(new JScrollPane(ta));
      gui.setVisible(true);

      ta.setFont(new Font(Font.SANS_SERIF, Font.BOLD, 14));
      ta.setEditable(false);

      // sample keyword and search text
      String keyword = "here";
      String text = "Here, not there we are testing search and highlight.\n"
                  + "Hereby we look for the word \"here\", where else, but here.";
      ta.setText(text);

      String patternStr = "\\b" + keyword + "\\b";
      Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
      Matcher matcher = pattern.matcher(text);

      Highlighter.HighlightPainter myPainter
        = new DefaultHighlighter.DefaultHighlightPainter( Color.yellow );

      while (matcher.find()) {
        int start = matcher.start(), end = matcher.end();
        try {
          ta.getHighlighter().addHighlight(start, end, myPainter);
        } catch (BadLocationException x) {
          x.printStackTrace();  // we did something wrong!
        }
      }
    });
  }
}


© Robert M. Kline