New I/O Functionality for Java™ 2 Standard Edition 1.4 (5)

The Pattern class provides a whole slew of constructs for matching regular expressions. Basically, you provide the pattern as a String. See the class documentation for full details of the patterns. Here are some samples to get you started:

- Line pattern, any number of characters followed by a line terminator (an optional carriage return, then a line feed): .*\r?\n or .*$
- Series of numbers: [0-9]* or \d*
- A control character: \p{Cntrl}
- An upper or lowercase US-ASCII character, followed by white space, followed by punctuation: [\p{Lower}\p{Upper}]\s\p{Punct}
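
The sample patterns above can be tried out with a few lines of code. The following is a minimal sketch, not part of the original samples: the class name SamplePatterns and the helper matchesWhole are hypothetical names chosen for illustration. Note that inside a Java string literal each backslash in the pattern must be doubled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class SamplePatterns {
    // Series of digits (because of *, the empty string also matches)
    static final Pattern DIGITS = Pattern.compile("\\d*");
    // A single control character
    static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");
    // US-ASCII letter, then white space, then punctuation
    static final Pattern LETTER_SPACE_PUNCT =
        Pattern.compile("[\\p{Lower}\\p{Upper}]\\s\\p{Punct}");

    // True only if the pattern matches the entire input
    static boolean matchesWhole(Pattern p, CharSequence input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(DIGITS, "2002"));            // true
        System.out.println(matchesWhole(DIGITS, "20a2"));            // false
        System.out.println(matchesWhole(CONTROL, "\t"));             // true
        System.out.println(matchesWhole(LETTER_SPACE_PUNCT, "a !")); // true
    }
}
```

The matches method requires the whole input to fit the pattern, which is why "20a2" is rejected even though it contains digits.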

When you provide the pattern, you tell the Pattern class to compile it. By default, the end-of-line character ($) matches only at the end of the entire input, so a pattern like .*$ does not stop at the end of each line. To make $ match at the end of every line, you must use the compile option of MULTILINE. There are other options for tasks like case-insensitive matching and Unicode-aware case folding, among a few others. So, if your pattern was for the line pattern above, the code would look like this:

    Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

When it is time to match the pattern, you call the matcher method to get a Matcher back. From this, you can find out if the pattern matches with matches, or find and get the matching piece with find and group. You can also split a string by providing the break pattern and getting the individual pieces back with the Pattern's split method. For instance, the following is a framework for reading a line at a time and getting words out of each line. Here aString stands for the text to scan.

    Matcher lineMatcher = linePattern.matcher(aString);
    Pattern wordBreakPattern = Pattern.compile("[\\p{Space}\\p{Punct}]");
    // Loop through the lines
    while (lineMatcher.find()) {
      CharSequence line = lineMatcher.group();
      String[] words = wordBreakPattern.split(line);
      // For each word
      for (int i = 0, n = words.length; i < n; i++) {
        // Adjacent break characters and empty lines yield empty strings; skip them
        if (words[i].length() > 0) {
          System.out.println(":" + words[i] + ":");
        }
      }
    }
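
To see what the MULTILINE option changes, and to run the word-splitting framework end to end, here is a self-contained sketch. The class name LineWords, the methods lines and words, and the sample text are all assumptions made for illustration; only Pattern, Matcher, and their methods come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class LineWords {
    private static final Pattern WORD_BREAK =
        Pattern.compile("[\\p{Space}\\p{Punct}]");

    // Collect every non-empty match of ".*$" compiled with the given flags
    static List<String> lines(CharSequence text, int flags) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(".*$", flags).matcher(text);
        while (m.find()) {
            // find() also reports empty matches at line ends; skip them
            if (!m.group().isEmpty()) {
                result.add(m.group());
            }
        }
        return result;
    }

    // Split each line on white space and punctuation, dropping empty pieces
    static List<String> words(CharSequence text) {
        List<String> result = new ArrayList<>();
        for (String line : lines(text, Pattern.MULTILINE)) {
            for (String word : WORD_BREAK.split(line)) {
                if (!word.isEmpty()) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Hello, world!\nNew I/O is here.";
        System.out.println(lines(sample, 0));                 // only the last line
        System.out.println(lines(sample, Pattern.MULTILINE)); // both lines
        for (String w : words(sample)) {
            System.out.println(":" + w + ":");
        }
    }
}
```

Without MULTILINE, the only non-empty match is the final line, because $ can match only at the end of the input. With MULTILINE, each line is returned in turn, which is what the word loop relies on.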