
Common Methods for Regular Expressions in Java

In this article I will cover the common regular expression methods in Java and illustrate them with examples.

The regular expression classes live in the java.util.regex package, which has three primary classes: Pattern, Matcher, and PatternSyntaxException.

Pattern: The Pattern class is a compiled representation of a regular expression. It is the starting point for working with regular expressions.

Methods in Pattern class

static Pattern compile(String regex): This static method is the starting point for working with regular expressions. It compiles the given regular expression into a Pattern.

 Pattern p = Pattern.compile("a*b");

 

static Pattern compile(String regex, int flags): The second version of the method accepts an additional parameter, flags, a bit mask of match flags that controls how the compiled Pattern matches the input.

Different types of flags are available (see Flags Constants); for example:

/**
To enable case-insensitive matching,
pass the CASE_INSENSITIVE flag
as the second argument
*/ 
Pattern p = Pattern.compile("a*b", Pattern.CASE_INSENSITIVE);
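
Because the flags parameter is a bit mask, several flags can be combined with the bitwise OR operator. The following is a minimal sketch of that idea; the class name FlagsDemo and the helper matchesWithFlags are hypothetical names chosen for illustration, not part of java.util.regex.

```java
import java.util.regex.Pattern;

public class FlagsDemo {

    // Hypothetical helper: true when the whole input matches the regex
    // compiled with the given flags.
    static boolean matchesWithFlags(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // With no flags, uppercase input does not match the lowercase pattern.
        System.out.println(matchesWithFlags("a*b", 0, "AAB"));                        // false
        // CASE_INSENSITIVE lets the same pattern match uppercase ASCII input.
        System.out.println(matchesWithFlags("a*b", Pattern.CASE_INSENSITIVE, "AAB")); // true
        // Flags are bit masks, so they can be combined with bitwise OR.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        System.out.println(matchesWithFlags("a.b", flags, "A\nB"));                   // true
    }
}
```

Passing 0 as the flags value behaves like the single-argument compile(String regex).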

 

Matcher matcher(CharSequence input): Creates a Matcher object that will match the given input against this compiled Pattern. We will discuss Matcher in detail later.

// regex and data are Strings defined elsewhere
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(data);
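
As a rough illustration of what a Matcher is typically used for, the sketch below walks over an input and collects every match. The class name MatcherDemo and the helper findAll are hypothetical; only Pattern.compile, Pattern.matcher, Matcher.find and Matcher.group come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {

    // Hypothetical helper: collects every substring of data matched by regex, in order.
    static List<String> findAll(String regex, String data) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(data);
        List<String> found = new ArrayList<>();
        // find() advances to the next match; group() returns the matched text.
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a*b", "aab xb b")); // [aab, b, b]
    }
}
```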

 

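static boolean matches(String regex, CharSequence input): This static method compiles the given regular expression and attempts to match the entire input against it.

// false: "[a-z]" matches exactly one lowercase letter, but the input has four
boolean match = Pattern.matches("[a-z]", "abcd");
// true: "[a-z]+" matches one or more lowercase letters
match = Pattern.matches("[a-z]+", "abcd");
/*
The same can be achieved in this way as well
*/
Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("abcd");
match = matcher.matches();
// Note: matcher.find() is not equivalent. It succeeds when any part
// of the input matches, so "[a-z]" would find a match in "abcd".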
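
The distinction between matching the whole input and finding a match somewhere inside it is easy to miss, so here is a small sketch that contrasts the two. The class name MatchVsFindDemo and the helpers wholeMatch and partialMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

public class MatchVsFindDemo {

    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: true if any part of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[a-z]", "abcd"));       // false: one letter expected
        System.out.println(wholeMatch("[a-z]+", "abcd"));      // true
        System.out.println(partialMatch("[a-z]", "abcd"));     // true: "a" is found
        System.out.println(Pattern.matches("[a-z]+", "abcd")); // true, same as wholeMatch
    }
}
```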

 

String[] split(CharSequence input): This method splits the given input around matches of the regular expression pattern. It is useful for tokenizing a String on a delimiter.

Note: Trailing empty strings are not included in the resulting array. 

Pattern pattern = Pattern.compile(":");
String [] states = pattern.split("CA:OH:GA");
// output
// states = {"CA", "OH", "GA"}
/*
Explanation of the note:
For the input "foo:bar:foo" and the regex "o", the output
is {"f", "", ":bar:f"}. The "foo" at the start is split
into "f" and "" (an empty string), but for the "foo" at
the end of the input the trailing empty strings
are not included.
*/
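
The delimiter does not have to be a single character. As a sketch under that assumption, the example below tokenizes a comma-separated list while absorbing optional whitespace around each comma; the class name SplitDemo and the helper tokenize are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitDemo {

    // A comma with optional surrounding whitespace acts as the delimiter.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Hypothetical helper: splits the input on COMMA and returns the tokens as a list.
    static List<String> tokenize(String input) {
        return Arrays.asList(COMMA.split(input));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("CA , OH,GA"));   // [CA, OH, GA]
        // The two trailing commas would produce empty strings, which split drops.
        System.out.println(tokenize("CA , OH,GA,,")); // [CA, OH, GA]
    }
}
```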

 

String[] split(CharSequence input, int limit): This method also splits the given input around matches of the regular expression pattern, but the second parameter, limit, controls how many times the pattern is applied and therefore the length of the resulting array. If the limit n is greater than zero, the pattern is applied at most n - 1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter. If n is negative, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are kept. If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.

Example

import java.util.regex.Pattern;

public class TestRegExSplit {

	public static void main(String[] args) {
		Pattern pattern = Pattern.compile(":");
		// Regex ":" with limit 2
		String [] results = pattern.split("foo:bar:foo", 2);
		printResult(results); //{foo,bar:foo}
		
		results = pattern.split("foo:bar:foo", 5);
		printResult(results); //{foo,bar,foo}
		
		results = pattern.split("foo:bar:foo", -2);
		printResult(results); //{foo,bar,foo}
		
		pattern = Pattern.compile("o");
		results = pattern.split("foo:bar:foo", 5);
		printResult(results); //{f,,:bar:f,,}
		
		results = pattern.split("foo:bar:foo", -2);
		printResult(results); //{f,,:bar:f,,}
		
		results = pattern.split("foo:bar:foo", 0);
		printResult(results); //{f,,:bar:f}
	}
	
	// Prints the array as {a,b,c}; empty strings show up as adjacent commas.
	private static void printResult(String[] results) {
		System.out.println("{" + String.join(",", results) + "}");
	}
}

 

Flags Constants (each is a static int field of Pattern):

CANON_EQ: Enables canonical equivalence.
CASE_INSENSITIVE: Enables case-insensitive matching.
COMMENTS: Permits whitespace and comments in pattern.
DOTALL: Enables dotall mode.
LITERAL: Enables literal parsing of the pattern.
MULTILINE: Enables multiline mode.
UNICODE_CASE: Enables Unicode-aware case folding.
UNICODE_CHARACTER_CLASS: Enables the Unicode version of Predefined character classes and POSIX character classes.
UNIX_LINES: Enables Unix lines mode.
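
To make a few of these descriptions concrete, here is a minimal sketch that exercises MULTILINE, LITERAL and COMMENTS. The class name MoreFlagsDemo and the helper found are hypothetical; the flag constants are the ones listed above.

```java
import java.util.regex.Pattern;

public class MoreFlagsDemo {

    // Hypothetical helper: true if the regex, compiled with the given flags,
    // matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";
        // By default ^ matches only at the start of the whole input.
        System.out.println(found("^second", 0, text));                 // false
        // MULTILINE makes ^ and $ match at line boundaries as well.
        System.out.println(found("^second", Pattern.MULTILINE, text)); // true
        // LITERAL treats metacharacters such as . as ordinary characters.
        System.out.println(found("a.c", Pattern.LITERAL, "abc"));      // false
        System.out.println(found("a.c", Pattern.LITERAL, "a.c"));      // true
        // COMMENTS ignores whitespace and #-comments inside the pattern.
        System.out.println(found("a  b # two letters", Pattern.COMMENTS, "ab")); // true
    }
}
```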