
 Regular Expression Q&A 

This article covers common questions about regular expressions in Java, presented in Q&A format.

 

Q: How do you match metacharacters literally (that is, ignore their special meaning) in a regular expression?

Metacharacters are characters that have a special meaning to the regex matcher, so they change how a pattern is interpreted instead of matching themselves.

The metacharacters supported by the Java regex API are: <([{\^-=$!|]})?*+.>

For example:


^ The beginning of a line
$ The end of a line
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
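
To make the table above concrete, here is a minimal sketch that tries a few of these quantifiers. The class name RegExQuantifierDemo and the helper fullMatch are hypothetical names chosen for illustration; the helper uses Matcher.matches(), so the whole input has to match the pattern.

```java
import java.util.regex.Pattern;

class RegExQuantifierDemo {

	// Returns true when the whole input matches the given regular expression.
	static boolean fullMatch(String regEx, String input) {
		return Pattern.compile(regEx).matcher(input).matches();
	}

	public static void main(String[] args) {
		System.out.println(fullMatch("colou?r", "color"));  // true: "u" once or not at all
		System.out.println(fullMatch("colou?r", "colour")); // true
		System.out.println(fullMatch("a*", ""));            // true: zero or more times
		System.out.println(fullMatch("a+", ""));            // false: one or more times
		System.out.println(fullMatch("a{3}", "aaa"));       // true: exactly 3 times
		System.out.println(fullMatch("a{2,}", "a"));        // false: at least 2 times
		System.out.println(fullMatch("a{2,4}", "aaaaa"));   // false: not more than 4 times
	}
}
```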

To answer the question: you can make the matcher treat these special characters as ordinary text in the following ways:

  • Escape each metacharacter with a backslash (\). This requires going through the pattern string and prefixing every metacharacter with a backslash.
  • Quote the whole expression, either with Pattern.quote(regexPattern) or by wrapping it yourself as "\\Q" + regEx + "\\E".

Code Example: 

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExMetaCharacter {

	public static void main(String[] args) {
		String input = "Dogs are wonderful$";
		String regEx1 = "Dog.";
		String regEx2 = "wonderful$";
		
		Pattern p1 = Pattern.compile(regEx1);
		Matcher m1 = p1.matcher(input);
		// A match is found, because dot (.) matches any character (here the 's' in "Dogs").
		System.out.println("***Before Escaping/Quoting the Metacharacters***\n");
		System.out.println(m1.find()); //true
		System.out.println("Start: "+m1.start());
		System.out.println("End: "+m1.end());
		
		Pattern p2 = Pattern.compile(regEx2);
		Matcher m2 = p2.matcher(input);
		// No match for the literal text "wonderful$": the $ is a metacharacter
		// meaning end of line, and "wonderful" is not at the end of the input.
		System.out.println(m2.find()); //false
		
		/*
		 * To let this regex work properly, you have to escape the characters
		 * and I am going to use the easy way of escaping the characters which has special meaning.
		 */
		System.out.println("\n***After Escaping/Quoting the Metacharacters***\n");
		regEx1 = Pattern.quote(regEx1);
		p1 = Pattern.compile(regEx1);
		m1 = p1.matcher(input);
		// No match this time: after quoting, the dot (.) is a literal dot, and the input does not contain "Dog."
		System.out.println(m1.find()); // false
		/*
		 * Commented that because it will throw IllegalStateException
		 */
		//System.out.println("Start: "+m1.start());
		//System.out.println("End: "+m1.end());
		regEx2 = Pattern.quote(regEx2);
		p2 = Pattern.compile(regEx2);
		m2 = p2.matcher(input);
		System.out.println(m2.find()); //true
		System.out.println("Start: "+m2.start());
		System.out.println("End: "+m2.end());
	}

}

 

Output: 

***Before Escaping/Quoting the Metacharacters***

true
Start: 0
End: 4
false

***After Escaping/Quoting the Metacharacters***

false
true
Start: 9
End: 19
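
The example above uses Pattern.quote. The other two options from the list (a backslash escape, and a hand-written \Q...\E wrapper) can be sketched as follows. The class name RegExManualEscape and the helper found are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class RegExManualEscape {

	// Returns true when the regular expression is found anywhere in the input.
	static boolean found(String regEx, String input) {
		return Pattern.compile(regEx).matcher(input).find();
	}

	public static void main(String[] args) {
		String input = "Dogs are wonderful$";

		// Option 1: escape the single metacharacter. In Java source the
		// backslash itself is doubled, so the regex engine sees wonderful\$
		System.out.println(found("wonderful\\$", input)); // true

		// Option 2: wrap the literal text in \Q ... \E by hand.
		System.out.println(found("\\Q" + "wonderful$" + "\\E", input)); // true

		// Pattern.quote produces the same kind of wrapper for this text.
		System.out.println(Pattern.quote("wonderful$")); // \Qwonderful$\E
	}
}
```

Wrapping by hand goes wrong if the text itself contains \E, which is one reason to prefer Pattern.quote when the literal comes from user input.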

 

Q: What is the easiest way to find any character from the following set: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ ?

You could write a character class that lists every character in the set, but there is an easier way: the predefined POSIX character class \p{Punct} (punctuation) matches exactly these characters.
In Java source the backslash must be doubled, so the pattern is created with Pattern.compile("\\p{Punct}").

Code Example:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExPunctuationPattern {

	public static void main(String[] args) {
		String input="#Deepak$%&";
		Pattern p = Pattern.compile("\\p{Punct}");
		Matcher match = p.matcher(input);
		System.out.println(match.find());
		System.out.println(match.replaceAll(""));
	}

}

Output:

true
Deepak
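
As a quick sanity check of the claim that \p{Punct} covers the whole set from the question, the sketch below counts the punctuation matches in a string. The class name RegExPunctCheck and the method countPunctuation are hypothetical, and the sketch assumes the default matching mode (no UNICODE_CHARACTER_CLASS flag).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegExPunctCheck {

	private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

	// Counts how many characters of the input are matched by \p{Punct}.
	static int countPunctuation(String input) {
		int count = 0;
		Matcher m = PUNCT.matcher(input);
		while (m.find()) {
			count++;
		}
		return count;
	}

	public static void main(String[] args) {
		// The full set from the question; " and \ are escaped for the Java string literal.
		String all = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
		System.out.println(all.length());                   // 32
		System.out.println(countPunctuation(all));          // 32
		System.out.println(countPunctuation("#Deepak$%&")); // 4
	}
}
```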

 

Q: How do you match a pattern that is immediately followed by the same text again?

You can achieve this with backreferences. So what is a backreference?

When an input string is matched against a regular expression, the part of the input matched by each capturing group is saved in memory so that it can be recalled later through a backreference. A backreference is written as a backslash (\) followed by the number of the group to recall.

For example, to match two digits followed by exactly the same two digits, use (\d\d)\1 as the regular expression (written as "(\\d\\d)\\1" in Java source):

Code Example:

 

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExBackReference {
	public static void main(String ...args){
		String input ="00111100131341313";
		String regEx= "(\\d\\d)\\1";
		Pattern p = Pattern.compile(regEx);
		Matcher m = p.matcher(input);
		while(m.find()){
			System.out.println("Start: "+m.start());
			System.out.println("End: "+m.end());
		}
	
	}
}

 

Output:

Start: 2
End: 6
Start: 8
End: 12
Start: 13
End: 17
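
To see what the capturing group actually holds at each match, the sketch below collects the captured text instead of the positions. It uses a named group, (?<pair>...) recalled with \k<pair>, which works like \1 and is available from Java 7 onward. The class name RegExNamedBackReference, the method repeatedPairs, and the group name pair are hypothetical choices for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegExNamedBackReference {

	// Collects the text captured by the group for every repeated pair found.
	static List<String> repeatedPairs(String input) {
		// (?<pair>\d\d) names the group "pair"; \k<pair> recalls it, like \1 does.
		Pattern p = Pattern.compile("(?<pair>\\d\\d)\\k<pair>");
		Matcher m = p.matcher(input);
		List<String> pairs = new ArrayList<>();
		while (m.find()) {
			pairs.add(m.group("pair"));
		}
		return pairs;
	}

	public static void main(String[] args) {
		System.out.println(repeatedPairs("00111100131341313")); // [11, 13, 13]
	}
}
```

The three captured pairs correspond to the three matches at 2-6, 8-12 and 13-17 in the output above.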

 

Q: How do you write a pattern that returns true only when the input contains nothing but letters?

You can achieve this with the pattern below.

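import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExOnlyLetters {

	public static boolean containsOnlyAlphabets(String data) {
		Pattern regEx = Pattern.compile("^[A-Za-z]+$");
		Matcher matcher = regEx.matcher(data);
		// matches() requires the whole input to match; find() with $ would also accept a trailing line break.
		return matcher.matches();
	}

	public static void main(String[] args) {
		System.out.println(containsOnlyAlphabets("test-alpha")); //false contains dash (-) character
		System.out.println(containsOnlyAlphabets("test alpha")); //false contains space character
		System.out.println(containsOnlyAlphabets("testalpha")); //true contains only alphabets
	}
}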

 

Similarly, you can use the pattern "^[0-9]+$" to accept input that contains only digits.
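
A minimal sketch of that digits-only check, under the same approach as the letters example. The class name RegExOnlyDigits and the method containsOnlyDigits are hypothetical. It calls Matcher.matches(), which requires the whole input to match the pattern.

```java
import java.util.regex.Pattern;

class RegExOnlyDigits {

	// Compiled once and reused; Pattern instances are immutable and thread-safe.
	private static final Pattern ONLY_DIGITS = Pattern.compile("^[0-9]+$");

	static boolean containsOnlyDigits(String data) {
		return ONLY_DIGITS.matcher(data).matches();
	}

	public static void main(String[] args) {
		System.out.println(containsOnlyDigits("12345"));  // true
		System.out.println(containsOnlyDigits("12 345")); // false: contains a space
		System.out.println(containsOnlyDigits("12a"));    // false: contains a letter
		System.out.println(containsOnlyDigits(""));       // false: + needs at least one digit
	}
}
```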

 

I will keep updating this article as I come across other scenarios.