• Janarthanan Soundararajan

Learn Regular Expressions by Matching an Email Pattern

Updated: Sep 22, 2019

As a developer, I have run into many situations where I needed to find a pattern in text. In my early days of learning to program, I wrote loops and conditions to find such patterns. That is fine for simple patterns, but complex patterns take many more lines of code. There is a smarter and more powerful way to handle complex text patterns: regular expressions.

Developer communities on the Internet are huge today. Developers can find solutions with a Google search, and many of them use regex in their tasks without understanding it. So I would like to give a simple introduction to regular expressions.

What is a regular expression?

A regular expression is a special kind of string that defines a search pattern, which helps to match a string, a word, or a character.

Many programming languages have libraries or built-in support for regular expressions. My focus here is on the regular expression itself rather than on any particular programming language.

Matching an e-mail address

An email address has two parts, the local part and the domain part, separated by the '@' sign.

This is a typical regular expression to match an email address. It starts with a caret symbol (^) and ends with a dollar sign ($). Both characters have a special function in a regex. The caret (^) denotes that the string must start with the defined pattern. For example, ^a only matches strings that start with the letter 'a'.

The dollar sign ($) is the opposite of the caret symbol (^): it denotes that the string must end with the defined pattern. It is simple, right? A caret symbol marks the start of the pattern and a dollar sign marks the end.
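To see the two anchors in action, here is a minimal sketch using Python's built-in re module. The sample words are made up for illustration, and Python is only one of many languages where the same patterns work.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
words = ["apple", "banana", "avocado"]

# ^a : the string must start with the letter 'a'
starts_with_a = [w for w in words if re.search(r"^a", w)]

# a$ : the string must end with the letter 'a'
ends_with_a = [w for w in words if re.search(r"a$", w)]

print(starts_with_a)  # ['apple', 'avocado']
print(ends_with_a)    # ['banana']
```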

Before the '@' sign, part of the pattern is enclosed in parentheses. Parentheses are used to make a group. Between the parentheses, there is a string enclosed in square brackets, followed by a '+' sign.
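A group also lets you pull out the part of the text it matched. The sketch below uses a deliberately simplified pattern (lowercase letters only, which is an assumption and not the full e-mail pattern) and a made-up address.

```python
import re

# Simplified, assumed pattern: one group of lowercase letters before '@'.
match = re.search(r"([a-z]+)@", "alice@example.com")

# group(1) returns the text matched by the first pair of parentheses.
local_part = match.group(1)
print(local_part)  # alice
```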

Now focus on the square brackets. Square brackets create a character class, which matches exactly one character from the given set. For instance, consider the two words 'soap' and 'soup'. The character in the third position is the only difference between them. We can match both words with the regular expression so[au]p.
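The soap/soup example can be checked quickly in Python. The extra test words 'sop' and 'soip' are my own additions to show what the character class rejects.

```python
import re

# [au] matches exactly one character: either 'a' or 'u'.
pattern = re.compile(r"so[au]p")

# fullmatch requires the whole word to fit the pattern.
results = {w: bool(pattern.fullmatch(w)) for w in ["soap", "soup", "sop", "soip"]}
print(results)
# {'soap': True, 'soup': True, 'sop': False, 'soip': False}
```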

Within the character class, we have three ranges, 'A-Z', 'a-z' and '0-9', followed by a set of special characters. The '+' sign sits on the right side of the character class. It is a quantifier that means one or more repetitions. There are six kinds of quantifiers in regular expressions, and they come in greedy and lazy (non-greedy) forms. We can look at quantifiers in more detail later; for now we focus only on the e-mail pattern.
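As a rough illustration of the '+' quantifier and of the greedy versus lazy distinction, consider this small Python sketch. The character class omits the special characters and the HTML-like sample text is hypothetical.

```python
import re

# [A-Za-z0-9]+ : one or more letters or digits (special characters left out here).
one_or_more = re.compile(r"[A-Za-z0-9]+")
matches_text = one_or_more.fullmatch("abc123") is not None   # True
matches_empty = one_or_more.fullmatch("") is not None        # False: '+' needs at least one

# Greedy '+' grabs as much as it can; lazy '+?' stops as early as it can.
text = "<b>bold</b>"
greedy = re.search(r"<.+>", text).group()
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```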

Okay, now take a look at the right side of the '@' sign. This is the domain part of the e-mail address, which has two groups separated by a dot (.). The dot (.) is a special character in regular expressions that matches any single character (by default, any character except a line break). To match a literal dot, it must be escaped with a backslash (\.).

A backslash is a special character used to escape other special characters.
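The difference between a plain dot and an escaped dot is easy to miss, so here is a small Python sketch with made-up test strings.

```python
import re

unescaped = re.compile(r"a.c")   # '.' matches any single character
escaped = re.compile(r"a\.c")    # '\.' matches only a literal dot

unescaped_abc = unescaped.fullmatch("abc") is not None   # True
unescaped_dot = unescaped.fullmatch("a.c") is not None   # True
escaped_abc = escaped.fullmatch("abc") is not None       # False
escaped_dot = escaped.fullmatch("a.c") is not None       # True

print(unescaped_abc, unescaped_dot, escaped_abc, escaped_dot)
```

Without the backslash, the pattern would accept any character where the dot of the domain should be.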

Between the '@' sign and the dot character, a group has three character classes. The first and last character classes don't have a quantifier, so exactly one character is matched at the start and one at the end, based on the defined pattern. You can guess the pattern in the middle: it matches one or more occurrences.
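The three-class structure might look like the sketch below. The exact contents of each character class are my assumption (letters and digits at both ends, with a hyphen also allowed in the middle), since the text only describes their shape.

```python
import re

# Assumed classes: alphanumeric first and last, hyphen also allowed in the middle.
label = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]+[A-Za-z0-9]")

# Hypothetical domain names used only for illustration.
samples = ["example", "my-site", "-bad", "bad-", "ab"]
checks = {s: bool(label.fullmatch(s)) for s in samples}
print(checks)
# {'example': True, 'my-site': True, '-bad': False, 'bad-': False, 'ab': False}
```

Note that 'ab' fails because one character at the start, one or more in the middle, and one at the end add up to at least three characters.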

To the right of the dot character, we have a group with one character class followed by a quantifier. That quantifier requires at least two characters.
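Putting the pieces together, a pattern with the structure described above could be sketched as follows. This is a reconstruction under assumptions: the special characters in the local part, the contents of the domain classes, and the letters-only last group are my guesses and may differ from the original expression. The helper name is_email is hypothetical.

```python
import re

# Assumed reconstruction of the described structure, not the original pattern.
EMAIL_PATTERN = re.compile(
    r"^([A-Za-z0-9._%+-]+)"                    # local part: one or more allowed characters
    r"@"                                       # literal '@' separator
    r"([A-Za-z0-9][A-Za-z0-9-]+[A-Za-z0-9])"   # domain name: three character classes
    r"\."                                      # escaped dot
    r"([A-Za-z]{2,})$"                         # last group: at least two letters
)

def is_email(text: str) -> bool:
    """Return True if text fits the simplified pattern above."""
    return EMAIL_PATTERN.match(text) is not None

print(is_email("alice@example.com"))      # True
print(is_email("bob.smith@my-site.org"))  # True
print(is_email("alice@example.c"))        # False: last group needs two or more letters
print(is_email("alice.example.com"))      # False: no '@' sign
```

A pattern this simple rejects some valid addresses, for example those with subdomains such as mail.example.com, so treat it as a learning aid rather than a complete validator.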

I hope this helps. Regular expressions have a lot of features that make such tasks easier.

Thanks for reading!

#regularexpression #regex #pattern #programming

©2020 by Techaaroorian.
