Sunday, March 8, 2015

Email Validation with Regular Expression (PCRE)

The following is a regular expression for matching a valid email address.

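It has been tested with PHP's PCRE, but it should also work in other regex engines that support lookahead and lookbehind assertions.

With the multiline (m) modifier it matches valid email addresses across multiple lines (one email address per line). Note that PHP's preg_* functions do not accept the g modifier that appears in the expression below; in PHP, use preg_match_all() with /im to collect every match.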

Email Validation Regex


/^([\w\-_~!$&'()*+,;=:]|(?<!^)(?<!\.)\.(?!\.)(?!@))+@(\w|(?<!@)(?<!\.)(?<!\-)\.(?!\.)(?!\-)(?!$)|(?<!@)\-(?!$))+$/img
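
As a rough illustration of how the expression might be applied to multi-line input, here is a minimal Python sketch. It is a port, not the PHP setup the regex was tested with: Python has no g modifier, so re.finditer together with re.MULTILINE and re.IGNORECASE stands in for /img, and re.ASCII is an assumption that keeps \w limited to ASCII characters. The names EMAIL_RE and find_emails are hypothetical.

```python
import re

# The expression from above, without the /.../img delimiters and modifiers.
EMAIL_RE = re.compile(
    r"^([\w\-_~!$&'()*+,;=:]|(?<!^)(?<!\.)\.(?!\.)(?!@))+"
    r"@(\w|(?<!@)(?<!\.)(?<!\-)\.(?!\.)(?!\-)(?!$)|(?<!@)\-(?!$))+$",
    # i -> IGNORECASE, m -> MULTILINE; ASCII is an assumption (see lead-in).
    re.IGNORECASE | re.MULTILINE | re.ASCII,
)


def find_emails(text):
    """Return every line of `text` that the expression accepts as an address."""
    # finditer plays the role of the g modifier: it walks over all matches.
    return [match.group(0) for match in EMAIL_RE.finditer(text)]


if __name__ == "__main__":
    sample = "john.doe@example.com\nJohn..Doe@example.com\nc.d@e-f.org"
    for address in find_emails(sample):
        print(address)
```

The second line of the sample is skipped because of the consecutive dots in its local part; the other two lines are returned.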

Regex analysis


It validates based on the following rules:

  • start of line
  • user part (local part)
    • one or more of the following:
      • alphanumeric character or underscore (\w)
      • specific punctuation: -_~!$&'()*+,;=:
      • a dot (.)
        • if it is not preceded by
          • start of line
          • another dot (.)
        • and if it is not followed by
          • another dot (.)
          • @ symbol
  • @
  • domain part (hostname part)
    • one or more of the following:
      • alphanumeric character or underscore (\w)
      • a dot (.)
        • if it is not preceded by
          • @ symbol
          • another dot (.)
          • a hyphen (-)
        • and if it is not followed by
          • another dot (.)
          • a hyphen (-)
          • end of line
      • a hyphen (-)
        • if it is not preceded by
          • @
        • and if it is not followed by
          • end of line
  • end of line
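
The breakdown above can also be written into the expression itself. The following is a sketch that assumes Python's re.VERBOSE mode, which ignores whitespace in the pattern and allows # comments; the pattern is otherwise unchanged. The names EMAIL_VERBOSE and is_valid are hypothetical, and re.ASCII is again an assumption to keep \w limited to ASCII.

```python
import re

EMAIL_VERBOSE = re.compile(
    r"""
    ^                             # start of line
    (                             # local part: one or more of
        [\w\-_~!$&'()*+,;=:]      #   a word character or listed punctuation
      |                           #   or a dot that is
        (?<!^) (?<!\.)            #     not preceded by start of line or a dot
        \.
        (?!\.) (?!@)              #     and not followed by a dot or an @
    )+
    @                             # separator
    (                             # domain part: one or more of
        \w                        #   a word character
      |                           #   or a dot that is
        (?<!@) (?<!\.) (?<!\-)    #     not preceded by @, a dot or a hyphen
        \.
        (?!\.) (?!\-) (?!$)       #     and not followed by a dot, a hyphen or end of line
      |                           #   or a hyphen that is
        (?<!@)                    #     not preceded by @
        \-
        (?!$)                     #     and not followed by end of line
    )+
    $                             # end of line
    """,
    re.VERBOSE | re.IGNORECASE | re.MULTILINE | re.ASCII,
)


def is_valid(address):
    """Check a single address (one line, no trailing newline)."""
    return EMAIL_VERBOSE.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["john.doe@example.com", "john.@example.com", "john@-example.com"]:
        print(candidate, is_valid(candidate))
```

Keeping each lookbehind and lookahead on its own commented line makes it easier to see which bullet in the list a given assertion implements.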

Email Syntax Rules

The regex implements the following email syntax rules:

Local part
  • Uppercase and lowercase Latin letters (a–z, A–Z) (ASCII: 65–90, 97–122)
  • Digits 0 to 9 (ASCII: 48–57)
  • These special characters: - _ ~ ! $ & ' ( ) * + , ; = : (percent-encoding such as %20 is not matched, because % is not in the character class)
  • Character . (dot, period, full stop) (ASCII: 46) provided that it is not the first or last character, and provided also that it does not appear consecutively (e.g. John..Doe@example.com is not allowed).
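Hostname part
  • Alphanumeric characters (the regex uses \w, so an underscore is also accepted)
  • Dots (.), but not two or more consecutive dots, and not at the start or end of the hostname part
  • Hyphens, but not at the start or end of a label (the name between dots)

There are more advanced rules that are not covered by this regex, for example quoted local parts, IP address literals as the domain, and length limits.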
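
To see how the expression behaves for each rule, one option is a small table of sample addresses. The sketch below is a Python port under the same assumptions as before (re.ASCII for an ASCII-only \w, fullmatch on single lines); EMAIL_RE, CASES and mismatches are hypothetical names, and the sample addresses are made up for illustration. The last three cases are behaviours of the pattern as written that may be surprising: no top-level domain is required, an underscore is accepted in the hostname, and a % character is rejected.

```python
import re

EMAIL_RE = re.compile(
    r"^([\w\-_~!$&'()*+,;=:]|(?<!^)(?<!\.)\.(?!\.)(?!@))+"
    r"@(\w|(?<!@)(?<!\.)(?<!\-)\.(?!\.)(?!\-)(?!$)|(?<!@)\-(?!$))+$",
    re.IGNORECASE | re.MULTILINE | re.ASCII,
)

# (address, expected result, rule being exercised)
CASES = [
    ("John.Doe@example.com", True, "letters and a single inner dot"),
    ("user+tag@example.com", True, "listed special character (+)"),
    ("John..Doe@example.com", False, "consecutive dots in local part"),
    (".john@example.com", False, "dot as first character"),
    ("john.@example.com", False, "dot as last character of local part"),
    ("john@example..com", False, "consecutive dots in hostname"),
    ("john@.example.com", False, "dot at start of hostname"),
    ("john@example.com.", False, "dot at end of hostname"),
    ("john@-example.com", False, "hyphen at start of a label"),
    ("john@example-.com", False, "hyphen at end of a label"),
    ("john@my-host.example.com", True, "hyphen inside a label"),
    # Behaviours of the pattern as written, outside the listed rules:
    ("john@localhost", True, "no top-level domain is required"),
    ("john@exam_ple.com", True, "underscore accepted through \\w"),
    ("john%20doe@example.com", False, "% is not in the character class"),
]


def mismatches():
    """Return the cases where the regex disagrees with the expected result."""
    return [
        (address, rule)
        for address, expected, rule in CASES
        if (EMAIL_RE.fullmatch(address) is not None) != expected
    ]


if __name__ == "__main__":
    for address, expected, rule in CASES:
        print(f"{address:28} {'accept' if expected else 'reject':7} {rule}")
    print("mismatches:", mismatches())
```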
