This guide covers advanced regular expression (regex) concepts, which are essential for matching and manipulating text and data.

Common Regex Symbols

  • . - Matches any character except a newline.
  • * - Matches zero or more of the preceding element.
  • + - Matches one or more of the preceding element.
  • ? - Matches zero or one of the preceding element.
  • [] - Defines a character class, matching any one of the characters inside.
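  • ^ - Asserts the position at the start of the string, or at the start of each line when multiline mode is enabled.
  • $ - Asserts the position at the end of the string, or at the end of each line when multiline mode is enabled.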
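
The symbols above can be tried out with a short sketch. It uses Python's standard re module; the sample strings and variable names are made up for illustration. Note that in Python, ^ and $ match at line boundaries only when re.MULTILINE is passed.

```python
import re

# . matches any character except a newline, so "c\nt" is skipped.
dot = re.findall(r"c.t", "cat cot c\nt")

# * allows zero or more b's; + requires at least one.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# ? makes the preceding "u" optional.
optional = re.findall(r"colou?r", "color colour")

# [] matches any single character listed inside.
vowels = re.findall(r"[aeiou]", "regex")

# ^ and $ anchor to the whole string unless re.MULTILINE is set.
text = "first line\nsecond line"
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)
end_multiline = re.findall(r"\w+$", text, re.MULTILINE)

print(dot)              # ['cat', 'cot']
print(star)             # ['a', 'ab', 'abbb']
print(plus)             # ['ab', 'abbb']
print(optional)         # ['color', 'colour']
print(vowels)           # ['e', 'e']
print(start_default)    # ['first']
print(start_multiline)  # ['first', 'second']
print(end_multiline)    # ['line', 'line']
```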

Advanced Techniques

  • Lookaheads and Lookbehinds: These zero-width assertions check whether a pattern does or does not match immediately after (lookahead) or immediately before (lookbehind) the current position, without consuming any of the text.
    • Example: (?=hello) matches the empty position immediately before "hello"; the text "hello" itself is not part of the match.
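  • Named Groups: You can give a capturing group a name by writing (?<name>...), or (?P<name>...) in Python, so the captured text can be retrieved by name instead of by number.
    • Example: (?<key>\w+)=(?<value>.*) captures the key and its value separately, as the groups "key" and "value".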
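
A minimal sketch of both techniques, assuming Python's re module, where named groups are written (?P<name>...). The sample strings and the group names key and value are illustrative only.

```python
import re

# Positive lookahead: match digits only when " USD" follows.
# The " USD" part is checked but not included in the match.
amount = re.search(r"\d+(?= USD)", "Total: 42 USD").group()

# Negative lookahead: match "foo" only when "bar" does not follow.
not_bar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Positive lookbehind: match digits only when "$" comes right before them.
dollars = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

# Named groups: retrieve captures by name rather than by index.
pair = re.match(r"(?P<key>\w+)=(?P<value>.*)", "color=blue")
key = pair.group("key")
value = pair.group("value")
as_dict = pair.groupdict()

print(amount)   # 42
print(not_bar)  # [7]
print(dollars)  # ['15']
print(as_dict)  # {'key': 'color', 'value': 'blue'}
```

The negative lookahead example reports index 7 because only the second "foo" (the one in "foobaz") is not followed by "bar".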

Example

Suppose you want to extract the email addresses from a given text. You can use the following regex pattern:

\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
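
One way to apply this pattern is sketched below in Python. The function name extract_emails and the sample text are hypothetical. The top-level-domain class is written as [A-Za-z], since a | inside a character class is a literal pipe character rather than an alternation. The pattern is a practical approximation and does not cover every address that the email standards allow.

```python
import re

# Compile once so the pattern can be reused across calls.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_RE.findall(text)

sample = "Contact alice@example.com or bob.smith@mail.example.org for details."
emails = extract_emails(sample)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```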

Resources

For more information on regex, you can check out our Regex Basics Guide.
