Regular Expressions

Getting Started with Regular Expressions

Regular expressions (regex) are powerful tools for matching patterns in text. In Python, the re module provides everything you need to search, extract, and manipulate strings using regex patterns. To use it, first import the re module at the top of your Python file.

import re

When writing regular expressions in your code, it is conventional to use raw strings (r"...") because they stop Python from treating backslashes as escape characters before the pattern reaches the regex engine.

pattern = r"^.*\."  # Matches from the start of the string up to and including the last dot on the first line.
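To make the raw-string convention concrete, here is a minimal sketch that compares a normal and a raw string literal containing the same pattern text. The variable names and the sample text are assumptions chosen for illustration.

```python
import re

# In a normal string literal, "\b" is the single backspace character.
normal = "\bcat\b"
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary).
raw = r"\bcat\b"

print(len(normal))  # 5
print(len(raw))     # 7

text = "the cat sat"
raw_found = re.search(raw, text) is not None
normal_found = re.search(normal, text) is not None
print(raw_found)     # True
print(normal_found)  # False
```

The normal string fails to match because the pattern now looks for literal backspace characters around "cat".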

Basic Characters

  • . - Matches any character except a newline.
  • * - Matches zero or more repetitions of the previous character or group.
  • + - Matches one or more repetitions.
  • ? - Matches zero or one repetition.
  • {m} - Matches exactly m repetitions.

Anchors

  • ^ - Matches the start of a string.
  • $ - Matches the end of a string, or just before a newline at the very end of the string.

Character Sets

  • [] - Syntax for matching a set of characters.
  • [a-z] - Any lowercase letter.
  • [A-Z] - Any uppercase letter.
  • [a-zA-Z] - Any letter, upper or lower case.
  • [a-zA-Z0-9_] - Any alphanumeric character or underscore.
  • [^@] - Any character except @ (the ^ inside [] means "not").
  • {m,n} - Matches between m and n repetitions (inclusive) of the preceding character, set, or group, e.g. [a-z]{2,4}.

Special Sequences

  • \d - Matches a digit (0-9). For str patterns this also includes other Unicode decimal digits unless re.ASCII is used.
  • \D - Matches any non-digit.
  • \s - Matches any whitespace character (space, tab, newline).
  • \S - Matches any non-whitespace character.
  • \w - Matches alphanumeric characters and underscore.
  • \W - Matches anything not alphanumeric or underscore.

Grouping and Capturing

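  • () - Groups expressions and captures the match. Captured text can be read from the match object with match.group(n) or match.groups(); re.findall() returns the captured groups instead of the whole match.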
  • (com|edu) - Matches either 'com' or 'edu'.
  • (?:...) - Non-capturing group: groups expressions without capturing them in the result list. Useful for grouping without cluttering your match results.
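The effect of capturing versus non-capturing groups is easiest to see in the results. The following is a minimal sketch; the sample text and the email-like pattern are assumptions for illustration and are not a complete email validator.

```python
import re

text = "Contact alice@example.com or bob@school.edu"

# Capturing groups: group(0) is the whole match, group(1) and group(2) are the captures.
m = re.search(r"([^@\s]+)@[^@\s]+\.(com|edu)", text)
print(m.group(0))  # alice@example.com
print(m.group(1))  # alice
print(m.group(2))  # com
print(m.groups())  # ('alice', 'com')

# With one capturing group, findall returns only the captured text.
captured = re.findall(r"\w+\.(com|edu)", text)
print(captured)  # ['com', 'edu']

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r"\w+\.(?:com|edu)", text)
print(whole)  # ['example.com', 'school.edu']
```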

re Functions

Optional flags can be passed as an extra parameter to most re functions (all except re.escape() and re.purge()). In re.split() and re.sub(), pass them by keyword (flags=...), because the next positional parameter there is maxsplit or count. For example:

match = re.search(r"apples", "There are 42 APPLES", re.IGNORECASE)

  • re.IGNORECASE or re.I: Makes the pattern case-insensitive. Example: re.search(r"hello", "HELLO", re.I).
  • re.MULTILINE or re.M: ^ and $ match the start and end of each line (not just the whole string).
  • re.DOTALL or re.S: Makes the . character match any character including newline (\n).
  • re.VERBOSE or re.X: Allows you to write multi-line regex with comments and spacing for readability.
  • re.ASCII or re.A: Makes \w, \d, and \s match only ASCII characters, rather than Unicode.
  • re.LOCALE or re.L: Discouraged and rarely used; makes \w, \W, \b, \B, and case-insensitive matching depend on the current locale. It works only with bytes patterns.
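The flags above can be compared side by side in a short sketch. The sample strings are assumptions chosen so that each flag changes the result.

```python
import re

text = "first line\nSecond LINE"

# Without flags, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# re.MULTILINE lets ^ match at the start of every line.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'Second']

# By default . does not match a newline; re.DOTALL changes that.
no_dotall = re.search(r"line.Second", text)
with_dotall = re.search(r"line.Second", text, re.DOTALL)
print(no_dotall)                  # None
print(repr(with_dotall.group()))  # 'line\nSecond'

# Flags can be combined with the | operator.
combined = re.findall(r"^second line$", text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Second LINE']

# re.ASCII restricts \d to 0-9; "\u0663" is an Arabic-Indic digit.
digits = "7 and \u0663"
unicode_count = len(re.findall(r"\d", digits))
ascii_count = len(re.findall(r"\d", digits, re.ASCII))
print(unicode_count)  # 2
print(ascii_count)    # 1
```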
re.search()

Searches for a pattern anywhere in the string and returns the first match.
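A minimal sketch of re.search() in use, assuming a sample sentence chosen for illustration. The function returns a match object, or None when nothing matches, so the result is usually checked before it is used.

```python
import re

m = re.search(r"\d+", "There are 42 apples and 7 pears")
if m:  # re.search returns None when nothing matches
    print(m.group())  # 42 (only the first match is returned)
    print(m.span())   # (10, 12)

no_match = re.search(r"\d+", "no digits here")
print(no_match)  # None
```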

re.match()

Like re.search(), but it only matches if the pattern is at the beginning of the string.

match = re.match(r"\d+", "123abc")  # Match found

re.fullmatch()

Matches the entire string against the pattern.
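The difference between re.search(), re.match(), and re.fullmatch() shows up when the pattern only fits part of the string. A minimal sketch, with a sample string assumed for illustration:

```python
import re

text = "abc123"

found_anywhere = re.search(r"\d+", text)   # digits appear later in the string
found_at_start = re.match(r"\d+", text)    # the string does not start with digits
found_whole = re.fullmatch(r"\d+", text)   # the whole string is not digits

print(found_anywhere.group())  # 123
print(found_at_start)          # None
print(found_whole)             # None

# re.match accepts a matching prefix; re.fullmatch needs the pattern to cover everything.
prefix = re.match(r"[a-z]+", text)
whole = re.fullmatch(r"[a-z]+\d+", text)
print(prefix.group())  # abc
print(whole.group())   # abc123
```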

match = re.fullmatch(r"\d+", "123")  # Match found

re.findall()

Returns a list of all non-overlapping matches in the string.

numbers = re.findall(r"\d+", "12 cats, 9 dogs")
print(numbers)  # Output: ['12', '9']

re.split()

Splits a string at each match of the pattern.

parts = re.split(r"\s+", "Split by spaces or tabs")
print(parts)  # Output: ['Split', 'by', 'spaces', 'or', 'tabs']

re.sub()

Replaces all matches of the pattern with a replacement string.

cleaned = re.sub(r"\d+", "[number]", "Room 101 and Room 202")
print(cleaned)  # Output: Room [number] and Room [number]
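re.sub() can also reuse captured groups in the replacement and limit how many matches are replaced. The sketch below assumes a day/month/year date format and sample strings chosen for illustration.

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups of the pattern.
iso = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", "Due 31/12/2025")
print(iso)  # Due 2025-12-31

# count limits how many matches are replaced (0, the default, means all).
first_only = re.sub(r"\d+", "[number]", "Room 101 and Room 202", count=1)
print(first_only)  # Room [number] and Room 202
```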

Examples

import re

# . Matches any character except newline
re.search(r"a.c", "abc")       # Matches "abc" because "." matches "b"

# * Zero or more of the preceding character/group
re.search(r"lo*l", "lol")      # Matches "lol"; the pattern also matches "ll" and "loooool"

# + One or more of the preceding character/group
re.search(r"go+d", "good")     # Matches "good", "goood", but not "gd"

# ? Zero or one of the preceding character/group
re.search(r"colou?r", "color") # Matches both "color" and "colour"

# {m} Exactly m repetitions
re.search(r"a{3}", "aaabc")    # Matches "aaa"

# ^ Start of string
re.match(r"^Hello", "Hello world")  # Matches "Hello" only if it's at the beginning

# $ End of string
re.search(r"world$", "Hello world") # Matches "world" only if it's at the end

# [] Any one of the characters inside the brackets
re.search(r"[aeiou]", "cat")         # Matches the first vowel, "a"

# [a-z] Any lowercase letter
re.search(r"[a-z]", "ABCd")          # Matches "d"

# [A-Z] Any uppercase letter
re.search(r"[A-Z]", "abcD")          # Matches "D"

# [a-zA-Z] Any letter (upper or lower case)
re.search(r"[a-zA-Z]", "123ABC")     # Matches "A"

# [a-zA-Z0-9_] Alphanumeric or underscore
re.search(r"[a-zA-Z0-9_]", "#$a@")   # Matches "a"

# [^@] Any character except @
re.search(r"[^@]+", "hello@world")   # Matches "hello"

# {m,n} Between m and n repetitions of the preceding character/group
re.search(r"\d{2,4}", "Year 2025")   # Matches 2 to 4 digits, e.g. "2025"

# \d Any digit (0-9)
re.search(r"\d", "abc123")          # Matches "1"

# \D Any non-digit
re.search(r"\D", "123abc")          # Matches "a"

# \s Whitespace (space, tab, newline)
re.search(r"\s", "Hello world")     # Matches the space between "Hello" and "world"

# \S Non-whitespace
re.search(r"\S", "   abc")          # Matches "a"

# \w Alphanumeric character or underscore
re.search(r"\w", "!@#hello")        # Matches "h"

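# \W Anything other than an alphanumeric character or underscore
re.search(r"\W", "abc!")            # Matches "!"

The following example combines these pieces with re.VERBOSE to validate a password. Each (?=...) is a lookahead: it checks that what follows matches, without consuming any characters.

import re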

password = "P@ssw0rd!"

pattern = r"""
^                       # Start of string
(?=.*[a-z])             # At least one lowercase letter
(?=.*[A-Z])             # At least one uppercase letter
(?=.*\d)                # At least one digit
(?=.*[@$!%*?&])         # At least one special character
[A-Za-z\d@$!%*?&]{8,}   # Only these characters, at least 8 total
$                       # End of string
"""

# Use re.VERBOSE to allow comments/whitespace in the pattern
if re.match(pattern, password, re.VERBOSE):
    print("Valid password")
else:
    print("Invalid password")
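To see which rule rejects which input, the same pattern can be wrapped in a small helper and run against several candidates. This is a sketch under stated assumptions: the helper name is_valid_password, the use of re.compile(), and the sample passwords are illustrative choices, not part of the original example.

```python
import re

# Same rules as the pattern above, compiled once for reuse.
PASSWORD_RE = re.compile(r"""
    ^
    (?=.*[a-z])             # At least one lowercase letter
    (?=.*[A-Z])             # At least one uppercase letter
    (?=.*\d)                # At least one digit
    (?=.*[@$!%*?&])         # At least one special character
    [A-Za-z\d@$!%*?&]{8,}   # Only these characters, at least 8 total
    $
""", re.VERBOSE)

def is_valid_password(candidate):
    """Return True if candidate satisfies every rule (hypothetical helper)."""
    # fullmatch also rejects a trailing newline, which $ alone would tolerate.
    return PASSWORD_RE.fullmatch(candidate) is not None

samples = ["P@ssw0rd!", "password", "Short1!", "NoSpecial123", "Has space1!A"]
results = {s: is_valid_password(s) for s in samples}
for candidate, ok in results.items():
    print(f"{candidate!r}: {ok}")
```

Only the first sample passes: the others lack an uppercase letter, are too short, lack a special character, or contain a space that is outside the allowed set.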