
How can I find all matches to a regular expression in Python?

In Python, you typically use re.findall() or re.finditer() to retrieve all matches of a regular expression in a string. Below are the common ways and their differences.

1. re.findall() for a List of Matches

```python
import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

matches = re.findall(pattern, text)
print(matches)  # Output: ['123', '456']
```
  • re.findall(pattern, string) returns a list of matched substrings.
  • If the pattern has capturing groups, findall returns either:
    • a list of strings (if there's exactly one capturing group), or
    • a list of tuples (if there are multiple capturing groups).
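A quick way to see these result shapes side by side is to run one string through patterns with zero, one, and two capturing groups. This is a small illustrative sketch; the sample text and variable names are made up for the example.

```python
import re

text = "x=1, y=22"

# No capturing groups: each element is the full matched substring.
no_groups = re.findall(r"\w=\d+", text)
print(no_groups)      # ['x=1', 'y=22']

# Exactly one capturing group: each element is that group's text only.
one_group = re.findall(r"\w=(\d+)", text)
print(one_group)      # ['1', '22']

# Two capturing groups: each element is a tuple of the groups.
two_groups = re.findall(r"(\w)=(\d+)", text)
print(two_groups)     # [('x', '1'), ('y', '22')]

# A non-capturing group (?:...) does not count, so full matches come back.
non_capturing = re.findall(r"(?:x|y)=\d+", text)
print(non_capturing)  # ['x=1', 'y=22']
```

The last case is handy when you need grouping for alternation or repetition but still want whole matches from findall.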

Example with Capturing Groups

```python
import re

text = "Name: John, Age: 30"
pattern = r"(\w+):\s(\w+)"

# Each match has two groups -> result is a list of tuples
matches = re.findall(pattern, text)
print(matches)  # Output: [('Name', 'John'), ('Age', '30')]
```

2. re.finditer() for an Iterator of Match Objects

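```python
import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

for match in re.finditer(pattern, text):
    # group(0) is the full match; span() is its (start, end) index pair
    print("Match:", match.group(0), "at", match.span())
# Output:
# Match: 123 at (6, 9)
# Match: 456 at (19, 22)
```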
  • re.finditer(pattern, string) returns an iterator of match objects (re.Match in Python 3.7+).
  • Each match object gives you start/end indices (.span()), the full match (.group(0)), and any capturing groups (e.g. .group(1), .group(2), etc.).
  • Ideal if you need detailed info about the positions or groups for each match.
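To show those match-object methods together, here is a sketch that reuses the "Name: John, Age: 30" text from the findall example and collects the full match, its span, and its groups for each hit. The records list is just an illustrative container, not anything the re module requires.

```python
import re

text = "Name: John, Age: 30"
pattern = r"(\w+):\s(\w+)"

records = []
for m in re.finditer(pattern, text):
    # m.group(0): whole match; m.span(): (start, end) indices;
    # m.groups(): tuple with every capturing group's text.
    records.append((m.group(0), m.span(), m.groups()))

for rec in records:
    print(rec)
# ('Name: John', (0, 10), ('Name', 'John'))
# ('Age: 30', (12, 19), ('Age', '30'))
```

Unlike findall, finditer keeps the full match available even when the pattern has groups, and because it yields matches one at a time it does not build the whole result list up front.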

3. Other Tips & Flags

3.1 Regex Flags

```python
import re

text = "HELLO\nhello"
pattern = r"hello"

# re.IGNORECASE -> case-insensitive
# re.DOTALL     -> '.' matches newline
# re.MULTILINE  -> '^' and '$' match start/end of lines
matches = re.findall(pattern, text, flags=re.IGNORECASE)
print(matches)  # ['HELLO', 'hello']
```

Common flags include:

  • re.IGNORECASE or re.I: case-insensitive matching.
  • re.DOTALL or re.S: '.' also matches a newline character.
  • re.MULTILINE or re.M: ^ and $ match at the start/end of each line, not just at the start/end of the entire string.
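The effect of MULTILINE and DOTALL is easiest to see on a multi-line string, and flags can be combined with the bitwise OR operator. This is a minimal sketch; the sample text is invented for the demonstration.

```python
import re

text = "first line\nSecond line\nthird"

# With re.MULTILINE, ^ anchors at the start of every line.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(starts)         # ['first', 'Second', 'third']

# Without it, ^ matches only at the very start of the string.
anchored_once = re.findall(r"^\w+", text)
print(anchored_once)  # ['first']

# Without re.DOTALL, '.' stops at '\n', so this cannot span lines.
no_dotall = re.findall(r"first.*third", text)
print(no_dotall)      # []

# With re.DOTALL, '.*' can cross the newlines.
with_dotall = re.findall(r"first.*third", text, flags=re.DOTALL)
print(len(with_dotall))  # 1

# Flags combine with |: line-anchored and case-insensitive.
combined = re.findall(r"^s\w+", text, flags=re.MULTILINE | re.IGNORECASE)
print(combined)       # ['Second']
```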

3.2 Overlapping Matches

  • findall() and finditer() return only non-overlapping matches. If you need overlapping matches, write a custom loop (e.g. adjusting the search start index on each iteration) or use a lookahead trick. For example:
    ```python
    import re

    text = "aaaa"
    # (?=...) is a zero-width lookahead; the inner group captures each "aa"
    pattern = r"(?=(aa))"
    matches = re.findall(pattern, text)
    print(matches)  # ['aa', 'aa', 'aa']
    ```
    This captures overlapping occurrences of "aa": the lookahead consumes no characters, so the scan moves forward one position at a time while the capturing group records each "aa".
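The custom-loop alternative mentioned above can be sketched with a compiled pattern's search(), which accepts a start position. The helper name overlapping_matches is hypothetical, written only for this example; after each hit it restarts the search one character past where that hit began instead of where it ended.

```python
import re

def overlapping_matches(pattern, text):
    """Hypothetical helper: return one match per start position, allowing overlaps."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)  # Pattern.search(string, pos) begins scanning at pos
        if m is None:
            break
        results.append(m.group(0))
        pos = m.start() + 1  # step one char past this match's start to allow overlaps
    return results

print(overlapping_matches(r"aa", "aaaa"))    # ['aa', 'aa', 'aa']
print(overlapping_matches(r"aba", "ababa"))  # ['aba', 'aba']
```

Like the lookahead trick, this yields at most one match per starting position (whichever one the regex engine picks there), so it will not list, say, every possible length of a \d+ match starting at the same index.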

4. Summary

  • re.findall(pattern, string): Returns a list of all matched substrings (or, if the pattern has capturing groups, a list of group strings for one group or a list of tuples for several).
  • re.finditer(pattern, string): Returns an iterator of match objects, offering more control (like match positions, individual groups, etc.).
  • Non-overlapping: Both return only non-overlapping matches; use a lookahead or a custom search loop if you need overlapping ones.
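The relationship between the two functions can be sketched directly: for a pattern without capturing groups, findall's list is the same as taking group(0) from every finditer match, while finditer also lets you stop early. The sample text below is made up for illustration.

```python
import re
from itertools import islice

text = "Hello 123, goodbye 456, farewell 789"
pattern = r"\d+"

all_numbers = re.findall(pattern, text)
# Same result built from match objects (valid because the pattern has no groups).
via_finditer = [m.group(0) for m in re.finditer(pattern, text)]
print(all_numbers == via_finditer)  # True

# finditer is lazy, so islice can take just the first two matches.
first_two = [m.group(0) for m in islice(re.finditer(pattern, text), 2)]
print(first_two)  # ['123', '456']
```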

Bonus: Level Up Your Regex & Coding Interview Skills

If you’re digging into Python and regex while preparing for interviews or real-world tasks, check out these DesignGurus.io resources:

  1. Grokking the Coding Interview: Patterns for Coding Questions
    Master common coding patterns essential for interviews and problem-solving.

  2. Grokking Data Structures & Algorithms for Coding Interviews
    Strengthen your DS&A fundamentals—key for technical interviews.

  3. Grokking Python Fundamentals
    Dive into Python essentials.

For personalized feedback from ex-FAANG engineers, explore Mock Interviews.

Also, find free content on the DesignGurus.io YouTube channel.

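Conclusion: Use re.findall() or re.finditer() to retrieve all regex matches in Python. findall gives you a list of matched strings (or of group strings or tuples when the pattern has capturing groups), while finditer yields match objects with more detailed information such as positions and groups.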

CONTRIBUTOR
TechGrind