How can I find all matches to a regular expression in Python?

In Python, you typically use re.findall() or re.finditer() to retrieve all matches of a regular expression in a string. Below are the common ways and their differences.

1. `re.findall()` for a List of Matches

import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

matches = re.findall(pattern, text)
print(matches)
# Output: ['123', '456']

re.findall(pattern, string) returns a list of all non-overlapping matches, in the order they appear in the string.
If the pattern has capturing groups, findall returns the group contents instead of the full match:
- a list of strings (if there's exactly one capturing group), or
- a list of tuples (if there are multiple capturing groups).
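To make the group rule concrete, here is a small sketch that runs three patterns over the same string. The sample text and variable names are illustrative only; the non-capturing group `(?:...)` is included as one way to group alternatives without changing what findall returns.

```python
import re

text = "Name: John, Age: 30"

# No capturing groups: each item is the full match.
no_groups = re.findall(r"\w+:\s\w+", text)

# Exactly one capturing group: each item is that group's text only.
one_group = re.findall(r"\w+:\s(\w+)", text)

# A non-capturing group (?:...) groups without capturing,
# so findall returns the full match again.
non_capturing = re.findall(r"(?:Name|Age):\s\w+", text)

print(no_groups)      # ['Name: John', 'Age: 30']
print(one_group)      # ['John', '30']
print(non_capturing)  # ['Name: John', 'Age: 30']
```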

Example with Capturing Groups

text = "Name: John, Age: 30"
pattern = r"(\w+):\s(\w+)"

# Each match has two groups -> result is a list of tuples
matches = re.findall(pattern, text)
print(matches)
# Output: [('Name', 'John'), ('Age', '30')]

2. `re.finditer()` for an Iterator of Match Objects

import re

text = "Hello 123, goodbye 456"
pattern = r"\d+"

for match in re.finditer(pattern, text):
    print("Match:", match.group(0), "at", match.span())
# Output:
# Match: 123 at (6, 9)
# Match: 456 at (19, 22)

re.finditer(pattern, string) returns an iterator of match objects (re.Match in Python 3.7+).
Each match object gives you the start/end indices (.span()), the full match (.group(0)), and any capturing groups (e.g. .group(1), .group(2), etc.).
This is ideal when you need the position or the individual groups of each match, or when the text is large and you would rather not build the whole list at once.
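As a sketch of reading groups and positions from each match object, the example below uses named groups. The group names `key` and `value` are hypothetical choices for this illustration, and named groups are just one option; numbered access such as .group(2) works on the same match.

```python
import re

text = "Name: John, Age: 30"
# (?P<name>...) defines a named group; "key" and "value" are illustrative names.
pattern = r"(?P<key>\w+):\s(?P<value>\w+)"

records = []
for match in re.finditer(pattern, text):
    # Groups can be read by name or by number; start()/end() give the position.
    records.append((match.group("key"), match.group(2), match.start(), match.end()))

print(records)
# [('Name', 'John', 0, 10), ('Age', '30', 12, 19)]
```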

3. Other Tips & Flags

3.1 Regex Flags

import re

text = "HELLO\nhello"
pattern = r"hello"

# re.IGNORECASE -> case-insensitive
# re.DOTALL -> '.' matches newline
# re.MULTILINE -> '^' and '$' match start/end of lines
matches = re.findall(pattern, text, flags=re.IGNORECASE)
print(matches)  # ['HELLO', 'hello']

Common flags include:

- re.IGNORECASE or re.I: case-insensitive matching.
- re.DOTALL or re.S: '.' matches newline.
- re.MULTILINE or re.M: ^ and $ match start/end of each line, not just the entire string.
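The following sketch shows the effect of re.MULTILINE and re.DOTALL on the same two-line string, and how flags can be combined with the bitwise OR operator. The variable names are illustrative only.

```python
import re

text = "HELLO\nhello"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
start_only = re.findall(r"^hello", text, flags=re.IGNORECASE)

# With re.MULTILINE, ^ also anchors right after each newline.
# Multiple flags are combined with |.
each_line = re.findall(r"^hello", text, flags=re.IGNORECASE | re.MULTILINE)

# Without re.DOTALL, '.' does not match the newline between the two words.
dot_default = re.findall(r"HELLO.hello", text)

# With re.DOTALL, '.' matches the newline as well.
dot_all = re.findall(r"HELLO.hello", text, flags=re.DOTALL)

print(start_only)   # ['HELLO']
print(each_line)    # ['HELLO', 'hello']
print(dot_default)  # []
print(dot_all)      # ['HELLO\nhello']
```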

3.2 Overlapping Matches

findall() and finditer() find non-overlapping matches. If you need overlapping matches, you have to devise a custom loop (e.g. adjusting the search start index on each iteration) or use a regex trick like lookahead. For example:
```
import re

text = "aaaa"
pattern = r"(?=(aa))"  # lookahead-based approach

matches = re.findall(pattern, text)
print(matches)  # ['aa', 'aa', 'aa']
```
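This captures overlapping occurrences of "aa": the lookahead itself matches an empty string, so the search advances one character at a time, and the capturing group inside it returns the text that was looked at.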
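The custom-loop alternative mentioned above can be sketched as follows. The helper name `find_overlapping` is hypothetical, not part of the re module; it relies on the `pos` argument of a compiled pattern's search() method to restart the search one character after the start of the previous match. Note that it reports at most one match per starting position.

```python
import re

def find_overlapping(pattern, text):
    """Return matches of pattern in text, allowing them to overlap."""
    compiled = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        match = compiled.search(text, pos)
        if match is None:
            break
        matches.append(match.group(0))
        # Resume one character past the start of this match,
        # not at its end, so overlapping matches are found too.
        pos = match.start() + 1
    return matches

print(find_overlapping(r"aa", "aaaa"))  # ['aa', 'aa', 'aa']
```

Compared with the lookahead trick, this version leaves the pattern unchanged, at the cost of a few extra lines of Python.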

4. Summary

re.findall(pattern, string): Returns a list of all matched substrings (or a list of group strings if there is one capturing group, or a list of tuples if there are multiple capturing groups).
re.finditer(pattern, string): Returns an iterator of match objects, offering more control (like match positions, individual groups, etc.).
Non-overlapping: By default, both skip overlapping matches unless you use lookaheads or specialized logic.

Conclusion: Use re.findall() or re.finditer() to retrieve all regex matches in Python. findall gives you a list of matches or tuples, while finditer yields match objects for more detailed info.

CONTRIBUTOR

TechGrind