How can I find all matches to a regular expression in Python?
In Python, you typically use re.findall() or re.finditer() to retrieve all matches of a regular expression in a string. Below are the common approaches and how they differ.
1. re.findall() for a List of Matches
    import re

    text = "Hello 123, goodbye 456"
    pattern = r"\d+"

    matches = re.findall(pattern, text)
    print(matches)  # Output: ['123', '456']
- re.findall(pattern, string) returns a list of matched substrings.
- If the pattern has capturing groups, findall returns either:
  - a list of strings (if there's exactly one capturing group), or
  - a list of tuples (if there are multiple capturing groups).
Example with Capturing Groups

    import re

    text = "Name: John, Age: 30"
    pattern = r"(\w+):\s(\w+)"

    # Each match has two groups -> result is a list of tuples
    matches = re.findall(pattern, text)
    print(matches)  # Output: [('Name', 'John'), ('Age', '30')]
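To round out the capturing-group rules above, here is a minimal sketch of the other two cases: a pattern with exactly one capturing group, and the same pattern with non-capturing groups. The variable names are illustrative only.

```python
import re

text = "Name: John, Age: 30"

# Exactly one capturing group: findall returns a list of strings
# containing only what the group captured, not the whole match.
one_group = re.findall(r"\w+:\s(\w+)", text)
print(one_group)  # ['John', '30']

# Non-capturing groups (?:...) do not count as groups,
# so findall returns the full matched substrings again.
no_groups = re.findall(r"(?:\w+):\s(?:\w+)", text)
print(no_groups)  # ['Name: John', 'Age: 30']
```

If you want grouping for alternation or repetition but still need the whole match back from findall, non-capturing groups are usually the simplest option.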
2. re.finditer() for an Iterator of Match Objects
    import re

    text = "Hello 123, goodbye 456"
    pattern = r"\d+"

    for match in re.finditer(pattern, text):
        print("Match:", match.group(0), "at", match.span())
- re.finditer(pattern, string) returns an iterator of match objects (re.Match in Python 3.7+).
- Each match object gives you start/end indices (.span()), the full match (.group(0)), and any capturing groups (e.g. .group(1), .group(2), etc.).
- Ideal if you need detailed info about the positions or groups for each match.
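As an illustration of pulling groups and positions out of each match object, the sketch below uses named groups. The group names key and value are hypothetical and chosen only for this example.

```python
import re

text = "Name: John, Age: 30"
# (?P<name>...) defines a named group; "key" and "value" are illustrative names.
pattern = r"(?P<key>\w+):\s(?P<value>\w+)"

results = []
for match in re.finditer(pattern, text):
    # Named groups can be read with .group("name"); .start()/.end() give positions.
    results.append((match.group("key"), match.group("value"), match.start(), match.end()))

print(results)  # [('Name', 'John', 0, 10), ('Age', '30', 12, 19)]
```

Because finditer is lazy, it also avoids building the full list up front, which can help when the input is large and you only need the first few matches.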
3. Other Tips & Flags
3.1 Regex Flags
    import re

    text = "HELLO\nhello"
    pattern = r"hello"

    # re.IGNORECASE -> case-insensitive
    # re.DOTALL     -> '.' matches newline
    # re.MULTILINE  -> '^' and '$' match start/end of lines
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    print(matches)  # ['HELLO', 'hello']
Common flags include:
- re.IGNORECASE or re.I: case-insensitive matching.
- re.DOTALL or re.S: '.' matches newline.
- re.MULTILINE or re.M: ^ and $ match start/end of each line, not just the entire string.
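The flags listed above can be seen side by side in the following sketch, which also combines two of them with the bitwise OR operator. The sample text is made up for the example.

```python
import re

text = "first line\nSecond line\nthird LINE"

# re.MULTILINE: '^' matches at the start of every line, not just the string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(line_starts)  # ['first', 'Second', 'third']

# Flags are combined with '|'. Here '$' also anchors at each line end,
# and "line" matches regardless of case.
lines = re.findall(r"^\w+ line$", text, flags=re.MULTILINE | re.IGNORECASE)
print(lines)  # ['first line', 'Second line', 'third LINE']

# re.DOTALL: '.' may cross the newline between the first two lines.
with_dotall = re.findall(r"first.*Second", text, flags=re.DOTALL)
print(with_dotall)  # ['first line\nSecond']

# Without re.DOTALL, '.' stops at the newline, so nothing matches.
without_dotall = re.findall(r"first.*Second", text)
print(without_dotall)  # []
```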
3.2 Overlapping Matches
findall() and finditer() find non-overlapping matches. If you need overlapping matches, you have to write a custom loop (e.g. adjusting the search start index on each iteration) or use a regex trick such as a lookahead. For example:

    import re

    text = "aaaa"
    pattern = r"(?=(aa))"  # lookahead-based approach

    matches = re.findall(pattern, text)
    print(matches)  # ['aa', 'aa', 'aa']

This captures overlapping occurrences of "aa".
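The custom-loop alternative mentioned above could look like the following sketch. The helper name find_overlapping is hypothetical; it relies on the pos argument of a compiled pattern's search() method to restart the search one character after the start of the previous match.

```python
import re

def find_overlapping(pattern, text):
    """Return (start, matched_text) pairs, allowing matches to overlap."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        match = compiled.search(text, pos)
        if match is None:
            break
        results.append((match.start(), match.group(0)))
        # Resume one character past the START of this match (not its end),
        # so the next match is allowed to overlap the current one.
        pos = match.start() + 1
    return results

overlaps = find_overlapping(r"aa", "aaaa")
print(overlaps)  # [(0, 'aa'), (1, 'aa'), (2, 'aa')]
```

Compared with the lookahead trick, this keeps the pattern itself unchanged and gives you match positions directly, at the cost of a few more lines of code.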
4. Summary
- re.findall(pattern, string): Returns a list of all matched substrings (or a list of group strings if the pattern has one capturing group, or a list of tuples if it has several).
- re.finditer(pattern, string): Returns an iterator of match objects, offering more control (match positions, individual groups, etc.).
- Non-overlapping: By default, both skip overlapping matches unless you use lookaheads or specialized logic.
Conclusion: Use re.findall() or re.finditer() to retrieve all regex matches in Python. findall gives you a list of matches or tuples, while finditer yields match objects for more detailed info.