Regular expressions (regex) are a powerful tool for handling strings. They provide a concise and flexible means to "match" (search, manipulate, and edit) strings based on very specific and complex patterns.
Before learning regular expressions, let's understand why they are necessary with a simple example.
Imagine you want to check if a string contains any digit. One way to do this without regular expressions is using a loop.
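A minimal sketch of that loop-based check might look like the following. The function name contains_digit and the sample strings are illustrative choices, not taken from the original lesson.

```python
def contains_digit(text):
    """Return True if at least one character in text is a digit."""
    for char in text:
        # str.isdigit() is True for digit characters such as '0' to '9'
        if char.isdigit():
            return True
    # The loop finished without finding a digit
    return False


print(contains_digit("Order 66"))    # True
print(contains_digit("No numbers"))  # False
```

Even for this simple question, the loop needs an explicit iteration, a per-character test, and a fallback return value.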
Regular expressions simplify this process, enabling you to search for complex patterns quickly and with less code.
Regular expressions are sequences of characters that form a search pattern. They can be used to check if a string contains the specified search pattern, to replace the search pattern with a specified pattern, or to split a string on a pattern.
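As a rough illustration of those three uses (checking, replacing, and splitting), here is a short sketch with Python's re module. The sample text and variable names are made up for this example.

```python
import re

text = "apples, 3 pears; 12 plums"

# Check: does the string contain a digit anywhere?
has_digit = re.search(r'\d', text) is not None

# Replace: swap every run of digits for a '#' placeholder
masked = re.sub(r'\d+', '#', text)

# Split: break the string on a comma or semicolon plus any following whitespace
parts = re.split(r'[,;]\s*', text)

print(has_digit)  # True
print(masked)     # apples, # pears; # plums
print(parts)      # ['apples', '3 pears', '12 plums']
```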
A regular expression is created in Python using the re module, which is included in the standard library.
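The original code for this step is not shown here, so the following is only a sketch of what a first pattern with the re module could look like. It assumes the r'\d+' pattern (one or more digits); the sample sentence and printed messages are invented.

```python
import re

# r'\d+' is a raw string: the backslash is passed to the regex engine as-is
pattern = r'\d+'

# re.search() returns a match object for the first match, or None
match = re.search(pattern, "Room 101 is on floor 3")

if match:
    print("Found digits:", match.group())  # Found digits: 101
else:
    print("No digits found")
```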
Explanation:
- import re imports the re module.
- r'\d+' represents the regular expression, where the r prefix marks a raw string, so backslashes reach the regex engine unescaped.

Regular expressions are composed of:
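Modifier | Description |
---|---|
i | Makes the pattern matching case-insensitive (re.IGNORECASE or re.I in Python) |
g | Global search (find all matches rather than stopping after the first match); Python has no g flag, so use re.findall() or re.finditer() instead |
m | Multiline mode (allows start and end anchors to match at the start and end of each line; re.MULTILINE or re.M in Python) |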
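The single-letter modifiers above follow a notation used by several other regex flavors. In Python's re module the same behavior is expressed through flag constants and function choice, as this sketch shows. The sample text is invented for illustration.

```python
import re

text = "Python is fun.\npython is popular.\nI like PYTHON."

# i -> re.IGNORECASE; "global" search -> re.findall() returns every match
all_matches = re.findall(r'python', text, flags=re.IGNORECASE)

# m -> re.MULTILINE lets ^ match at the start of every line
line_starts = re.findall(r'^python', text, flags=re.IGNORECASE | re.MULTILINE)

# Without re.MULTILINE, ^ only matches at the very start of the string
string_start = re.findall(r'^python', text, flags=re.IGNORECASE)

print(all_matches)   # ['Python', 'python', 'PYTHON']
print(line_starts)   # ['Python', 'python']
print(string_start)  # ['Python']
```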
Metacharacter | Description |
---|---|
. | Matches any single character except newline \n |
^ | Matches the start of the string |
$ | Matches the end of the string |
* | Matches 0 or more repetitions of the preceding element |
+ | Matches 1 or more repetitions of the preceding element |
? | Matches 0 or 1 repetition of the preceding element |
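Each metacharacter in the table can be tried on its own. The sketch below uses invented sample strings and variable names to show one small match per row.

```python
import re

# . matches any single character except newline
dot = re.findall(r'c.t', "cat cot cut c\nt")

# ^ and $ anchor the match to the start and end of the string
starts = re.search(r'^Hello', "Hello world") is not None
ends = re.search(r'world$', "Hello world") is not None

# * allows zero or more b's, + requires at least one
star = re.findall(r'ab*', "a ab abbb")
plus = re.findall(r'ab+', "a ab abbb")

# ? makes the preceding u optional
optional = re.findall(r'colou?r', "color colour")

print(dot)           # ['cat', 'cot', 'cut']
print(starts, ends)  # True True
print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
```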
Sequence | Description |
---|---|
\d | Matches any decimal digit |
\D | Matches any non-digit character |
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
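The special sequences above can be combined with the re functions as in this short sketch. The sample string and variable names are assumptions made for the example.

```python
import re

text = "Call 555 0199 now"

# \d+ collects runs of digits
digit_runs = re.findall(r'\d+', text)

# \D removes everything that is not a digit
digits_only = re.sub(r'\D', '', text)

# \s+ splits on runs of whitespace
words = re.split(r'\s+', text)

# \S+ collects runs of non-whitespace characters
tokens = re.findall(r'\S+', text)

print(digit_runs)   # ['555', '0199']
print(digits_only)  # 5550199
print(words)        # ['Call', '555', '0199', 'now']
print(tokens)       # ['Call', '555', '0199', 'now']
```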
Expression | Description |
---|---|
[abc] | Matches any one character from the set {a, b, c}. |
[^abc] | Matches any one character not in the set {a, b, c}. |
[0-9] | Matches any digit from 0 to 9. |
[a-z] | Matches any lowercase letter from a to z. |
[A-Z] | Matches any uppercase letter from A to Z. |
[A-z] | Matches any character from A to z in ASCII order: all uppercase and lowercase letters, plus the six punctuation characters that sit between Z and a. Prefer [a-zA-Z] for letters only. |
[a-zA-Z] | Matches any letter from a to z or A to Z. |
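A quick sketch of these character sets in use is given below, with invented sample strings. It also shows that the [A-z] range covers more than letters, because characters such as ^ and _ fall between Z and a in ASCII.

```python
import re

in_set = re.findall(r'[abc]', "a big cab")
not_in_set = re.findall(r'[^abc ]', "a big cab")  # the space is excluded too
digits = re.findall(r'[0-9]', "R2-D2")
lower = re.findall(r'[a-z]', "Hello")
upper = re.findall(r'[A-Z]', "Hello World")

# [A-z] also matches '_' and '^', while [a-zA-Z] matches letters only
wide_range = re.findall(r'[A-z]', "a_b^C")
letters_only = re.findall(r'[a-zA-Z]', "a_b^C")

print(in_set)        # ['a', 'b', 'c', 'a', 'b']
print(not_in_set)    # ['i', 'g']
print(digits)        # ['2', '2']
print(lower)         # ['e', 'l', 'l', 'o']
print(upper)         # ['H', 'W']
print(wide_range)    # ['a', '_', 'b', '^', 'C']
print(letters_only)  # ['a', 'b', 'C']
```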
Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.
Expression | Description |
---|---|
n+ | Matches any string that contains at least one n. |
n* | Matches any string that contains zero or more occurrences of n. |
n? | Matches any string that contains zero or one occurrence of n. |
n{x} | Matches exactly x occurrences of the item n. |
n{x,} | Matches x or more occurrences of the item n. |
n{x,y} | Matches at least x but no more than y occurrences of the item n. |
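To make the quantifier rows concrete, here is a small sketch in which the placeholder n is replaced by real items (a letter or \d) and x and y by numbers. The sample strings are assumptions.

```python
import re

one_or_more = re.findall(r'o+', "go goo gooo")
zero_or_more = re.findall(r'go*', "g go goo")
zero_or_one = re.findall(r'https?', "http https")

# {3}: exactly three digits; the trailing 4 of 1234 is left unmatched
exactly_three = re.findall(r'\d{3}', "12 123 1234")

# {3,}: three or more digits
three_or_more = re.findall(r'\d{3,}', "12 123 1234")

# {2,3}: between two and three digits, as many as possible
two_to_three = re.findall(r'\d{2,3}', "1 12 123 1234")

print(one_or_more)    # ['o', 'oo', 'ooo']
print(zero_or_more)   # ['g', 'go', 'goo']
print(zero_or_one)    # ['http', 'https']
print(exactly_three)  # ['123', '123']
print(three_or_more)  # ['123', '1234']
print(two_to_three)   # ['12', '123', '123']
```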
Let’s start by creating a regular expression to find strings that start with 'S' and end with 'e'. This example will illustrate how to construct and utilize a simple regular expression.
pattern = r'^S.*e$'
Explanation:
- ^: Asserts the start of the string (or of each line in multiline mode), ensuring the pattern matches from the beginning.
- S: Specifies that the first character of the string must be 'S'.
- .*: . matches any single character (except newline), and * allows zero or more repetitions of the preceding element, together matching any sequence of characters.
- e: Specifies that the last character must be 'e'.
- $: Asserts the end of the string (or of each line in multiline mode), ensuring the pattern matches right up to the end.

In this example, we use the regular expression to check if a string starts with 'S' and ends with 'e'.
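The original listing is not available here, so the following is a reconstruction sketch. The pattern, the test_string value "Sample", the call to re.search(), and the if statement come from the lesson text; the printed messages are assumptions.

```python
import re

pattern = r'^S.*e$'
test_string = "Sample"

# re.search() returns a match object on success and None otherwise
if re.search(pattern, test_string):
    print(f"'{test_string}' starts with 'S' and ends with 'e'.")
else:
    print(f"'{test_string}' does not match the pattern.")
```

With "Sample" the first branch runs; a string such as "Simple test" would fall through to the else branch because it does not end with 'e'.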
Explanation:
- import re: This line imports Python's regular expression module, re, which contains functions and classes for working with regular expressions.
- pattern = r'^S.*e$':
  - ^S asserts that the string should start with 'S'.
  - .* allows any character sequence (including none) to appear in the middle.
  - e$ asserts that the string should end with 'e'.
- test_string = "Sample": Defines a string that will be checked against the defined regular expression.
- re.search(pattern, test_string): This function searches test_string for the first location where the regular expression pattern matches. It returns a match object if the pattern is found, or None if the pattern is not found.
- if statement: Determines the output based on whether the re.search() function found a match.

Now, let’s construct a more complex regular expression to match a date format such as "DD-MM-YYYY".
pattern = r'\b\d{2}-\d{2}-\d{4}\b'
Explanation:
- \b: Word boundary, ensuring the pattern is matched at the start or end of a word.
- \d{2}: \d matches any digit, and {2} specifies exactly two digits.
- '-': Matches the hyphen character, used to separate date components.
- \d{4}: Matches exactly four digits, typically used for the year in a date.

In this example, we use the regular expression to find dates within a string.
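Since the original listing is missing, this is a reconstruction sketch. The pattern, the text value, the re.search() call, match.group(), and the "No date found." message come from the lesson text; the "Date found:" label is an assumption.

```python
import re

pattern = r'\b\d{2}-\d{2}-\d{4}\b'
text = "Today's date is 16-04-2024."

# Look for the first substring shaped like DD-MM-YYYY
match = re.search(pattern, text)

if match:
    print("Date found:", match.group())  # Date found: 16-04-2024
else:
    print("No date found.")
```

Note that the pattern only checks the shape of the text. A string such as 99-99-9999 would also match, so validating that the date is real needs a further step.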
Explanation:
- import re: Imports the regular expression module.
- pattern = r'\b\d{2}-\d{2}-\d{4}\b':
  - \b ensures the date is at a word boundary, preventing partial matches.
  - \d{2} matches exactly two digits, used here for day and month.
  - A literal - matches the hyphen character, used as a separator.
  - \d{4} matches exactly four digits for the year.
- text = "Today's date is 16-04-2024.": The string containing the text where we want to find the date.
- re.search(pattern, text): Searches for the regex pattern within text.
- if match: If re.search() finds a match, match.group() is used to retrieve and display the matched string. If no match is found, it prints "No date found."

Regular expressions are a potent tool for pattern matching and text manipulation, enabling you to write more efficient and concise code. Starting with basic patterns helps build foundational skills, while more advanced expressions allow for sophisticated text processing tasks. By mastering regular expressions, you enhance your ability to work effectively with textual data, optimizing tasks such as data validation, cleaning, and extraction in Python.