Python From Beginner to Advanced

Python - Regular Expressions

Regular expressions (regex) are a powerful tool for handling strings. They provide a concise and flexible way to search, extract, and edit text based on precise patterns.

Why Use Regular Expressions?

Before learning regular expressions, let's understand why they are necessary with a simple example.

Example: Searching in a String Using a Loop

Imagine you want to check if a string contains any digit. One way to do this without regular expressions is using a loop.
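
The original listing is not reproduced on this page. A minimal sketch of such a loop, assuming a hypothetical helper named contains_digit and made-up sample strings, might look like this:

```python
def contains_digit(text):
    """Return True if any character in text is a digit (loop-based check)."""
    for char in text:
        if char.isdigit():
            return True
    return False


# Hypothetical sample inputs, chosen only for illustration.
print(contains_digit("Order 66 confirmed"))  # True
print(contains_digit("no digits here"))      # False
```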


Explanation:

  • This loop iterates through each character in the string to check if it is a digit.
  • This approach is verbose even for a simple check, and it becomes hard to write and maintain as the pattern grows more complex.

Regular expressions simplify this process, enabling you to search for complex patterns quickly and with less code.
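
For comparison, the same digit check can be sketched with the re module. The helper name contains_digit_re and the sample strings are assumptions made for illustration:

```python
import re


def contains_digit_re(text):
    """Return True if text contains at least one digit (regex-based check)."""
    # re.search returns a match object for the first digit found, or None.
    return re.search(r'\d', text) is not None


print(contains_digit_re("Order 66 confirmed"))  # True
print(contains_digit_re("no digits here"))      # False
```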

What is a Regular Expression?

Regular expressions are sequences of characters that form a search pattern. They can be used to check if a string contains the specified search pattern, to replace the search pattern with a specified pattern, or to split a string on a pattern.
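
The three uses named above map onto re.search, re.sub, and re.split. The following is a small sketch with a made-up sample string, not a complete tour of the module:

```python
import re

text = "apples, pears;plums"  # hypothetical sample text

# Check whether the string contains a pattern.
has_pears = re.search(r'pears', text) is not None

# Replace every separator (comma or semicolon, plus optional spaces).
replaced = re.sub(r'[,;]\s*', ' | ', text)

# Split the string on the same separator pattern.
parts = re.split(r'[,;]\s*', text)

print(has_pears)  # True
print(replaced)   # apples | pears | plums
print(parts)      # ['apples', 'pears', 'plums']
```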

Syntax and Creation

A regular expression is created in Python using the re module, which is included in the standard library.
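
The original listing is not reproduced on this page. Based on the explanation that follows, it involved import re and the pattern r'\d+'. A minimal sketch under that assumption, with a made-up sample string:

```python
import re

# Raw string: the backslash in \d reaches the regex engine unchanged.
pattern = r'\d+'

# \d+ matches one or more consecutive digits.
match = re.search(pattern, "Room 101")  # hypothetical sample string
print(match.group())  # 101

# What the r prefix changes: '\n' is one newline character,
# while r'\n' is two characters (a backslash and the letter n).
plain_len = len('\n')
raw_len = len(r'\n')
print(plain_len, raw_len)  # 1 2
```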


Explanation:

  • In the above syntax, import re imports the re module.
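  • r'\d+' is the regular expression; the r prefix marks a raw string, so Python passes the backslash in \d to the regex engine unchanged instead of treating it as an escape character.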

Regular Expression Components

Regular expressions are composed of:

  • Modifiers (called flags in Python): Control how a search is performed.
  • Metacharacters: Characters with a special meaning.
  • Special Sequences: Represent predefined sets of characters.
  • Sets: Define a set of characters for which a match must be found.

Modifiers

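The single-letter names below come from other regex dialects. Python expresses them as flags passed to the re functions or as inline groups, and it has no g flag.

Modifier    Description
i           Makes the pattern matching case-insensitive. In Python: re.IGNORECASE (re.I) or inline (?i).
g           Global search (find all matches rather than stopping after the first match). Python has no such flag; use re.findall() or re.finditer() instead.
m           Multiline mode (allows ^ and $ to match at the start and end of each line). In Python: re.MULTILINE (re.M) or inline (?m).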
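
In Python these behaviours are requested through flags and function choice: re.IGNORECASE for case-insensitive matching, re.MULTILINE for multiline anchors, and re.findall for finding every match. A minimal sketch with a made-up three-line string:

```python
import re

text = "Python is fun.\npython is popular.\nI like PYTHON."  # hypothetical sample

# Case-insensitive; findall returns every match (the "global" behaviour).
any_case = re.findall(r'python', text, flags=re.IGNORECASE)

# With MULTILINE, ^ matches at the start of each line.
line_starts = re.findall(r'^python', text, flags=re.IGNORECASE | re.MULTILINE)

# Without MULTILINE, ^ matches only at the start of the whole string.
string_start = re.findall(r'^python', text, flags=re.IGNORECASE)

print(any_case)      # ['Python', 'python', 'PYTHON']
print(line_starts)   # ['Python', 'python']
print(string_start)  # ['Python']
```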

Metacharacters

Metacharacter    Description
.                Matches any single character except newline (\n)
^                Matches the start of the string
$                Matches the end of the string
*                Matches 0 or more repetitions of the preceding element
+                Matches 1 or more repetitions of the preceding element
?                Matches 0 or 1 repetition of the preceding element
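
Each metacharacter in the table can be tried in isolation. The sample strings below are made up for illustration:

```python
import re

# . matches any character except a newline, so "c\nt" is skipped.
dot = re.findall(r'c.t', "cat cot c\nt cut")

# ^ and $ anchor the match to the start and end of the string.
starts = re.search(r'^Hello', "Hello world") is not None
ends = re.search(r'world$', "Hello world") is not None

# *, + and ? control how often the preceding element may repeat.
star = re.findall(r'ab*', "a ab abbb")
plus = re.findall(r'ab+', "a ab abbb")
optional = re.findall(r'colou?r', "color colour")

print(dot)           # ['cat', 'cot', 'cut']
print(starts, ends)  # True True
print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
```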

Special Sequences

Sequence    Description
\d          Matches any decimal digit
\D          Matches any non-digit character
\s          Matches any whitespace character
\S          Matches any non-whitespace character
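
A quick way to see the difference between each sequence and its uppercase complement is to run all four over the same string. The sample string here is an assumption:

```python
import re

text = "A1 b2\tc3"  # hypothetical sample: letters, digits, a space and a tab

digits = re.findall(r'\d', text)
non_digits = re.findall(r'\D', text)
whitespace = re.findall(r'\s', text)
non_whitespace = re.findall(r'\S', text)

print(digits)          # ['1', '2', '3']
print(non_digits)      # ['A', ' ', 'b', '\t', 'c']
print(whitespace)      # [' ', '\t']
print(non_whitespace)  # ['A', '1', 'b', '2', 'c', '3']
```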

Sets

Expression    Description
[abc]         Matches any one character from the set {a, b, c}.
[^abc]        Matches any one character not in the set {a, b, c}.
[0-9]         Matches any digit from 0 to 9.
[a-z]         Matches any lowercase letter from a to z.
[A-Z]         Matches any uppercase letter from A to Z.
[A-z]         Matches any character in the ASCII range from A to z. This covers both letter cases but also the characters [ \ ] ^ _ and `, so use [a-zA-Z] when only letters are wanted.
[a-zA-Z]      Matches any letter from a to z or A to Z.
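
The sets can be compared side by side on one string. This sketch uses a made-up sample containing an underscore, which shows that the range [A-z] also picks up non-letter characters lying between Z and a in ASCII:

```python
import re

text = "Cab_42"  # hypothetical sample string

in_set = re.findall(r'[abc]', text)
not_in_set = re.findall(r'[^abc]', text)
digits = re.findall(r'[0-9]', text)
lower = re.findall(r'[a-z]', text)
upper = re.findall(r'[A-Z]', text)
letters = re.findall(r'[a-zA-Z]', text)
a_to_z_range = re.findall(r'[A-z]', text)  # also matches "_"

print(in_set)        # ['a', 'b']
print(not_in_set)    # ['C', '_', '4', '2']
print(digits)        # ['4', '2']
print(lower)         # ['a', 'b']
print(upper)         # ['C']
print(letters)       # ['C', 'a', 'b']
print(a_to_z_range)  # ['C', 'a', 'b', '_']
```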

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

Expression    Description
n+            Matches one or more consecutive occurrences of n.
n*            Matches zero or more consecutive occurrences of n.
n?            Matches zero or one occurrence of n.
n{x}          Matches exactly x occurrences of the item n.
n{x,}         Matches x or more occurrences of the item n.
n{x,y}        Matches at least x but no more than y occurrences of the item n.
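
The quantifiers can be illustrated on runs of a single letter. In this sketch the sample strings are made up, and the \b word boundary (explained later in this lesson) is added so that each run is tested as a whole word:

```python
import re

words = "a aa aaa aaaa"  # hypothetical sample: runs of 1 to 4 letters

one_or_more = re.findall(r'a+', words)
exactly_two = re.findall(r'\ba{2}\b', words)
two_or_more = re.findall(r'\ba{2,}\b', words)
two_to_three = re.findall(r'\ba{2,3}\b', words)

# * and ? make the preceding item optional.
zero_or_more = re.findall(r'\bba*\b', "b ba baa")
zero_or_one = re.findall(r'\bba?\b', "b ba baa")

print(one_or_more)   # ['a', 'aa', 'aaa', 'aaaa']
print(exactly_two)   # ['aa']
print(two_or_more)   # ['aa', 'aaa', 'aaaa']
print(two_to_three)  # ['aa', 'aaa']
print(zero_or_more)  # ['b', 'ba', 'baa']
print(zero_or_one)   # ['b', 'ba']
```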

Creating and Explaining a Basic Regular Expression

Let’s start by creating a regular expression to find strings that start with 'S' and end with 'e'. This example will illustrate how to construct and utilize a simple regular expression.

Regular Expression

pattern = r'^S.*e$'

Explanation:

  • ^: Asserts the start of the string, so the match must begin at the first character.
  • S: Specifies that the first character of the string must be 'S'.
  • .*: . matches any single character (except newline), and * allows zero or more repetitions of the preceding element, together matching any sequence of characters.
  • e: Specifies that the last character must be 'e'.
  • $: Asserts the end of the string, so the match must extend to the last character (Python also accepts a single trailing newline after it).

Example: Using the Regular Expression

In this example, we use the regular expression to check if a string starts with 'S' and ends with 'e'.
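
The original listing is not reproduced on this page. The sketch below is reconstructed from the explanation that follows (pattern, test_string, and re.search); the printed messages and the extra test words are assumptions:

```python
import re

pattern = r'^S.*e$'
test_string = "Sample"

# re.search returns a match object on success, or None otherwise.
if re.search(pattern, test_string):
    print("Match found")   # printed for "Sample"
else:
    print("No match")

# A few more hypothetical inputs for comparison.
results = {word: bool(re.search(pattern, word))
           for word in ["Sample", "Simple", "Sam", "sample"]}
print(results)
# {'Sample': True, 'Simple': True, 'Sam': False, 'sample': False}
```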


Explanation:

  • import re: This line imports Python's regular expression module, re, which contains functions and classes for working with regular expressions.
  • pattern = r'^S.*e$':
    • ^S asserts that the string should start with 'S'.
    • .* allows any character sequence (including none) to appear in the middle.
    • e$ asserts that the string should end with 'e'.
  • test_string = "Sample": Defines a string that will be checked against the defined regular expression.
  • re.search(pattern, test_string): This function searches test_string for the first location where the regular expression pattern matches. It returns a match object if the pattern is found, or None if the pattern is not found.
  • if statement: Determines the output based on whether the re.search() function found a match.

Creating and Explaining an Advanced Regular Expression

Now, let’s construct a more complex regular expression to match a date format such as "DD-MM-YYYY".

Regular Expression

pattern = r'\b\d{2}-\d{2}-\d{4}\b'

Explanation:

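  • \b: Word boundary, the position between a word character (letter, digit, or underscore) and a non-word character or the edge of the string. It keeps the pattern from matching inside a longer run of digits.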
  • \d{2}: \d matches any digit, and {2} specifies exactly two digits.
  • -: Matches the hyphen ('-') character, used to separate date components.
  • \d{4}: Matches exactly four digits, typically used for the year in a date.

Example: Using the Advanced Regular Expression

In this example, we use the regular expression to find dates within a string.
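
The original listing is not reproduced on this page. The sketch below is reconstructed from the explanation that follows (pattern, text, re.search, match.group(), and the "No date found." message); the wording of the success message is an assumption:

```python
import re

pattern = r'\b\d{2}-\d{2}-\d{4}\b'
text = "Today's date is 16-04-2024."

match = re.search(pattern, text)
if match:
    # group() returns the text that the whole pattern matched.
    print("Date found:", match.group())  # Date found: 16-04-2024
else:
    print("No date found.")
```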


Explanation:

  • import re: Imports the regular expression module.
  • pattern = r'\b\d{2}-\d{2}-\d{4}\b':
    • \b ensures the date is at a word boundary, preventing partial matches.
    • \d{2} matches exactly two digits, used here for day and month.
    • - matches the literal hyphen character, used as a separator.
    • \d{4} matches exactly four digits for the year.
  • text = "Today's date is 16-04-2024.": The string containing the text where we want to find the date.
  • re.search(pattern, text): Searches for the regex pattern within text.
  • if match: (the if statement): If re.search() finds a match, match.group() is used to retrieve and display the matched string. If no match is found, the program prints "No date found."
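
To see what "preventing partial matches" means in practice, the pattern can be compared with a variant that omits \b. This is a sketch with made-up input strings. It also shows that the pattern checks only the shape of the date, not whether the day and month are valid:

```python
import re

bounded = r'\b\d{2}-\d{2}-\d{4}\b'
unbounded = r'\d{2}-\d{2}-\d{4}'

noisy = "Ticket 116-04-20245 was closed."  # hypothetical: digits around a date-like core

# Without \b, the pattern matches inside the longer number.
partial = re.search(unbounded, noisy)
print(partial.group())  # 16-04-2024

# With \b, the surrounding digits prevent a match.
strict = re.search(bounded, noisy)
print(strict)  # None

# The shape is matched, but the values are not validated.
shape_only = re.search(bounded, "Due 99-99-9999")
print(shape_only.group())  # 99-99-9999
```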

Regular expressions are a potent tool for pattern matching and text manipulation, enabling you to write more efficient and concise code. Starting with basic patterns helps build foundational skills, while more advanced expressions allow for sophisticated text processing tasks. By mastering regular expressions, you enhance your ability to work effectively with textual data, optimizing tasks such as data validation, cleaning, and extraction in Python.
