Regular expressions (regex) are patterns used to match sequences of characters in strings. Python provides the re module to work with regular expressions for tasks like searching, matching, replacing, and extracting specific patterns from text.
Importing the re Module
To use regular expressions in Python, you must first import the re module.
import re
Basic Regular Expression Functions
Here are the most commonly used functions in the re module:
- re.match(): Determines if the regular expression matches at the beginning of the string.
- re.search(): Searches for the first occurrence of the pattern in the string.
- re.findall(): Returns all occurrences of the pattern in the string as a list.
- re.sub(): Replaces occurrences of the pattern with a specified string.
Example 1: re.match()
import re

text = "Python is fun"
pattern = r"Python"

# Check if the text starts with "Python"
match = re.match(pattern, text)
if match:
    print("Match found!")  # Output: Match found!
else:
    print("No match.")
Example 2: re.search()
import re

text = "I love Python programming"
pattern = r"Python"

# Search for "Python" anywhere in the string
result = re.search(pattern, text)
if result:
    print("Pattern found!")  # Output: Pattern found!
else:
    print("Pattern not found.")
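To make the difference between the two functions concrete, here is a small sketch that runs both on the same string. The variable names at_start and anywhere are illustrative only; group() and span() are methods of the match object that re.search() returns.

```python
import re

text = "I love Python programming"

# re.match() only tries the pattern at position 0, so it finds nothing here.
at_start = re.match(r"Python", text)

# re.search() scans the whole string and stops at the first occurrence.
anywhere = re.search(r"Python", text)

print(at_start)          # None
print(anywhere.group())  # Python
print(anywhere.span())   # (7, 13)
```

When there is no match, both functions return None, which is why the examples above test the result with if before using it.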
Example 3: re.findall()
import re

text = "The rain in Spain falls mainly on the plain"
pattern = r"ain\b"  # Matches "ain" at the end of a word

# Find all occurrences of the pattern
matches = re.findall(pattern, text)
print(matches)  # Output: ['ain', 'ain', 'ain']
Example 4: re.sub()
import re

text = "The sky is blue. The ocean is blue."
pattern = r"blue"

# Replace "blue" with "green"
new_text = re.sub(pattern, "green", text)
print(new_text)  # Output: The sky is green. The ocean is green.
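By default re.sub() replaces every occurrence. If only some should change, the optional count argument limits the number of replacements. A minimal sketch, reusing the sentence from Example 4 (the name first_only is illustrative):

```python
import re

text = "The sky is blue. The ocean is blue."

# count=1 stops after the first replacement; the default, 0, replaces all matches.
first_only = re.sub(r"blue", "green", text, count=1)
print(first_only)  # The sky is green. The ocean is blue.
```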
Special Characters in Regular Expressions
Regular expressions use special characters to represent specific patterns. Here are some common ones:
| Character | Description | Example |
|---|---|---|
| . | Matches any character (except newline). | r"a.b" matches “acb”, “a7b”, etc. |
| ^ | Matches the start of the string. | r"^Hello" matches “Hello world”. |
| $ | Matches the end of the string. | r"world$" matches “Hello world”. |
| * | Matches 0 or more repetitions of the preceding character. | r"do*g" matches “dg”, “dog”, “dooog”, etc. |
| + | Matches 1 or more repetitions of the preceding character. | r"do+g" matches “dog”, “dooog”, etc. |
| ? | Matches 0 or 1 occurrence of the preceding character. | r"do?g" matches “dg” or “dog”. |
| \d | Matches any digit (0-9). | r"\d" matches “1”, “9”, etc. |
| \w | Matches any word character (letters, digits, and underscore). | r"\w" matches “a”, “9”, “_”, etc. |
| \s | Matches any whitespace character (space, tab, newline). | r"\s" matches “ ”, “\t”, etc. |
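The rows of the table can be checked with a short script. This is only a sketch: the sample strings are made up, and re.fullmatch() (not listed in the table) is used so that the whole sample has to fit the pattern.

```python
import re

samples = ["dg", "dog", "dooog"]

# For each quantifier pattern, keep the samples that match in full.
quantifier_hits = {
    pattern: [s for s in samples if re.fullmatch(pattern, s)]
    for pattern in (r"do*g", r"do+g", r"do?g")
}
for pattern, hits in quantifier_hits.items():
    print(pattern, hits)
# do*g ['dg', 'dog', 'dooog']
# do+g ['dog', 'dooog']
# do?g ['dg', 'dog']

line = "Order 66 shipped"  # made-up sample text
digits = re.findall(r"\d", line)            # ['6', '6']
words = re.findall(r"\w+", line)            # ['Order', '66', 'shipped']
spaces = re.findall(r"\s", line)            # [' ', ' ']
starts = bool(re.search(r"^Order", line))   # True
ends = bool(re.search(r"shipped$", line))   # True
print(digits, words, spaces, starts, ends)
```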
Compiling Regular Expressions
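You can compile a regular expression with re.compile() to get a reusable pattern object. This is most useful when the same pattern is used many times: the pattern is written in one place, and its methods (search(), match(), findall(), sub()) can be called directly. The re module also caches recently compiled patterns, so for a handful of patterns the speed gain is usually small.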
import re

pattern = re.compile(r"\d+")  # Matches one or more digits
text = "Order number: 12345"

# Use the compiled pattern to search
result = pattern.search(text)
if result:
    print(f"Found: {result.group()}")  # Output: Found: 12345
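The benefit of compiling shows up when one pattern is applied to several strings. The sketch below assumes a made-up list of input lines and reuses a single compiled pattern through its findall() and sub() methods:

```python
import re

# Compile once, then reuse the pattern object's methods.
digits = re.compile(r"\d+")

# Made-up input lines for illustration.
lines = ["Order number: 12345", "No digits here", "Items: 3, total: 42"]

found = [digits.findall(line) for line in lines]
for numbers in found:
    print(numbers)
# ['12345']
# []
# ['3', '42']

masked = digits.sub("#", "Order number: 12345")
print(masked)  # Order number: #
```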
Conclusion
Regular expressions are a powerful tool for text processing in Python. Understanding how to use re functions and special characters will help you solve complex string-matching problems with ease. Practice using regex to unlock its full potential!