Regular expressions (regex) are powerful tools for searching, matching, and manipulating strings based on specific patterns. In Python, the re
module provides functions for working with regex.
What is a Regular Expression? A regular expression is a sequence of characters that define a search pattern. This pattern can be used for various tasks such as validating input, searching for specific strings, or replacing parts of strings.
Basic Syntax : Here's a brief overview of some common regex components:
hello
matches "hello")..
: Matches any single character.^
: Matches the start of a string.$
: Matches the end of a string.*
: Matches 0 or more repetitions of the * preceding element.+
: Matches 1 or more repetitions of the preceding element.?
: Matches 0 or 1 occurrence of the preceding element.[]
: Matches any single character within the brackets (e.g., [abc] matches "a", "b", or "c").|
: Acts like a logical OR (e.g., abc|def matches either "abc" or "def").re
ModuleTo work with regular expressions in Python, you need to import the re
module.
import re
Common Functions
re.match()
: Determines if the regex matches at the beginning of the string.re.search()
: Searches the entire string for a match.re.findall()
: Returns a list of all matches in the string.re.sub()
: Replaces occurrences of the regex with a specified replacement.re.match()
import re
pattern = r'hello'
string = 'hello world'
match = re.match(pattern, string)
if match:
print("Match found:", match.group())
else:
print("No match")
re.search()
import re
pattern = r'world'
string = 'hello world'
search = re.search(pattern, string)
if search:
print("Search found:", search.group())
else:
print("No search found")
re.findall()
import re
pattern = r'\d+' # Matches one or more digits
string = 'I have 2 apples and 3 oranges.'
matches = re.findall(pattern, string)
print("All matches:", matches) # Output: ['2', '3']
re.sub()
import re
pattern = r'apples'
replacement = 'bananas'
string = 'I have apples and oranges.'
new_string = re.sub(pattern, replacement, string)
print("Replaced string:", new_string) # Output: I have bananas and oranges.
Syntax
: X(?=Y)
(e.g., r'cats(?= and)').
Lookbehinds: Assert that a pattern is preceded by another. Syntax
: (?<=Y)X
(e.g., r'(?<=email is )\w+@\w+.\w+').
Non-capturing Groups: Group patterns without capturing them for backreferencing. Syntax
: (?:...)
(e.g., r'(?:apples|oranges)').
Named Groups: Assign names to capturing groups for easier access. Syntax
: (?P<name>...)
(e.g., r'(?PSyntax
: \n
where n
is the group number (e.g., r'(\b\w+) \1').
Verbose Mode: Write regex with whitespace and comments for readability. Use re.VERBOSE
(e.g., r"""\b(cat|dog)\b""").
Flags: Modify regex behavior (e.g., re.IGNORECASE
for case-insensitive matching).import re
text = """I love cats and dogs.
My email is test@example.com.
The price of apples is $3 and oranges is $2.
Hello World
hello world
abc abcd abcde"""
# 1. Lookaheads
lookahead_pattern = r'cats(?= and)'
lookahead_match = re.search(lookahead_pattern, text)
if lookahead_match:
print("Lookahead Found:", lookahead_match.group())
# 2. Lookbehinds
lookbehind_pattern = r'(?<=email is )\w+@\w+\.\w+'
lookbehind_match = re.search(lookbehind_pattern, text)
if lookbehind_match:
print("Lookbehind Found:", lookbehind_match.group())
# 3. Non-capturing Groups
non_capturing_pattern = r'(?:apples|oranges)'
non_capturing_matches = re.findall(non_capturing_pattern, text)
print("Non-capturing Groups Found:", non_capturing_matches)
# 4. Named Groups
named_group_pattern = r'(?P<fruit>apples|oranges) is \$(?P<price>\d)'
named_group_matches = re.finditer(named_group_pattern, text)
for match in named_group_matches:
print(f"Named Group Found: {match.group('fruit')} costs ${match.group('price')}")
# 5. Backreferences
backreference_pattern = r'(\b\w+) \1'
backreference_matches = re.finditer(backreference_pattern, text)
for match in backreference_matches:
print("Backreference Found:", match.group())
# 6. Verbose Mode
verbose_pattern = re.compile(r"""
\b # Word boundary
(cat|dog) # Match 'cat' or 'dog'
\b # Word boundary
""", re.VERBOSE)
verbose_matches = verbose_pattern.findall(text)
print("Verbose Mode Found:", verbose_matches)
# 7. Flags
flag_pattern = r'hello'
flag_matches = re.findall(flag_pattern, text, re.IGNORECASE)
print("Flags Found:", flag_matches)