Regular Expressions

Introduction to Regular Expressions

Regular expressions (regex) are powerful tools for searching, matching, and manipulating strings based on specific patterns. In Python, the re module provides functions for working with regex.

What is a Regular Expression?

A regular expression is a sequence of characters that defines a search pattern. This pattern can be used for tasks such as validating input, searching for specific strings, or replacing parts of strings.
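As a minimal sketch of the input-validation use case, the following checks whether a string consists of exactly five digits. The function name is_five_digit_code and the five-digit rule are hypothetical, chosen only for illustration.

```python
import re

def is_five_digit_code(value):
    """Return True if value is exactly five digits (hypothetical rule)."""
    # re.fullmatch succeeds only when the entire string matches the pattern.
    return re.fullmatch(r'\d{5}', value) is not None

print(is_five_digit_code('12345'))   # True
print(is_five_digit_code('1234a'))   # False
```

Using re.fullmatch rather than re.search keeps a longer string such as '123456' from passing just because part of it matches.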

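Basic Syntax

Common regex components include character classes such as \d (a digit) and \w (a word character), quantifiers such as + (one or more) and * (zero or more), anchors such as ^ (start of string) and $ (end of string), the word boundary \b, alternation with |, and groups written with parentheses.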
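A minimal sketch of a few of these components in action. The sample string and variable names are made up for illustration, and the selection is not exhaustive.

```python
import re

sample = 'Order 66 shipped to cat-lovers at 9am'

# \d matches a single digit; + repeats the previous item one or more times.
single_digits = re.findall(r'\d', sample)
digit_runs = re.findall(r'\d+', sample)
print(single_digits)  # ['6', '6', '9']
print(digit_runs)     # ['66', '9']

# [...] matches any one character from the set.
vowels = re.findall(r'[aeiou]', 'regex')
print(vowels)         # ['e', 'e']

# ^ anchors the pattern at the start of the string, $ at the end.
starts_with_order = re.search(r'^Order', sample) is not None
ends_with_9am = re.search(r'9am$', sample) is not None
print(starts_with_order, ends_with_9am)  # True True

# . matches any character except a newline; | chooses between alternatives.
dot_matches = re.findall(r'c.t', sample)
alt_matches = re.findall(r'cat|dog', sample)
print(dot_matches)    # ['cat']
print(alt_matches)    # ['cat']
```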

The re Module

To work with regular expressions in Python, you need to import the re module.

import re

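Common Functions

The examples below use the following functions from the re module:

re.match(pattern, string): matches the pattern only at the beginning of the string.
re.search(pattern, string): scans the string and returns the first match found anywhere.
re.findall(pattern, string): returns all non-overlapping matches as a list.
re.sub(pattern, replacement, string): returns a copy of the string with matches replaced.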

Basic Examples

Matching with re.match()

import re
pattern = r'hello'
string = 'hello world'
# re.match only looks for the pattern at the start of the string
match = re.match(pattern, string)
if match:
    print("Match found:", match.group())
else:
    print("No match")

Searching with re.search()

import re
pattern = r'world'
string = 'hello world'
# re.search scans the whole string for the first match
search = re.search(pattern, string)
if search:
    print("Search found:", search.group())
else:
    print("No search found")
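The difference between the two functions shows up when the pattern does not occur at the start of the string. A small sketch reusing the same string (the variable names are illustrative):

```python
import re

string = 'hello world'

# re.match requires the match to begin at position 0, so this fails.
match_result = re.match(r'world', string)
print(match_result)            # None

# re.search scans forward and finds the first match anywhere.
search_result = re.search(r'world', string)
print(search_result.span())    # (6, 11)
```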

Finding All Matches with re.findall()

import re
pattern = r'\d+'  # Matches one or more digits
string = 'I have 2 apples and 3 oranges.'
matches = re.findall(pattern, string)
print("All matches:", matches)  # Output: All matches: ['2', '3']
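When the pattern contains capturing groups, re.findall returns the groups rather than the whole match. A minimal sketch, assuming we want each number paired with the word that follows it (the pattern and variable name are illustrative):

```python
import re

string = 'I have 2 apples and 3 oranges.'

# With two capturing groups, findall returns a list of (count, word) tuples.
pairs = re.findall(r'(\d+) (\w+)', string)
print(pairs)  # [('2', 'apples'), ('3', 'oranges')]
```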

Replacing with re.sub()

import re
pattern = r'apples'
replacement = 'bananas'
string = 'I have apples and oranges.'
new_string = re.sub(pattern, replacement, string)
print("Replaced string:", new_string)  # Output: Replaced string: I have bananas and oranges.
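The replacement string can also refer back to captured groups, and the count argument limits how many matches are replaced. A short sketch with an illustrative string and patterns:

```python
import re

string = 'I have 2 apples and 3 oranges.'

# \1 and \2 in the replacement refer to the first and second captured groups.
swapped = re.sub(r'(\d+) (\w+)', r'\2 x\1', string)
print(swapped)     # I have apples x2 and oranges x3.

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', 'N', string, count=1)
print(first_only)  # I have N apples and 3 oranges.
```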

Advanced Patterns

import re

text = """I love cats and dogs.
My email is test@example.com.
The price of apples is $3 and oranges is $2.
Hello World
hello world
abc abcd abcde"""

# 1. Lookaheads: match 'cats' only when it is followed by ' and'
lookahead_pattern = r'cats(?= and)'
lookahead_match = re.search(lookahead_pattern, text)
if lookahead_match:
    print("Lookahead Found:", lookahead_match.group())

# 2. Lookbehinds: match an address only when preceded by 'email is '
lookbehind_pattern = r'(?<=email is )\w+@\w+\.\w+'
lookbehind_match = re.search(lookbehind_pattern, text)
if lookbehind_match:
    print("Lookbehind Found:", lookbehind_match.group())

# 3. Non-capturing Groups: group alternatives without creating a capture
non_capturing_pattern = r'(?:apples|oranges)'
non_capturing_matches = re.findall(non_capturing_pattern, text)
print("Non-capturing Groups Found:", non_capturing_matches)

# 4. Named Groups: access captures by name instead of position
named_group_pattern = r'(?P<fruit>apples|oranges) is \$(?P<price>\d)'
named_group_matches = re.finditer(named_group_pattern, text)
for match in named_group_matches:
    print(f"Named Group Found: {match.group('fruit')} costs ${match.group('price')}")

# 5. Backreferences: \1 repeats whatever the first group captured.
# There is no trailing \b, so 'abc abc' inside 'abc abcd' counts as a match.
backreference_pattern = r'(\b\w+) \1'
backreference_matches = re.finditer(backreference_pattern, text)
for match in backreference_matches:
    print("Backreference Found:", match.group())

# 6. Verbose Mode: whitespace and comments inside the pattern are ignored
verbose_pattern = re.compile(r"""
    \b              # Word boundary
    (cats?|dogs?)   # Match 'cat' or 'dog', optionally plural
    \b              # Word boundary
""", re.VERBOSE)
verbose_matches = verbose_pattern.findall(text)
print("Verbose Mode Found:", verbose_matches)

# 7. Flags: re.IGNORECASE matches both 'Hello' and 'hello'
flag_pattern = r'hello'
flag_matches = re.findall(flag_pattern, text, re.IGNORECASE)
print("Flags Found:", flag_matches)
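Named groups can also be collected as dictionaries, which is convenient when each match represents a record. A minimal sketch reusing the fruit and price pattern from above (the variable names are illustrative):

```python
import re

text = 'The price of apples is $3 and oranges is $2.'
pattern = re.compile(r'(?P<fruit>apples|oranges) is \$(?P<price>\d)')

# groupdict() returns the named groups of one match as a dictionary.
records = [m.groupdict() for m in pattern.finditer(text)]
print(records)
# [{'fruit': 'apples', 'price': '3'}, {'fruit': 'oranges', 'price': '2'}]
```

The captured values are strings, so a price would need int() before any arithmetic.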