
Regular Expressions and Building Regexes in Python (Overview)

In this course, you’ll explore regular expressions, also known as regexes, in Python. A regex is a special sequence of characters that defines a pattern for complex string-matching functionality.

String matching like this is a common task in programming, and you can get a lot done with string operators and built-in methods. At times, though, you may need more sophisticated pattern-matching capabilities.

In this course, you’ll learn:

  • How to access the re module, which implements regex matching in Python
  • How to use re.search() to match a pattern against a string
  • How to create complex matching patterns with regex metacharacters
  • What other functions, beyond re.search(), the re module provides
  • When and how to precompile a regex in Python into a regular expression object
  • What useful things you can do with the match object returned by the functions in the re module
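As a rough preview of the functions in the list above, here is a minimal sketch of re.search(), re.compile(), and the match object. The sample strings are made up for illustration.

```python
import re

# re.search() scans the whole string and returns a match object for the
# first location where the pattern matches, or None if nothing matches.
match = re.search(r"\d+", "Order 66 shipped")
if match:
    print(match.group())  # the matched text: 66
    print(match.span())   # (start, end) indices of the match: (6, 8)

# re.compile() precompiles a pattern into a regular expression object,
# which exposes the same operations as methods.
digits = re.compile(r"\d+")
print(digits.findall("10 apples, 20 pears"))  # ['10', '20']
```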

00:00 Welcome to Regular Expressions: Regexes in Python. My name is Chris and I will be your guide. In this course, you will learn what a regular expression is, when and why to use one, how to use matching characters and ranges of characters in your regular expressions, how to control the placement of a match, how to group and repeat patterns in a match, how to use the Python re (regular expression) module to apply regular expressions in your code, and how to modify the behavior of a regex using flag parameters.
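To give a feel for two of those topics, the sketch below shows a character range and a flag. The sample strings are arbitrary; re.IGNORECASE is just one of the flags the re module offers.

```python
import re

# A character range: [0-9] matches any single digit from 0 to 9.
digits_found = re.findall(r"[0-9]", "a1b2c3")
print(digits_found)  # ['1', '2', '3']

# Without a flag, matching is case-sensitive, so this finds nothing.
no_match = re.search(r"python", "I love Python")
print(no_match)  # None

# The re.IGNORECASE flag modifies the regex so that case is ignored.
result = re.search(r"python", "I love Python", flags=re.IGNORECASE)
print(result.group())  # Python
```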

00:34 A quick note: the samples in this course were tested using Python 3.8.2. For the most part, regexes have been part of Python almost since the beginning, and there haven't been many changes.

00:46 There are some subtle differences in how re.sub() handles zero-length matches starting in Python 3.7, but most people don't run into this.
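For the curious, here is a small illustration of that zero-length-match behavior, using the example from the Python documentation for re.sub(). The documented change in 3.7 is that an empty match adjacent to a previous non-empty match is now also replaced; earlier versions skipped it.

```python
import re

# "x*" can match zero characters, so it matches at every position,
# including the empty string right after the "x" it just consumed.
result = re.sub("x*", "-", "abxd")
print(result)  # -a-b--d- on Python 3.7 and later
```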

00:56 The other big change is between Python 2 and Python 3: like everything else in Python 3, regexes default to Unicode. Besides that, anything you see should work fairly consistently across versions. Regular expressions are a language in their own right, so you can think of them as a small language embedded inside Python.
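A quick way to see the Unicode default is to match word characters in a string containing an accented letter. This is a minimal sketch; the word "café" is just an example, written with an escape so the "é" is unambiguous.

```python
import re

text = "caf\u00e9"  # "café"

# In Python 3, str patterns are Unicode-aware by default, so \w matches "é".
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # ['café']

# The re.ASCII flag restricts \w to ASCII letters, digits, and underscore.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)  # ['caf']
```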

01:16 The short form is regex, and you'll hear me use that frequently. So what is it? A regex is a sequence of characters that typically defines a search pattern. You use it to match text against that pattern, and regexes are commonly used in many applications for find and replace. The concepts behind regexes were around in the 1950s and were first formalized in a paper by Stephen Cole Kleene.

01:42 There are a number of different regex syntaxes. The two most common are the POSIX standard and what's often referred to as Perl syntax, or Perl Compatible Regular Expressions (PCRE).

01:54 So, where would you use a regex? Their primary purpose is text matching. This might mean verifying that a user's input complies with an expected format: for example, checking whether a phone number entered by the user matches the pattern of phone numbers expected in your locale.
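As a sketch of that kind of input validation, the snippet below checks strings against a hypothetical phone format of three digits, three digits, and four digits separated by hyphens. The format and the is_valid_phone() helper are assumptions for illustration only; real phone formats vary by locale.

```python
import re

# Hypothetical format for illustration: NNN-NNN-NNNN, e.g. 555-123-4567.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phone(text: str) -> bool:
    # fullmatch() succeeds only if the entire string matches the pattern,
    # so extra text before or after the number is rejected.
    return PHONE_PATTERN.fullmatch(text) is not None


print(is_valid_phone("555-123-4567"))       # True
print(is_valid_phone("555-1234-567"))       # False
print(is_valid_phone("call 555-123-4567"))  # False
```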

02:09 They're often used in applications for search and replace: “Find this chunk of text, replace it with that chunk of text.” If you're familiar with the .split() and .replace() methods of Python's str type, the re module provides regular expression counterparts, re.split() and re.sub(), that are far more powerful. They allow you to define complicated patterns to split on and complicated patterns to match and replace.
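Here is a minimal comparison of the str methods with re.split() and re.sub(). The sample strings are made up to show separators and whitespace that a single fixed string can't handle.

```python
import re

text = "apples, pears;plums  ,figs"

# str.split() splits only on one fixed separator.
plain_parts = text.split(",")
print(plain_parts)  # ['apples', ' pears;plums  ', 'figs']

# re.split() splits on any match of a pattern: here, a comma or semicolon
# with optional whitespace on either side.
regex_parts = re.split(r"\s*[,;]\s*", text)
print(regex_parts)  # ['apples', 'pears', 'plums', 'figs']

# re.sub() replaces every match of a pattern: here, any run of whitespace
# (spaces, tabs, and so on) is collapsed to a single space.
cleaned = re.sub(r"\s+", " ", "too   many\tspaces")
print(cleaned)  # too many spaces
```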

02:31 One of the more powerful features of regular expressions is that you can define an expression that parses out multiple parts of a string, identifying the different components as it goes along.

02:42 This allows you to perform complicated matching in a single pass.
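As an illustration of parsing several components in one pass, this sketch uses named groups to pull the date, level, and message out of a log line. The log format is hypothetical, chosen only to show the technique.

```python
import re

# Hypothetical log format: "YYYY-MM-DD LEVEL: message".
# Each named group (?P<name>...) captures one component of the line.
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+): (?P<message>.*)"
)

match = LOG_PATTERN.match("2020-03-14 ERROR: disk full")
if match:
    print(match.group("date"))   # 2020-03-14
    print(match.group("level"))  # ERROR
    # groupdict() returns all named groups at once.
    print(match.groupdict())
```

A single match call extracts all three fields, which would otherwise take several separate string operations.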

02:47 You can find regexes almost everywhere in computer software. A variety of Unix utilities use POSIX-style regexes, including sed, awk, and grep.

02:57 Many word processors and text editors used for coding also use regexes to enhance their search-and-replace capabilities. The POSIX standard for regexes has two parts: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). By default, grep uses the Basic Regular Expression syntax.

03:20 The grep command has a flag, -E, that lets you use the Extended Regular Expression syntax instead. There's also a program called egrep that uses it by default.

03:31 Most major programming languages support some variation of regexes, usually through a library: Perl, Java, JavaScript, Julia, Python, Ruby, and the list goes on. Many languages, including Python, use a Perl-style syntax similar to PCRE, with some minor differences.

03:48 In the next lesson, I’ll introduce you to your first regular expression with character matching and ranges of characters.
