string Module

Python Coding Interviews: Tips & Best Practices James Uejio 05:48

In this lesson, you’ll learn about the string module. This module will help you quickly access some string constants. Here’s an example:

>>> import string
>>> string.digits
'0123456789'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

Here are some useful constants:

string.ascii_letters
string.ascii_uppercase
string.ascii_lowercase
string.digits
string.hexdigits
string.octdigits
string.punctuation
string.printable
string.whitespace

You can check out the Python documentation on the string module.

00:00 In this section, you’ll learn all about Python’s standard library. Let’s start with the string module. The string module is really helpful when you have any interview question that deals with strings. Let’s first start by talking about ASCII code.

00:14 So, ASCII code is basically a mapping from characters to numbers. For example, the ASCII code for capital 'A' is 65.

00:24 The ASCII code for lowercase 'a' is 97. When you compare characters, Python compares their code points, which for ASCII characters are the ASCII values. When you do 'A' > 'a' you get False, because 65 is not greater than 97.

00:38 It’s also how you compare strings. Something like 'abc' > 'abd' compares the ASCII code of the first characters, then the second, and then the third, stopping at the first position where they differ.

00:49 Here the result is False, because ord('c') is 99 and ord('d') is 100. Now, moving on to how to check if a string is uppercase or not.
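As a small illustration of the comparisons described above, the following sketch prints the code points and comparison results mentioned in the transcript. The variable names are chosen here for illustration only.

```python
# Code points for the two characters discussed in the lesson
upper_a = ord("A")
lower_a = ord("a")
print(upper_a, lower_a)  # 65 97

# Single characters compare by code point
char_result = "A" > "a"
print(char_result)  # False, since 65 is not greater than 97

# Strings compare position by position until the first difference
string_result = "abc" > "abd"
print(string_result)  # False, since ord("c") is 99 and ord("d") is 100
```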

01:01 'HELLO WORLD'.isupper(). So, this is a built-in method in string, which will check if the whole string is uppercase. Now, let’s look at the string module to see what are the uppercase characters.

01:13 So, import string and then string.ascii_uppercase. This will spit out a string that is all the uppercase ASCII characters. Notice how space (' ') is not in there, but isupper() actually does return True, because it only looks at cased characters and ignores everything else.
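A quick way to see the mismatch described above is to print the constant next to the result of str.isupper(). This is only a sketch of the REPL session, with variable names picked for illustration.

```python
import string

uppercase = string.ascii_uppercase
print(uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ

# The space character is not part of the constant...
space_in_constant = " " in uppercase
print(space_in_constant)  # False

# ...yet str.isupper() still reports True for a string containing a space
builtin_result = "HELLO WORLD".isupper()
print(builtin_result)  # True
```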

01:27 Let’s write a stricter is_upper() function. It’s not necessarily better, but it takes into account whether the string has spaces. So, in is_upper(), do something like for letter in s: if letter not in string.ascii_uppercase:. If the letter we’re on is not uppercase, then you know for a fact that you can return False, because the string is not uppercase.

01:51 Then, if you get through all the letters and none of them trigger this if statement, you can return True. So now, calling is_upper() on 'HELLO WORLD' will return False because there’s a space, but if you remove that, it will return True. There’s one small optimization you can do here.
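The function described above might look roughly like this. It is a reconstruction from the spoken description, so details such as the docstring are assumptions.

```python
import string


def is_upper(s):
    """Return True only if every character is an ASCII uppercase letter."""
    for letter in s:
        if letter not in string.ascii_uppercase:
            # Any non-uppercase character, including a space, fails the check
            return False
    return True


print(is_upper("HELLO WORLD"))  # False, because of the space
print(is_upper("HELLOWORLD"))   # True
```

Note that, as written, this sketch returns True for an empty string, because the loop body never runs.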

02:08 Notice how string.ascii_uppercase has 26 letters, and this line if letter not in […] actually checks membership, so it might have to scan all 26 characters to determine that the letter is not in that string.

02:21 So instead, you could do something like uppercase_set, which is just a set of the string.ascii_uppercase,

02:30 and then you could rewrite this function to just check membership in that set.
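A minimal sketch of the set-based variant follows, using the name is_upper_using_set() that the lesson gives this function. The exact body is an assumption based on the description.

```python
import string

# Build the set once so each membership test is a hash lookup
uppercase_set = set(string.ascii_uppercase)


def is_upper_using_set(s):
    """Same check as before, but membership is tested against a set."""
    for letter in s:
        if letter not in uppercase_set:
            return False
    return True


print(is_upper_using_set("HELLO WORLD"))  # False
print(is_upper_using_set("HELLOWORLD"))   # True
```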

02:35 Let’s just call this a different function, is_upper_using_set(). Let’s do a quick %timeit (time it). %timeit is a magic command that is built into IPython, where you pass it an expression and it will evaluate that a bunch of times, then find the mean and the standard deviation for how long it took.

02:51 Let’s see how long our is_upper() normal takes for 'HELLO WORLD'. 569 nanoseconds, plus or minus 0.8 nanoseconds per loop, and it did it about a million times.

03:01 Then, let’s run our is_upper_using_set(). 400 nanoseconds. So, you save about 170 nanoseconds each time, which really isn’t that significant, but if you’re running this, you know, millions of times, it might add up. And, it’s just good practice when looking for membership to use sets. Okay, let’s rewrite this function a little bit cleaner—is_upper_cleaner().
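Outside IPython, a similar comparison can be sketched with the standard library timeit module. This is not the %timeit session from the video: the call count of 100,000 is an arbitrary choice, and the timings will differ from the numbers quoted above depending on the machine and Python version.

```python
import string
import timeit

uppercase_set = set(string.ascii_uppercase)


def is_upper(s):
    for letter in s:
        if letter not in string.ascii_uppercase:
            return False
    return True


def is_upper_using_set(s):
    for letter in s:
        if letter not in uppercase_set:
            return False
    return True


# Total seconds for 100,000 calls of each version
string_time = timeit.timeit(lambda: is_upper("HELLO WORLD"), number=100_000)
set_time = timeit.timeit(lambda: is_upper_using_set("HELLO WORLD"), number=100_000)

print(f"string membership: {string_time:.4f} s")
print(f"set membership:    {set_time:.4f} s")
```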

03:22 We’ll still use the set, but we’ll also do it in one line using the all() syntax. Remember, all() is a function that takes in an iterable and returns True if all of the values are True values—otherwise, False.

03:35 We can do all(letter in uppercase_set for letter in s).

03:43 is_upper_cleaner() of 'HELLO WORLD' returns False, and without the space, returns True. You might be thinking, “This short circuits, but doesn’t this evaluate each time?” Well, that’s not the case here, because letter in uppercase_set for letter in s is actually a generator expression, so all() will pull one value at a time from it and short circuit as soon as it sees a falsy value.
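Put together, the one-line version described above could be written as follows. The function and set names come from the transcript, while the surrounding layout is an assumption.

```python
import string

uppercase_set = set(string.ascii_uppercase)


def is_upper_cleaner(s):
    # The generator expression yields one boolean at a time,
    # so all() can stop at the first False
    return all(letter in uppercase_set for letter in s)


print(is_upper_cleaner("HELLO WORLD"))  # False
print(is_upper_cleaner("HELLOWORLD"))   # True
```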

04:06 So, something like all(print(5) for _ in range(5)),

04:13 will only print 5 once, and short circuit, because print() returns None, which is falsy. Let’s look at some of the other constants in the string module. string.ascii_letters will give you all the valid ASCII letters.
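To make the short-circuit visible without relying on printed output, the sketch below records each call in a list. The helper record_and_return_none() is hypothetical and exists only for this illustration. It mimics print() by returning None.

```python
calls = []


def record_and_return_none(value):
    """Hypothetical stand-in for print(): records the call, returns None."""
    calls.append(value)
    return None


# Generator expression: all() stops after the first falsy value
lazy_result = all(record_and_return_none(i) for i in range(5))
lazy_calls = list(calls)
print(lazy_result, lazy_calls)  # False [0]

# List comprehension: every call runs before all() sees anything
calls.clear()
eager_result = all([record_and_return_none(i) for i in range(5)])
eager_calls = list(calls)
print(eager_result, eager_calls)  # False [0, 1, 2, 3, 4]
```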

04:24 ascii_uppercase we saw already, ascii_lowercase is all the lowercases, digits is the valid digits,

04:34 string.hexdigits will give you all the possible digits when looking at hexadecimal numbers, string.octdigits will give you all the digits used in octal numbers, string.punctuation will give you all the different types of punctuation, string.printable will give you all the possible characters that are considered printable, then string.whitespace is all the possible whitespaces.
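As a rough illustration of how these constants can be used, the sketch below prints a few of them and builds a small validity check on top of string.hexdigits. The is_hex() helper is hypothetical and not part of the string module.

```python
import string

# ascii_letters is the lowercase letters followed by the uppercase letters
letters_match = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
print(letters_match)  # True

print(string.hexdigits)  # 0123456789abcdefABCDEF
print(string.octdigits)  # 01234567

hex_set = set(string.hexdigits)


def is_hex(s):
    """Hypothetical helper: True if s is non-empty and all characters are hex digits."""
    return bool(s) and all(ch in hex_set for ch in s)


print(is_hex("ff00AA"))  # True
print(is_hex("xyz"))     # False
```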

04:58 Let’s just look quickly at an example of removing all the types of whitespace from a string. First, you could create the whitespace_set and then create a generator expression letter for letter in s, where s is something like 'HELLO WORLD',

05:12 filtered with if letter not in whitespace_set, and then pass that generator expression to ''.join(),

05:21 like this. It’s a clean one-liner, or you could expand it into a for loop that appends each kept character to a list and joins the list at the end. This was a short video on the string module. In an interview, if you have to compare characters to some of these string constants, instead of defining your own constants, you can use the built-in ones. In the next video, you’ll learn about the itertools module, which has a lot of functions that return iterators for efficient looping.
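The whitespace-removal idea from this part of the transcript can be sketched as below. The function names remove_whitespace() and remove_whitespace_loop() are assumptions made for this example, since the video builds the expression inline.

```python
import string

whitespace_set = set(string.whitespace)


def remove_whitespace(s):
    """Join only the characters that are not whitespace."""
    return "".join(letter for letter in s if letter not in whitespace_set)


def remove_whitespace_loop(s):
    """Same result with an explicit loop that collects characters in a list."""
    kept = []
    for letter in s:
        if letter not in whitespace_set:
            kept.append(letter)
    return "".join(kept)


print(remove_whitespace("HELLO \t WORLD\n"))       # HELLOWORLD
print(remove_whitespace_loop("HELLO \t WORLD\n"))  # HELLOWORLD
```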

abhinav on Jan. 7, 2021

I’m having trouble understanding why the all statement stops after the first print.

>>> all(print(5) for _ in range(5))
5
False

Bartosz Zaczyński RP Team on Jan. 7, 2021

@abhinav TL;DR It’s because you’re using a generator expression, which evaluates lazily.

(By the way, function calls, including calls to the built-in all(), are expressions and not statements.)

Let’s take a look at the all() function’s documentation:

Return True if all elements of the iterable are true (or if the iterable is empty). Equivalent to:

def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True

In other words, it expects an iterable, such as the generator expression that you’re passing in your example. Perhaps it would be easier to understand what’s happening if we used a list instead:

>>> [print(5) for _ in range(5)]
5
5
5
5
5
[None, None, None, None, None]

So, the function will get a sequence comprised of five None elements. Now, it will iterate over those elements, testing if one of them is falsy. Unless all elements evaluate to True, the function will stop after the first element that evaluates to False. It turns out that None is falsy when you try to use it in a Boolean context:

>>> bool(None)
False

Therefore, it will stop right after the first call to the print() function. Because you’re using a generator expression, which evaluates lazily, the subsequent calls to print() won’t be made. However, if you replaced the generator expression with a list, then you’d see all five invocations:

>>> all([print(5) for _ in range(5)])
5
5
5
5
5
False

abhinav on Jan. 7, 2021

@Bartosz

If I understand correctly, calling print() returns None which evaluates to False and that’s why the generator expression stops after the first one.

Thanks for the great explanation. btw I didn’t know there was a difference between statements and expressions, thanks for that, will read more. :)

Bartosz Zaczyński RP Team on Jan. 7, 2021

@abhinav Precisely, the print() function implicitly returns None, which is true for any function that doesn’t use the return keyword.
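A tiny sketch of that point follows. The greet() function is a made-up example with no return statement.

```python
def greet(name):
    """Made-up example: prints a message and has no return statement."""
    print(f"Hello, {name}")


result = greet("world")
print(result is None)  # True

# print() itself also returns None, which is falsy
print_result = print(5)
print(bool(print_result))  # False
```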
