string Module
In this lesson, you’ll learn about the string module. This module will help you quickly access some string constants. Here’s an example:
>>> import string
>>> string.digits
'0123456789'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
Here are some useful constants:
string.ascii_letters
string.ascii_uppercase
string.ascii_lowercase
string.digits
string.hexdigits
string.octdigits
string.punctuation
string.printable
string.whitespace
You can check out the Python documentation on the string module.
00:00
In this section, you’ll learn all about Python’s standard library. Let’s start with the string module. The string module is really helpful when you have any interview question that deals with strings. Let’s first start by talking about ASCII code.
00:14
So, ASCII code is basically a mapping from characters to numbers. For example, the ASCII code for capital 'A' is 65.
00:24
The ASCII code for lowercase 'a' is 97. When you compare characters, Python compares their numeric codes, which for ASCII characters are the ASCII values. When you do 'A' > 'a', you get False, because 65 is not greater than 97.
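As a quick check of the values quoted above, here is a minimal sketch using the built-in ord() and chr() functions. The variable names are only illustrative.

```python
# ord() maps a character to its integer code; chr() goes the other way.
code_upper_a = ord('A')
code_lower_a = ord('a')
print(code_upper_a)  # 65
print(code_lower_a)  # 97
print(chr(65))       # A

# Comparing single characters compares those integer codes.
char_comparison = 'A' > 'a'
print(char_comparison)              # False
print(code_upper_a > code_lower_a)  # False, the same comparison on the numbers
```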
00:38
It’s also how you compare strings. For something like 'abc' > 'abd', Python compares the ASCII code of the first character, then the second, and then the third, until it finds a position where they differ.
00:49
Here the result is False, because ord('c') is 99 and ord('d') is 100, and 99 is not greater than 100. Now, moving on to how to check if a string is uppercase or not.
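The character-by-character string comparison described above can be traced by hand. This is only a sketch: the names left, right, and first_diff are illustrative, and it assumes the two strings differ somewhere.

```python
# Strings compare character by character, left to right.
left, right = 'abc', 'abd'

# Find the first position where the two strings differ.
first_diff = next(i for i, (x, y) in enumerate(zip(left, right)) if x != y)
print(first_diff)                                     # 2
print(ord(left[first_diff]), ord(right[first_diff]))  # 99 100

# The overall result is decided by that one position.
print(left > right)  # False, because 99 is not greater than 100
print(left < right)  # True
```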
01:01
'HELLO WORLD'.isupper(). This is a built-in str method that checks whether all the cased characters in the string are uppercase. Now, let’s look at the string module to see what the uppercase characters are.
01:13
So, import string and then string.ascii_uppercase. This will spit out a string of all the uppercase ASCII characters. Notice how the space (' ') is not in there, but isupper() still returns True, because it ignores characters that have no case.
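The difference between the constant and the method can be seen side by side. A small sketch; the result variable names are illustrative.

```python
import string

print(string.ascii_uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
space_in_constant = ' ' in string.ascii_uppercase
print(space_in_constant)       # False

# str.isupper() only looks at cased characters, so the space is ignored.
with_space = 'HELLO WORLD'.isupper()
mixed_case = 'Hello World'.isupper()
no_cased_chars = '123 !'.isupper()
print(with_space)      # True
print(mixed_case)      # False
print(no_cased_chars)  # False: there is no cased character at all
```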
01:27
Let’s write a stricter is_upper() function. It’s not really better, but it treats a space as not uppercase. In is_upper(), do something like for letter in s: and then if letter not in string.ascii_uppercase:. If that condition holds, the letter we’re on is not uppercase, so you know for a fact that you can return False, because the string is not uppercase.
01:51
Then, if you get through all the letters and none of them triggers this if statement, you can return True. So now, calling is_upper() on 'HELLO WORLD' will return False because there’s a space, but if you remove that, it will return True. There’s one small optimization you can do here.
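Written out in full, the function described above might look like the following. This is a reconstruction from the narration, not necessarily the exact code shown on screen.

```python
import string

def is_upper(s):
    """Return True only if every character of s is an ASCII uppercase letter."""
    for letter in s:
        if letter not in string.ascii_uppercase:
            # Found a character that is not A-Z, so stop early.
            return False
    # Every character passed the check.
    return True

print(is_upper('HELLO WORLD'))  # False, because of the space
print(is_upper('HELLOWORLD'))   # True
```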
02:08
Notice how string.ascii_uppercase has 26 letters, and this line if letter not in […] checks membership in that string, so it might have to scan all 26 characters to determine that the letter is not there.
02:21
So instead, you could do something like uppercase_set, which is just a set built from string.ascii_uppercase,
02:30
and then you could rewrite this function to just check membership in that set, which takes constant time on average.
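A sketch of the set-based rewrite follows, using the names uppercase_set and is_upper_using_set from the transcript. The body is reconstructed from the narration.

```python
import string

# Build the set once, at module level, so each call reuses it.
uppercase_set = set(string.ascii_uppercase)

def is_upper_using_set(s):
    """Same check as is_upper(), but membership is tested against a set."""
    for letter in s:
        if letter not in uppercase_set:
            return False
    return True

print(is_upper_using_set('HELLO WORLD'))  # False
print(is_upper_using_set('HELLOWORLD'))   # True
```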
02:35
Let’s just call this a different function, is_upper_using_set(). Let’s do a quick %timeit (time it). %timeit is a magic command built into IPython: you pass it an expression, and it evaluates that a bunch of times, then reports the mean and the standard deviation of how long it took.
02:51
Let’s see how long our normal is_upper() takes for 'HELLO WORLD': 569 nanoseconds, plus or minus 0.8 nanoseconds per loop, and it ran about a million loops.
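%timeit only exists inside IPython. Outside of it, a comparable measurement can be sketched with the standard library’s timeit module. The loop count below is an arbitrary choice, the lambda wrappers add a little call overhead, and the numbers will differ from machine to machine, so treat the output as a rough comparison only.

```python
import string
import timeit

uppercase_set = set(string.ascii_uppercase)

def is_upper(s):
    for letter in s:
        if letter not in string.ascii_uppercase:
            return False
    return True

def is_upper_using_set(s):
    for letter in s:
        if letter not in uppercase_set:
            return False
    return True

number = 100_000  # how many calls to time; an arbitrary choice

# timeit.timeit() returns the total time in seconds for `number` calls.
t_str = timeit.timeit(lambda: is_upper('HELLO WORLD'), number=number)
t_set = timeit.timeit(lambda: is_upper_using_set('HELLO WORLD'), number=number)

print(f"string membership: {t_str / number * 1e9:.0f} ns per call")
print(f"set membership:    {t_set / number * 1e9:.0f} ns per call")
```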
03:01
Then, let’s run our is_upper_using_set(): 400 nanoseconds. So, you save about 170 nanoseconds each time, which really isn’t that significant, but if you’re running this millions of times, it might add up. And it’s just good practice to use sets when looking for membership. Okay, let’s rewrite this function a little more cleanly as is_upper_cleaner().
03:22
We’ll still use the set, but we’ll also do it in one line using the all() function. Remember, all() is a function that takes in an iterable and returns True if all of the values are truthy, and False otherwise.
03:35
We can do all(letter in uppercase_set for letter in s)
.
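Wrapped in a function, the one-liner above could look like this. It is a sketch that assumes uppercase_set is built from string.ascii_uppercase as before.

```python
import string

uppercase_set = set(string.ascii_uppercase)

def is_upper_cleaner(s):
    """One-line version: every letter must be in the uppercase set."""
    return all(letter in uppercase_set for letter in s)

print(is_upper_cleaner('HELLO WORLD'))  # False
print(is_upper_cleaner('HELLOWORLD'))   # True
print(is_upper_cleaner(''))             # True: all() of an empty iterable is True
```

The empty-string case is worth knowing about: like the loop version, this returns True when there is nothing to check.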
03:43
is_upper_cleaner() of 'HELLO WORLD' returns False, and without the space, returns True. You might be thinking, “all() short circuits, but doesn’t the expression inside get evaluated for every letter first?” Well, that’s not the case here, because letter in uppercase_set for letter in s is actually a generator expression, so all() pulls one value at a time, as if calling next(), and short circuits as soon as it sees a falsy value.
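One way to see the short-circuiting described above is to record which letters actually get tested. The helper check() and the list checked are hypothetical names introduced only for this sketch.

```python
import string

uppercase_set = set(string.ascii_uppercase)
checked = []  # records every letter that actually gets tested

def check(letter):
    checked.append(letter)
    return letter in uppercase_set

# Generator expression: all() stops at the first falsy result (the space).
lazy_result = all(check(letter) for letter in 'HELLO WORLD')
lazy_checked = ''.join(checked)
print(lazy_result)        # False
print(repr(lazy_checked)) # 'HELLO ' -> only 6 of the 11 letters were tested

# List comprehension: the whole list is built before all() ever sees it.
checked.clear()
eager_result = all([check(letter) for letter in 'HELLO WORLD'])
eager_count = len(checked)
print(eager_result)  # False
print(eager_count)   # 11 -> every letter was tested
```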
04:06
So, something like all(print(5) for _ in range(5)),
04:13
will only print 5 once and then short circuit, because print() returns None, which is falsy. Let’s look at some of the other constants in the string module. string.ascii_letters will give you all the valid ASCII letters.
04:24
ascii_uppercase we saw already, ascii_lowercase is all the lowercase letters, digits is the valid decimal digits,
04:34
string.hexdigits will give you all the digits used in hexadecimal numbers, string.octdigits will give you all the digits used in octal numbers, string.punctuation will give you all the ASCII punctuation characters, string.printable will give you all the ASCII characters that are considered printable, and string.whitespace is all the ASCII whitespace characters.
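For reference, here is what the constants listed above contain. string.whitespace is shown with repr() because its characters are otherwise invisible.

```python
import string

print(string.ascii_letters)  # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)         # 0123456789
print(string.hexdigits)      # 0123456789abcdefABCDEF
print(string.octdigits)      # 01234567
print(string.punctuation)    # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'

# printable is simply the other groups joined together.
printable_parts = (
    string.digits + string.ascii_letters + string.punctuation + string.whitespace
)
print(string.printable == printable_parts)  # True
```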
04:58
Let’s just look quickly at an example of removing all the types of whitespace from a string. First, you could create the whitespace_set and then write a generator expression letter for letter in s, where s is just 'HELLO WORLD' or something,
05:12
with the filter if letter not in whitespace_set, and then .join() that generator expression,
05:21
like this. It’s a clean one-liner, or you could obviously extract it into a for loop that builds up a new string. This was a short video on the string module. In an interview, if you have to somehow compare characters to some of these string constants, instead of defining your own constants, you can use the built-in ones. In the next video, you’ll learn about the itertools module, which has a lot of functions that return iterators for efficient looping.
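The whitespace-removal example from the end of the transcript can be sketched as follows. The function names remove_whitespace and remove_whitespace_loop are hypothetical, and the code assumes whitespace_set is built from string.whitespace, which the narration implies but does not spell out.

```python
import string

whitespace_set = set(string.whitespace)

def remove_whitespace(s):
    """One-liner: keep every character that is not ASCII whitespace."""
    return ''.join(letter for letter in s if letter not in whitespace_set)

def remove_whitespace_loop(s):
    """The same idea as an explicit loop, collecting kept letters in a list."""
    kept = []
    for letter in s:
        if letter not in whitespace_set:
            kept.append(letter)
    return ''.join(kept)

print(remove_whitespace('HELLO WORLD'))       # HELLOWORLD
print(remove_whitespace(' a\tb\nc \r\n'))     # abc
print(remove_whitespace_loop('HELLO WORLD'))  # HELLOWORLD
```

The loop version collects letters in a list and joins once at the end, because str objects have no .append() method.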
Bartosz Zaczyński RP Team on Jan. 7, 2021
@abhinav TL;DR It’s because you’re using a generator expression, which evaluates lazily.
(By the way, function calls, including calls to the built-in all(), are expressions and not statements.)
Let’s take a look at the all() function’s documentation:
Return True if all elements of the iterable are true (or if the iterable is empty). Equivalent to:
def all(iterable):
    for element in iterable:
        if not element:
            return False
    return True
In other words, it expects an iterable, such as the generator expression that you’re passing in your example. Perhaps it would be easier to understand what’s happening if we used a list instead:
>>> [print(5) for _ in range(5)]
5
5
5
5
5
[None, None, None, None, None]
So, the function would get a list of five None elements. It iterates over those elements, testing whether each one is falsy, and stops at the first element that evaluates to False. If no element is falsy, it returns True. It turns out that None is falsy when you try to use it in a Boolean context:
>>> bool(None)
False
Therefore, it will stop right after the first call to the print() function. Because you’re using a generator expression, which evaluates lazily, the subsequent calls to print() won’t be made. However, if you replaced the generator expression with a list, then you’d see all five invocations:
>>> all([print(5) for _ in range(5)])
5
5
5
5
5
False
abhinav on Jan. 7, 2021
@Bartosz
If I understand correctly, calling print()
returns None
which evaluates to False
and that’s why the generator expression stops after the first one.
Thanks for the great explanation. btw I didn’t know there was a difference between statements and expressions, thanks for that, will read more. :)
Bartosz Zaczyński RP Team on Jan. 7, 2021
@abhinav Precisely, the print() function implicitly returns None, which is true for any function that finishes without returning a value through the return keyword.
abhinav on Jan. 7, 2021
I’m having trouble understanding why the
all
statement stops after the first print.