Debugging

The Python print() Function: Go Beyond the Basics Christopher Trudeau 10:40

The judicious insertion of print() statements can help you find bugs in your code:

Python
      
    
def movie_year(number):
    roman = { 1:'I', 5:'V', 10:'X', 50:'L', 100:'C', 500:'D', 1000:'M', }

    output = []
    divisor = 1000
    print(f'movie_year({number})')
    for digit in [int(x) for x in str(number)]:
        print(f'   digit={digit} divisor={divisor}')
        if digit <= 3:
            # e.g. 3 -> 3*'I' -> 'III'
            output.append( roman[divisor] * digit )
        elif digit == 4:
            output.append( roman[divisor] +  roman[divisor * 5] )
        elif 5 < digit <= 8:
            output.append( roman[divisor * 5] + roman[digit] * (divisor - 5) )
        elif digit == 9:
            output.append( roman[divisor] + roman[divisor * 10] )

        divisor = int(divisor / 10)
        print('      ', output)

    return ''.join(output)

Here’s what you’ll get:

Python
      
        
      
    
>>> movie_year(2015)
movie_year(2015)
   digit=2 divisor=1000
       ['MM']
   digit=0 divisor=100
       ['MM', '']
   digit=1 divisor=10
       ['MM', '', 'X']
   digit=5 divisor=1
       ['MM', '', 'X']
'MMX'

For production software, print() might not be a good long term solution. Python’s comprehensive logging library is a better choice:

Python
      
    
# counters.py
import logging, sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)

def count_lower_case(item):
    logger.info('count_lower_case(%s)', item)
    num = 0
    for letter in item:
        if 97 <= ord(letter) <= 122:
            logger.debug('  letter *%s* is lowercase', letter)
            num += 1

    logger.info('  returning %s', num)
    return num

Here’s what you’ll get:

Python
      
        
      
    
>>> from counters import count_lower_case
>>> count_lower_case('AbCdE')
INFO:counters:count_lower_case(AbCdE)
DEBUG:counters:  letter *b* is lowercase
DEBUG:counters:  letter *d* is lowercase
INFO:counters:  returning 2
2

00:00 In the previous lesson, I showed you how to use the mock library to write better tests, particularly when you’re testing code that has side effects like print(). In this lesson, I’m going to dive a little deeper into debugging.

00:12 I often find the hardest part of programming is actually finding and squashing the bugs. print() is a quick and easy way to help you hunt these things down to tell you what’s going on inside of your code.

00:23 It’s jokingly sometimes referred to as caveman debugging because it’s a simple and sort of old fashioned way of doing things, but often it is quite useful. In order to show you how to use print() to help you debug, I need to have some code to debug. The method on the screen at the moment generates a Roman numeral and it makes an assumption that it’s a four-digit Roman numeral, because it’s for creating the copyright year in Roman numerals that would be at the end of movie credits. The code works as follows.

00:52 There’s a dictionary at the top that declares the common letters that are used in Roman numerals. Start with a divisor of 1000, essentially assuming that our number is going to be a four-digit year. And then it loops through those divisors, essentially peeling off each of the digits, one at a time. For each digit, it looks up in the hash the value of the Roman numeral corresponding to the digit, and then prints out multiple versions of it if necessary. So for example, if the digit is less than 3, it takes the divisor, which is—if the first time, it will be 1000—the thousands position, and prints out digit number of them.

01:31 If you put in the year 2000, you will get 2 * roman[divisor], look up 1000, which gives you your two 'M'’s for your 2000. Similar logic is used if the digit is a 4, if the digit is between 5 and 8, and if the digit is equal to 9. Seeing as I’m going to show you how to debug, there’s some bugs in this code.

01:52 I’m going to try it out and see if I can find them. Start off with a movie year of 2000, get back 'MM'. So far, so good. That’s right. Now, I’m going to try 2015.

02:04 Well, that’s a problem. MMX is 2010. There’s obviously an issue with the code. Now what you want to do is put a bunch of print() statements in the code to see what happens at each step. I’ve created a bunch.

02:20 The first one here at line 6 will tell me that the movie_year() has been called and what number is being debugged, to make sure things are happening properly. At line 8, I’m printing out what digit is currently being processed and what the divisor is. You’ll notice that I put a few spaces in here. I like to do this when I’m putting print() statements in for debug.

02:40 What will happen is line 6 will put things flush against the left-hand side, and all subsequent calls will be indented. It’s easy for me to see on the screen when another function is called because it’ll be back out at the outset. Down at the bottom, I’m printing out the state of output on each iteration of the loop.

03:02 Great. So I’ve got the print() statements in there. I’m going to call the troubled 2015 again.

03:12 The first print() statement shows me that 2015 was called. The first iteration shows digit=2 divison=1000, which spits out the 'MM'—so far, so good.

03:23 The next digit is a 0. Well, right off you’ll notice there is no if statement handling this. Now, it works out because 0 is less than 3 and 0 times the hundreds digit will put out zero characters—so technically, line 11 is getting called, but it’s getting called with a blank.

03:43 It’s not really a bug, but it’s not particularly efficient.

03:48 Next, on to the third digit. The divisor is now 10. This happens correctly. On to the last digit, digit=5.

03:57 The divisor is 1, and nothing’s getting called. A quick inspection of the code shows that line 14 has an elif of < 5. The case for 5 actually isn’t in the code at the moment.

04:14 So, I’ll make a couple changes. First, line 9. This makes the code a little more efficient. And line 14, I fixed the less than or equal to (<=).

04:26 Now—as expected—when I rerun it,

04:32 processing looks much better this time, with the correct result.

04:37 Using print() as a quick tool to insert in your code and see what’s going on is useful for debugging. It does have some problems though. First off, not all software runs in a terminal.

04:49 Adding print() statements to server side code isn’t going to be particularly helpful if there’s no terminal for you to see what’s going on. Secondly, you may have to look for clues to a bug long after the bug has happened. For long running services, you need to get at information when a customer complains about something that went wrong yesterday, the day before, possibly even months earlier.

05:10 Last, print() isn’t thread-safe. Now for simple programs, this isn’t a problem, but if you are doing multithreaded processing, print() will mess things up.

05:19 You can end up with print() statements overlapping each other, and that can cause all sorts of weirdnesses inside of your terminal. As an alternative, you can use the logging libraries built into Python. Python has a very powerful, very configurable logging mechanism.

05:32 It’s a little harder to get going than a simple print() statement, but it is much more safe for large applications. Typically, your logging system sends output to a file. In large enterprises, there are often tools for sending them off to a common location, searching them, rotating them, filtering them—there’s lots and lots of tools that can help your debugging through manipulating your logs.

05:55 So, here’s a simple function that counts the number of lowercase letters inside of a string. I’m going to use this to show off the features of logging. In order to use the logger, first thing—you need to import logging from the package system. Built into logging is something called basicConfig().

06:11 This is the easiest way to configure a logger. The configuration I’m passing at the moment specifies the stream to be stdout (standard out).

06:18 This means the content will be printed to the screen.

06:22 In this configuration, logger calls are very, very similar to just print() statements. The second parameter I’m passing in is the level.

06:30 The logger has four different levels: INFO, DEBUG, WARNING, and ERROR. By setting this value, you’re turning on a filter.

06:39 This is the most promiscuous filter—it says all calls can be caught.

06:45 Finally, I’ve got logging configured, so I need to actually instantiate a logger. I do that through the getLogger() mechanism.

06:52 By passing in __name__, I’m creating a logger with the same name as this file. That allows me inside of the logs to easily distinguish between where a particular logging statement came from.

07:05 Line 8 is an .info() call to the logger. This is similar to the print() statement I used at the beginning of the movie_year() function, telling me that I’m inside of this function call. Inside of the loop that checks the case on the letters, I put in a .debug() statement, and you’ll notice that the message here uses the %s C-style mechanism for injecting content inside of a string.

07:30 I’ll come back to why you use that in a minute. Finally, I also send another .info() setting so that I can see what’s being returned by the method. Easy enough. Let’s see this in action.

07:46 Importing the method, calling the method.

07:52 It gets called. The first four statements on the screen are from the system stdout—that’s the logger calling logger.info(), logger.debug() twice inside of the loop, and logger.info() again at the end.

08:07 The 2 is the actual return value of this function.

08:12 I commented earlier that these things can be filtered. If you look at line 4, I’ve changed the logging level to INFO. If I rerun the call now, only those .debug() methods with the INFO setting are getting printed out.

08:28 This allows you to set different levels of what’s being captured in your logs.

08:33 This brings me back to the %s mechanism. Although generally it’s recommended to use f-strings when formatting, debug logs are the exception. You’ll notice in line 12 that I’m passing in the string for formatting and then the attributes into the logging call itself, rather than the usual mechanism of putting the string, %, and then the parameters.

08:56 The creation of the string actually happens inside of the function. You can’t do this with a formatted string. The reason this is important is if the filters are turned on, the string doesn’t get created, as string operations can be expensive.

09:12 That means when the debugger is set to INFO, this .debug() call doesn’t actually create the string by passing the letter object into this string and creating it. This skips a whole bunch of processing.

09:24 If I had used an f-string here, it would have created the fully impregnated string to pass it as an argument into the debugger. You can significantly reduce the amount of processing necessary on your .debug() calls by using the old style C formatting. Python’s logging module is very powerful, and it does this through being very configurable.

09:43 The basicConfig() mechanism that I showed you is the easiest way to get up and going, but you can fine-tune how your logging experience works. To give you a small taste of that, here’s an example piece of code that combines the ANSI escape sequences used in earlier lessons with the configuration of a logger.

10:02 This format string uses the ANSI escape to bold the function name inside of the message to make it easier to read on a terminal. In order to use this, you need to set the logging level, add a StreamHandler, and then add a Formatter, and all of that gets passed into the root.addHandler().

10:21 The basicConfig() call that you saw earlier does a version of these four steps for you. You’ve made it so far—only one lesson left. In this last lesson, I’ll talk about where you can get more information on some of the topics I’ve talked about, and useful libraries you can take advantage of that can help you do some of the things I’ve shown you.

Daniel Faria on June 10, 2020

Beautiful!

nj8456 on July 7, 2020

Yu guessed my feedback

cmadsoneng on Jan. 17, 2021

There’s another bug in movie_year(). Years such as 2008 fail w/o this change. For digits 5-8, we want the logic for the following characters to align with the logic that is in place for digit <= 3.

Instead of:

output.append( roman[divisor * 5] + roman[digit] * (divisor - 5) )

It should be:

output.append( roman[divisor * 5] + roman[divisor] * (digit - 5) )

Christopher Trudeau RP Team on Jan. 19, 2021

Hi cmadsoneng,

Yep you’re right, good catch! Kind of makes my point that debugging is hard, huh? This is why in the real world I write lots of unit tests :)

The sample code has been updated.

Become a Member to join the conversation.