Formatting Limitations

Christopher Trudeau

Exploring Python T-Strings Christopher Trudeau 07:07

00:00 In the previous lesson, I reviewed the different ways of formatting strings in Python. In this lesson, I’ll quickly run you through two limitations of the format mechanisms, so you’ll better understand why t-strings got created.

00:13 An f-string gets processed inline by the interpreter. That means that any values used inside of it have to be in scope and usable, and in the case of using an f-string as an argument to a function call, the f-string gets processed and turned into a string before being passed into the function.

00:30 Sometimes, this is the opposite of the behavior you want. For example, doing the string processing might be expensive and you might want to delay its occurrence for performance reasons, and sometimes you want to be able to see what is going on inside of the string before it gets processed.

00:46 The second case is particularly important when dealing with user input. Users can be unhelpful and sometimes malicious, and they might give your program something it can’t handle.

00:56 If you put user input inside of an f-string without cleaning it first, you could run into trouble.

01:03 Let’s start by drilling down on when an f-string gets processed, known as eager evaluation.

01:10 I’m creating a little logging function that prints some content to the screen, but only if it’s turned on. If enabled is True, then print the message.

01:21 Quick test,

01:26 and there you go. To play with an f-string, I’ll create a variable, and when I log it,

01:37 it gets printed out.

01:43 The problem I’ve been talking about is this function call: note that the f-string is getting processed before the function gets called. That means the string is being interpolated and 42 is getting embedded into it before it’s getting passed into my_log.

01:58 This is the eager part. It evaluates it as soon as it can. In the situation where enabled is False, that’s a waste of time. It would be ideal if it didn’t work this way, as you’re paying the cost of string construction even though you aren’t using the result. To get around this, it would be nice to move the string processing into the log function and only call it if enabled is True, to save on performance.

02:22 In fact, that’s why Python’s logger still uses C-style formatting. To demonstrate, let me set the logger up, importing.

02:32 I want to print out the results, so I need to configure the logger with standard out, which is in the sys module.

02:46 The basicConfig call is probably the quickest way to set up a logger. Here, I’m using stdout as an output stream, so the contents will get printed to the screen and I’m setting the logging level to ERROR.

02:58 Any log messages below the ERROR level don’t get printed. This is a fancier version of my enabled equals true from above. Now, I’ll instantiate the logger

03:12 and finally, I’ll use it.

03:19 The logging calls, error in this case, take one or more arguments, with the first argument being a string message and the rest being used as part of a format call.

03:29 Note that although this is a C-style string, it isn’t using the percent operator to populate the template immediately. That’s because the logger itself is making that call for you inside of the function call.

03:40 The advantage is it only does this if the log level is sufficiently high. The output’s a little muddled as the default logger shows the log level, which is the all caps ERROR, the source of the log, which is root, and then finally, the populated message.

04:01 The debug method is a lower level than error. This is like enabled equals false in my earlier example. Since the string interpolation happens inside the logger, and the first thing the logger does is check the log level, the string interpolation never gets called.
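The logger session described above might look roughly like this. It is a sketch under assumptions: the message text and the `Tracked` helper class are hypothetical, and `force=True` is an addition so the configuration applies even if a handler was already installed.

```python
import logging
import sys

# Send log output to stdout and ignore anything below ERROR.
# force=True (an addition here) replaces any handlers already installed.
logging.basicConfig(stream=sys.stdout, level=logging.ERROR, force=True)
logger = logging.getLogger()  # the root logger


class Tracked:
    """Hypothetical helper that counts how often it gets stringified."""

    def __init__(self, value):
        self.value = value
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return str(self.value)


value = Tracked(42)

# C-style template plus arguments: the logger does the interpolation itself.
logger.error("The value is %s", value)  # prints ERROR:root:The value is 42
calls_after_error = value.str_calls

# DEBUG is below ERROR, so the level check fails and nothing is interpolated.
logger.debug("The value is %s", value)
calls_after_debug = value.str_calls

print(calls_after_debug == calls_after_error)  # True
```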

04:17 You just can’t do this with an f-string. Now, the example here is quite simple, but imagine if you’re making loads and loads of logging calls a second, or if the things being logged are large, complex objects that are expensive to stringify.

04:30 If you use an f-string, you pay that cost even if you aren’t going to log anything. Spoiler alert: t-strings don’t quite solve this problem. They kind of walk up next to it and smile at it, but they don’t quite get all the way there.

04:43 More on that in a later lesson.

04:47 Another potential problem with f-strings being eagerly evaluated is when you need to do some processing beforehand. One example of this situation is what’s known as a SQL injection attack.

04:59 On the screen here, I’ve got a SELECT statement that returns any row in the movie table of my database where the title is Batman. If I want to accept user input for the title, a naive way of doing that would be to construct this same thing with an f-string.

05:14 Makes sense, right? The result of this string being interpolated when title is Batman is the same as the SELECT statement above. We’re good, right? Right? Not right.

05:25 What if the user, the big meanie, gives you this as the title? Semicolon is used in SQL to separate statements. When this title gets interpolated, you have a SELECT where title is empty and a drop table statement.
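The slide itself isn't reproduced in the transcript, so the following only builds the strings to show the effect. The exact SQL, the `build_query` helper, and the malicious title are assumptions chosen to match the description. Nothing here touches a database.

```python
def build_query(title):
    """Naive and unsafe: user input is interpolated straight into the SQL."""
    return f"SELECT * FROM movie WHERE title = '{title}'"


# The well-behaved case.
print(build_query("Batman"))
# SELECT * FROM movie WHERE title = 'Batman'

# Assumed malicious input: close the quote, end the statement, add another.
evil_title = "'; DROP TABLE movie; --"
print(build_query(evil_title))
# SELECT * FROM movie WHERE title = ''; DROP TABLE movie; --'
```

By the time anything could inspect the query, it is already a single plain string with the user's text baked in.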

05:39 Generally speaking, you don’t want your users to be able to mangle your SQL in a way that is damaging to your database.

05:46 It saddens me that SQL injection is still a thing. It’s quite preventable if you know what you’re doing. Just so I’m clear, in case the giant red X here was too subtle, never, never, ever do it this way.

06:00 Most databases have a way to parameterize your calls. The syntax varies a bit, but the idea is kind of like the C-style formatting. You put a placeholder in the string, in this case, it’s a question mark, and then pass parameters to your SQL execution, letting the library create the corresponding SQL statement before running it.

06:18 If meanie user puts in drop tables, that will get treated as a string being looked up rather than as part of the SQL statement. The good news is, at least if you’re running Python, it’s harder and harder to make this mistake accidentally.
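As a rough illustration of the parameterized approach, here is a sketch using the standard library's `sqlite3` module with an in-memory database. The table layout, the sample row, and the `find_movies` helper are assumptions made for the example.

```python
import sqlite3

# In-memory database with an assumed movie table and one sample row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
conn.execute("INSERT INTO movie VALUES (?, ?)", ("Batman", 1989))


def find_movies(title):
    """Look up movies by title, passing the value as a parameter."""
    cursor = conn.execute(
        "SELECT title, year FROM movie WHERE title = ?",  # ? is the placeholder
        (title,),  # parameters are passed separately, as a tuple
    )
    return cursor.fetchall()


print(find_movies("Batman"))  # [('Batman', 1989)]

# The malicious text is treated as a title to look up, nothing more.
print(find_movies("'; DROP TABLE movie; --"))  # []

# The table is still there.
print(conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0])  # 1
```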

06:31 The reason this code is on a slide rather than me demonstrating it is that both the sqlite and SQLAlchemy libraries now have checks for this. If you attempt to use two different SQL statements in the same call, it errors out.
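That check can be observed with the standard library's `sqlite3` module. In this sketch the table and the malicious title are assumptions. The exception type raised has differed between Python versions, so the code catches both `sqlite3.Warning` and `sqlite3.Error` rather than naming a single one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT)")
conn.execute("INSERT INTO movie VALUES (?)", ("Batman",))

# Assumed malicious input, interpolated the unsafe way.
title = "'; DROP TABLE movie; --"
query = f"SELECT * FROM movie WHERE title = '{title}'"

try:
    conn.execute(query)  # two statements in a single execute() call
    rejected = False
except (sqlite3.Warning, sqlite3.Error) as exc:
    rejected = True
    print(type(exc).__name__, exc)

print(rejected)  # True: the call was refused
print(conn.execute("SELECT COUNT(*) FROM movie").fetchone()[0])  # 1
```

This is a safety net, not a fix: the query string was still built from raw user input.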

06:43 That said, this problem exists anywhere user input gets used. Users can muck with your results in SQL, HTML, regexes, all sorts of places. You have to clean your data first.

06:55 But it’s kind of easy to forget the cleaning data step. So Python introduced a new kind of string formatting that allows a developer to introspect on what’s being processed.

07:04 Next up, t-strings.
