Security Risks
00:00
In the previous lesson, I showed you how to use the built-in compile() function to create compiled objects and pass them to eval(). In this lesson, I’m going to show you more about the security risks, how to try to avoid them, and how you’re going to fail at that.
00:16
eval() allows the execution of arbitrary code.
00:22
Yeah, I know that was a little cheesy, but it kind of makes my point. You’ve got to be very careful with this. Now this doesn’t mean don’t use eval().
00:30 What this means is don’t use it with untrusted input. You need to know the source of the strings you’re evaluating. If they are things that a user can change, you are asking for trouble.
00:44
Now you might think to yourself, “Well, I can restrict it. They can’t import, so they can only do what I want them to.” Well, unfortunately, although import is a statement, __import__() is a function, which means you can create an eval() string to import subprocess using __import__(), and then, as I showed you in a previous lesson, use getoutput().
01:08 In this case, I’m showing the contents of one of my personal files in my private directory. This is why you have to be careful where you run this. If the user is sending you a string, they can do things that are nefarious.
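A minimal sketch of that trick, assuming a shell where echo is available. The harmless echo command stands in for the file-reading command from the video, and user_input is a hypothetical name:

```python
# The string below stands in for untrusted input. "user_input" is a
# hypothetical name, and "echo" replaces the file-reading command.
user_input = "__import__('subprocess').getoutput('echo pwned')"

# eval() only accepts expressions, but __import__() is an ordinary
# function call, so the import and the shell command both run.
result = eval(user_input)
print(result)
```

Anything the shell can do, the string can do, with the permissions of the Python process.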
01:23
You can try to get a handle on this by using the globals and locals arguments. But remember, as I showed you in a previous lesson, an empty dictionary for globals isn’t good enough.
01:33
Even if you use an empty dictionary for globals, all the builtins are added by default inside of the global context. __import__() is a built-in function, so it will be available.
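A short sketch of that behavior. The variable name sandbox is only illustrative, and math is used as a harmless module to import:

```python
# An empty dict for globals does not remove the builtins.
sandbox = {}
value = eval("__import__('math').sqrt(16)", sandbox)
print(value)

# eval() inserted a "__builtins__" entry into the dictionary it was given.
has_builtins = "__builtins__" in sandbox
print(has_builtins)
```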
01:47
You can override the __builtins__ global dictionary to wipe out anything that’s in the builtin listing and stop certain kinds of execution.
01:56
Here’s an example. On the right-hand side there, I’ve overridden "__builtins__", and when I try the evil show-Chris’s-private-file call, I get a NameError because the function’s no longer there.
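Roughly what that looks like in code, as a sketch. The expression is a harmless stand-in for the one in the video, and restricted and blocked are illustrative names:

```python
# Replace the builtins with an empty mapping so __import__ cannot be found.
restricted = {"__builtins__": {}}

try:
    eval("__import__('os').getcwd()", restricted)
    blocked = False
except NameError as error:
    blocked = True
    print(error)
```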
02:10
You can take this even a step further if you use the compile() function and introspect what’s being called.
02:17
In an attempt to improve the security, I’ve defined a function here called strict_eval(). It takes two parameters: an expression to execute and a dictionary mapping names to the functions that will be allowed to execute. On line 4, I call the built-in function compile(), compiling the expression. On lines 6 through 8, I iterate through all of the names defined inside of the code.
02:43
If one of the names does not map into the allowed dictionary, I raise a NameError. Assuming all of that passes, then I call eval() with the compiled code object and an overridden "__builtins__" in globals, passing in the allowed dictionary as locals.
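A sketch of a function matching that description. This is a reconstruction from the narration, not the on-screen code, so the line numbers mentioned above will not line up with it, and the error message is an assumption:

```python
def strict_eval(expression, allowed):
    """Evaluate expression, permitting only the names in allowed."""
    # Compile first so the names can be inspected before anything runs.
    code = compile(expression, "<string>", "eval")

    # co_names lists the global and attribute names the expression uses.
    for name in code.co_names:
        if name not in allowed:
            raise NameError(f"Use of {name} is not allowed")

    # Empty builtins as globals, the allowed mapping as locals.
    return eval(code, {"__builtins__": {}}, allowed)
```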
03:01 Let me show you this in practice.
03:09
To keep it simple, I’ll define only one function inside of the allowed dictionary: sum().
03:18 The straight expression’s fine, no problem. It evaluates.
03:25
An expression with sum() in it works as expected.
03:31
And here, I’ve attempted to use something that’s not in the allowed list. Even though the len() (length) function is a built-in, because strict_eval() is limiting execution to only those things in the allowed dictionary, a NameError is raised.
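The three calls might look something like this. The exact expressions typed in the video are not in the transcript, so the ones below are assumptions, and strict_eval() is re-declared from its description so the block runs on its own:

```python
def strict_eval(expression, allowed):
    # Reconstructed from the description of strict_eval() in the lesson.
    code = compile(expression, "<string>", "eval")
    for name in code.co_names:
        if name not in allowed:
            raise NameError(f"Use of {name} is not allowed")
    return eval(code, {"__builtins__": {}}, allowed)

allowed = {"sum": sum}

# A plain arithmetic expression uses no names at all.
plain = strict_eval("2 + 3", allowed)

# sum is in the allowed dictionary, so this evaluates.
summed = strict_eval("sum([1, 2, 3])", allowed)

# len is a built-in but not in allowed, so it is rejected.
try:
    strict_eval("len([1, 2, 3])", allowed)
    rejected = False
except NameError:
    rejected = True

print(plain, summed, rejected)
```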
03:46 You might think that’s good enough. I’ve managed to restrict what can be executed and what can’t be executed. Well, unfortunately, it’s still not safe. Let me show you why. Everything in Python is an object. Why do I bring that up?
04:01 Well, objects have methods. And if you have an object, you can get at the methods on that object and those methods can be problematic. Just a quick word of warning. The next little bit is kind of messy.
04:16 When I say everything in Python is an object, I mean everything—even an empty string. Let me show you.
04:25
All objects have .__class__ defined on them, and .__base__ on that class indicates the class it inherits from.
04:33
Every class ultimately inherits from the object class. What I’ve done here is taken an empty string, got the string class, got its parent, and that parent is the base object class in Python.
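The same walk from an empty string up to object, as a small sketch with illustrative variable names:

```python
empty = ""

# Every object knows its class.
string_class = empty.__class__

# Every class knows the class it inherits from.
parent = string_class.__base__

print(string_class, parent)
```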
04:47
Okay, no big deal, right? Well, let me show you just how much is defined on this object.
05:00
What I’ve done here is create a list comprehension looping through all of the subclasses of that base class and printing out their names. As you can see, there’s a lot of stuff here, and a lot of them correspond to built-in functions. If you skim through, you’ll see things like 'type', 'super', 'float', 'str', 'range', dictionaries, all sorts of stuff. Let me just scroll down to the bottom.
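A sketch of that list comprehension. The exact list varies by Python version and by what has been imported, so only the first few names are printed here:

```python
# Start from an empty string, climb to object, then list every
# class that inherits directly from it.
names = [cls.__name__ for cls in "".__class__.__base__.__subclasses__()]

print(len(names))
print(names[:10])
```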
05:31 Like I said, there’s a lot there. Now instead of a list comprehension, let me create a dictionary. What I’m going to do inside of this dictionary is map the names of the functions to the actual functions themselves.
05:50 This is very similar to the list comprehension but it’s a dictionary instead. And now I can dereference something in this dictionary and call it as a function.
06:01
Here, I’ve looked up the reference to the range() function and then I’ve executed it, calling 'range' with 1 and 11. range() returns a range object, but just to prove this works, let me convert it to a list.
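The dictionary version, sketched with the illustrative names classes and numbers. The list() call is only there to display the result:

```python
# Map each class name to the class itself.
classes = {cls.__name__: cls for cls in "".__class__.__base__.__subclasses__()}

# Look up range by name and call it, without ever typing the name range
# as an identifier.
numbers = classes["range"](1, 11)
as_list = list(numbers)
print(as_list)
```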
06:17
With only access to an empty string, I’ve been able to call the range() function successfully. And this is how things get problematic when you try to block functions in eval().
06:28
You can always still get at these base classes and therefore get at dangerous functions. range() isn’t dangerous, but there is some dangerous stuff in here.
06:39
Even if you use the compile() function and inspect .co_names, you still aren’t 100% safe. Once you can get at an object, you can create arbitrary bytecode and have that bytecode executed.
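A small sketch of one reason a .co_names check falls short in CPython: names used inside a lambda body are recorded on the lambda's own nested code object, not on the outer one that the check inspects. The variable names are illustrative:

```python
# Attribute names used directly show up in co_names.
direct = compile("''.__class__", "<string>", "eval")

# Wrapped in a lambda, the same attribute access is hidden one level down.
wrapped = compile("(lambda: ''.__class__)()", "<string>", "eval")

print(direct.co_names)
print(wrapped.co_names)

# The lambda body is a separate code object stored in co_consts.
nested = [const for const in wrapped.co_consts if hasattr(const, "co_names")]
print(nested[0].co_names)
```

A check that only loops over the outer co_names never sees the names inside the lambda.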
06:52
There’s an older blog post by Ned Batchelder that describes this problem in Python 2.7. It has not been fixed in Python 3. It’s a fundamental flaw in the security of eval().
07:05
Let me show you a quick version of the exploit that Ned does and how it can cause your system to fail. Let me grab strict_eval() out of the file again.
07:20
And here I’ll eval something simple. Great! I’ve got 5. I believe I promised you messy. Now here’s some messy. Don’t worry about parsing every last little bit of that.
07:32
I’ll walk you through the important part in a second. What you really need to understand just for the moment is that this is a string, and all it uses is lambda, which is a Python keyword and therefore can’t be blocked by eval(). It then uses that trick I showed you earlier of the subclasses to get at something called the code() function. The code() function allows you to build arbitrary bytecode.
07:57
In this case, I’m building the bytecode b"KABOOM". What happens if I execute the bytecode b"KABOOM"?
08:06
Python crashes. So even with strict_eval(), even with me being very careful about what code is executed and what code isn’t executed, there are dangers here.
08:18 Let me try to explain how that worked. One of Python’s superpowers is that it integrates very closely with the C programming language. Python isn’t known as the speediest language, so if you want to do something high-performance, Python might not be the best way to do that.
08:35 Except the C API gives you the ability to mix and match. Intensive libraries like Pandas or NumPy write the hard bits—the efficient bits—in C, and then Python calls into them.
08:48 And this is how you can get fairly high-performing scripts, getting the best of both worlds: Python’s easier-to-write environment and C’s speed on the machine.
09:00
Underlying the CPython implementation is a series of objects that represent the bytecode that’s being executed when you run your Python, and this is necessary for the C API to work. Hidden away in this library are objects called PyCodeObject representing the actual bytecode being run. PyCode_New() creates new versions of these objects. Inside of Ned’s dangerous lambda that I showed you before, there are three key lines.
09:28
The first thing his lambda did was map items out of the subclasses into a dictionary of callables, similar to how I showed you with range() before.
09:37
Here, Ned is calling one of those functions, which is the code() function. That code() function creates a PyCodeObject.
09:46 It has 14 parameters in Python 3.9, and in this case, I don’t really care what most of them are. The key one is the seventh. The seventh is a chunk of binary that is the bytecode to execute.
10:00
What Ned’s done here is executed the bytecode b"KABOOM". That’s not actually valid bytecode, so it causes the interpreter to crash. Because you can create a code object with any bytecode, that means you can run anything in Python if you know how to generate the bytecode.
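A deliberately harmless sketch of the first half of that exploit: reaching the code class through the subclasses trick and looking at the bytes a real function carries. It does not build or run a bogus code object, since the constructor's arguments differ between Python versions and running invalid bytecode can crash the interpreter. The variable names are illustrative:

```python
import types

# The same subclasses dictionary as before.
classes = {cls.__name__: cls for cls in "".__class__.__base__.__subclasses__()}

# "code" is the class behind every compiled function body.
code_class = classes["code"]
print(code_class is types.CodeType)

# Every function carries a code object whose co_code is raw bytecode.
sample = lambda: 1 + 1
bytecode = sample.__code__.co_code
print(type(bytecode), len(bytecode))
```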
10:19
It doesn’t matter how clever you think you are, whether by trying to block things in the eval() string or by checking .co_names. There are always ways around it.
10:29
So it comes back to the fundamental lesson: do not use eval() or exec() or any other dynamic code mechanism with code from users you don’t trust.
10:41
Now that doesn’t mean you shouldn’t take advantage of these great features. You just need to know when the right time is and when you shouldn’t. If you’re interested in how PyCodeObject, PyCode_New(), and other pieces of the C API work, you can read more about that here.
10:58
There is a safe version of eval(). It’s extremely restricted and it’s part of the ast library. It’s called literal_eval().
11:07
It only supports the creation of literals: numbers, lists, tuples, strings, et cetera. This is a handy way of turning a string into an actual literal value in your code. So if your user types in 15.02 and you want to change that into a float, literal_eval() is a convenient way of doing that. This is extremely restrictive though.
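A short sketch of ast.literal_eval() accepting literals and rejecting a function call. The input strings and variable names are made up for illustration:

```python
import ast

# Literals are converted to real Python values.
price = ast.literal_eval("15.02")
items = ast.literal_eval("[1, 2, (3, 'four')]")
print(type(price), items)

# Anything that is not a literal, such as a function call, is refused.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```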
11:29
You’re not going to be able to do anything else with this but get those literals into Python’s space. Safe it is, but limited it is as well. Okay. That’s enough doom and gloom. I hope you’re frightened. Now let’s actually go and use something. In the next lesson, I’ll show you how to build a little calculator that evaluates math expressions using eval().