In the previous lesson, I showed you how to use the built-in compile() function to create compiled objects and pass them to eval(). In this lesson, I’m going to show you more about the security risks, how to try to avoid them, and how you’re going to fail at that.
Now you might think to yourself, “Well, I can restrict it. They can’t import, so they can only do what I want them to.” Unfortunately, although import is a statement, __import__() is a function, which means you can create an eval() string that calls __import__(), and then, as I showed you in a previous lesson, use
01:08 In this case, I’m showing the contents of one of my personal files in my private directory. This is why you have to be careful where you run this. If the user is sending you a string, they can do things that are nefarious.
Here’s an example. On the right-hand side there, I’ve overridden "__builtins__", and when I try the evil call that shows Chris’s private file, I get a NameError because the function’s no longer there.
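A minimal sketch of the two behaviors just described. It uses a harmless os.getcwd() call as a stand-in for reading a private file; the expression string and the variable names are illustrative assumptions, not the code shown on screen.

```python
# The kind of string a hostile user might send. os.getcwd() is a harmless
# stand-in for something nefarious such as reading a private file.
expression = "__import__('os').getcwd()"

# With default globals, eval() can reach __import__() through the built-ins.
unrestricted_result = eval(expression)
print(unrestricted_result)

# Overriding "__builtins__" with an empty dictionary removes __import__(),
# so the same expression now fails with a NameError.
try:
    eval(expression, {"__builtins__": {}}, {})
    blocked = False
except NameError as error:
    blocked = True
    print(f"Blocked: {error}")
```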
In an attempt to improve the security, I’ve defined a function here called strict_eval(). It takes two parameters: an expression to execute and a dictionary mapping names to the functions that will be allowed to execute. On line 4, I call the built-in function compile(), compiling the expression. On lines 6 through 8, I iterate through all of the names defined inside of the code. If one of the names is not in the allowed dictionary, I raise a NameError. Assuming all of that passes, then I call eval() with the compiled code object and an overridden "__builtins__" in the globals, passing in the allowed dictionary as the locals.
And here, I’ve attempted to use something that’s not in the allowed list. Even though the len() (length) function is a built-in, strict_eval() permits only the names in the allowed dictionary, so a NameError is raised.
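The on-screen code is not part of this transcript, so here is a rough reconstruction of what a strict_eval() like the one described might look like. The error message, the "<string>" filename, and the total name in the usage lines are assumptions made for illustration.

```python
def strict_eval(expression, allowed):
    """Evaluate expression, permitting only the names listed in allowed."""
    # Compile first so the names can be inspected before anything runs.
    code = compile(expression, "<string>", "eval")

    # co_names holds the global and attribute names the code object refers to.
    for name in code.co_names:
        if name not in allowed:
            raise NameError(f"Use of {name!r} is not allowed")

    # Empty built-ins as globals, the allowed names as locals.
    return eval(code, {"__builtins__": {}}, allowed)


# Hypothetical usage: only "total" (mapped to the built-in sum) is allowed.
allowed_names = {"total": sum}
allowed_result = strict_eval("total([1, 2, 3])", allowed_names)
print(allowed_result)

# len() is a built-in, but it is not in the allowed dictionary.
try:
    strict_eval("len([1, 2, 3])", allowed_names)
    rejected = False
except NameError as error:
    rejected = True
    print(f"Rejected: {error}")
```

As the rest of the lesson explains, this check is still not enough to make eval() safe.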
03:46 You might think that’s good enough. I’ve managed to restrict what can be executed and what can’t be executed. Well, unfortunately, it’s still not safe. Let me show you why. Everything in Python is an object. Why do I bring that up?
04:01 Well, objects have methods. And if you have an object, you can get at the methods on that object and those methods can be problematic. Just a quick word of warning. The next little bit is kind of messy.
What I’ve done here is create a list comprehension looping through all of the subclasses of that base class and printing out their names. As you can see, there’s a lot of stuff here, and a lot of them correspond to built-in functions. If you skim through, you’ll see things like 'range', dictionaries, all sorts of stuff. Let me just scroll down to the bottom.
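A small sketch of that kind of list comprehension. It assumes the base class in question is object, reached from an empty tuple so that no built-in name is needed; the variable names are illustrative.

```python
# Walk from an empty tuple to its class (tuple) and then to the base class
# (object), without using any built-in name.
base_class = ().__class__.__base__

# Collect the names of every direct subclass of the base class.
subclass_names = [cls.__name__ for cls in base_class.__subclasses__()]

print(len(subclass_names))
print("range" in subclass_names, "dict" in subclass_names)
```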
05:31 Like I said, there’s a lot there. Now instead of a list comprehension, let me create a dictionary. What I’m going to do inside of this dictionary is map the names of the functions to the actual functions themselves.
Here, I’ve looked up the reference to the range() function and then I’ve executed it. Calling range() returns a range object, but just to prove this works, let me convert it to a list.
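Roughly, the dictionary lookup could look like the sketch below. The variable names are assumptions, and the last part shows the same lookup succeeding inside an eval() call whose built-ins have been emptied.

```python
# Map each subclass name to the class itself.
subclasses = {
    cls.__name__: cls for cls in ().__class__.__base__.__subclasses__()
}

# Look up range by name and call it, without ever typing the built-in name.
range_reference = subclasses["range"]
numbers = list(range_reference(5))
print(numbers)

# The same trick works inside eval() even with empty built-ins.
sneaky = eval(
    "[c for c in ().__class__.__base__.__subclasses__()"
    " if c.__name__ == 'range'][0](3)",
    {"__builtins__": {}},
)
print(list(sneaky))
```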
I’ll walk you through the important part in a second. What you really need to understand just for the moment is that this is a string, and all it uses is lambda, which is a Python keyword and therefore can’t be blocked by restricting the names available to eval(). It then uses that trick I showed you earlier of the subclasses to get at something called the code() function. The code() function allows you to build arbitrary bytecode.
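The sketch below shows only the first half of that idea: reaching the code type from inside an eval() with emptied built-ins. It deliberately stops short of constructing any bytecode, and the variable names are illustrative assumptions.

```python
import types

# Inside eval(), with no built-ins available, the subclasses trick still
# finds the class named 'code'.
found_code_type = eval(
    "[c for c in ().__class__.__base__.__subclasses__()"
    " if c.__name__ == 'code'][0]",
    {"__builtins__": {}},
)

# It is the same type that every function's compiled code belongs to.
print(found_code_type is types.CodeType)
print(found_code_type is (lambda: None).__code__.__class__)
```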
08:18 Let me try to explain how that worked. One of Python’s superpowers is that it integrates very closely with the C programming language. Python isn’t known as the speediest language, so if you want to do something high-performance, Python might not be the best way to do that.
Underlying the CPython implementation are a series of objects that represent the bytecode that’s being executed when you run your Python, and this is necessary for the C API to work. Hidden away in this library are objects called PyCodeObject representing the actual bytecode being run. PyCode_New() creates new instances of these objects. Inside of Ned’s dangerous lambda that I showed you before, there are three key lines.
What Ned’s done here is execute the bytecode b"KABOOM". That’s not actually valid bytecode, so it causes the interpreter to crash. Because you can create a code object with any bytecode, that means you can run anything in Python if you know how to generate the bytecode.
Now that doesn’t mean you shouldn’t take advantage of these great features. You just need to know when the right time is and when you shouldn’t. If you’re interested in how PyCodeObject, PyCode_New(), and other pieces of the C API work, you can read more about that here.
It only supports the creation of literals: numbers, lists, tuples, strings, et cetera. This is a handy way of converting a string into an actual literal value in your code. So if your user types in 15.02 and you want to change that into a float, literal_eval() is a convenient way of doing that. This is extremely restrictive though.
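A short sketch of that behavior using ast.literal_eval() from the standard library. The input strings are made-up examples.

```python
import ast

# A string typed by a user becomes a real float.
price = ast.literal_eval("15.02")
print(type(price).__name__, price)

# Containers of literals work too.
items = ast.literal_eval("[1, 2, (3, 'four')]")
print(items)

# Anything that is not a plain literal, such as a function call, is refused.
try:
    ast.literal_eval("__import__('os').getcwd()")
    refused = False
except ValueError as error:
    refused = True
    print(f"Refused: {error}")
```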
You’re not going to be able to do anything else with this but get those literals into Python’s space. Safe it is, but limited it is as well. Okay. That’s enough doom and gloom. I hope you’re frightened. Now let’s actually go and use something. In the next lesson, I’ll show you how to build a little calculator that evaluates math expressions using