SQLAlchemy: Core Text
00:00 In the previous lesson, I introduced you to writing SQL to manipulate a relational database. In this lesson, I’ll show you how to use SQLAlchemy to accomplish the same things in your Python code.
00:13 SQLAlchemy is a popular third-party library for accessing and managing databases. It is made up of two parts: Core and the ORM. The Core is a Python abstraction of SQL, while the ORM uses object-oriented modeling to represent database rows.
00:31 One of the advantages of using a library like SQLAlchemy is it abstracts away the connection mechanisms. You initialize the SQLAlchemy engine the same way, regardless of what kind of database you’re connecting to.
00:43 This means you can use any number of the supported databases in the same fashion. This includes SQLite, PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and many more.
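To make that concrete, here's a rough sketch of what connection URLs for a few of those databases look like. The hostnames, ports, and credentials below are placeholders, not values from the lesson; only the `sqlite:///data/books.db` URL comes from the course.

```python
# Example SQLAlchemy connection URLs: only the scheme (and driver details)
# change between databases; everything else about the engine API stays the same.
# Hosts, ports, and credentials here are made-up placeholders.
urls = {
    "sqlite": "sqlite:///data/books.db",                       # file-based, no server
    "postgresql": "postgresql://user:secret@localhost:5432/books",
    "mysql": "mysql://user:secret@localhost:3306/books",
}
```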
00:58 Let’s go into the REPL and turn some lead into gold. The first thing you have to do is connect to your database. SQLAlchemy does this through an abstraction called an engine.
01:10 Let me import the engine factory … and now I’ll create an engine.
The URL-style parameter here indicates what I’m connecting to. Instead of http, you specify the type of database; for me, that’s sqlite. The path part indicates where the database is. For a database that uses a server, this would have a hostname and a port. Because SQLite uses files, this is a file reference. I’m connecting to the books.db file in the data/ subdirectory. The engine factory uses lazy loading, so the connection hasn’t been established yet.
The echo parameter turns on SQL echoing. Everything I do through this connection will print out the SQL equivalent. It’ll be a bit noisy, but for our purposes, you’ll be able to see exactly what is going on.
02:17 I mentioned in the first lesson that SQLAlchemy is going through an adaptation phase heading for the 2.0 release. The 1.4 version supports both the old dialect and the new dialect of SQLAlchemy.
The future=True argument here says to enforce the use of the new mechanism. That way, when 2.0 is released, your code will work with fewer changes. Let’s use this engine to connect.
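Put together, the engine setup looks roughly like this. To keep the snippet self-contained I point at an in-memory SQLite database rather than the lesson's data/books.db file:

```python
from sqlalchemy import create_engine

# The URL names the database type and its location. In the lesson this is
# "sqlite:///data/books.db"; an in-memory database is used here so the
# snippet runs anywhere.
engine = create_engine(
    "sqlite://",   # in-memory SQLite; a file would be sqlite:///data/books.db
    echo=True,     # log the SQL equivalent of everything done on this engine
    future=True,   # enforce the 2.0-style API
)

# Engines are lazy: the actual connection is only established here.
conn = engine.connect()
```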
Everything from here on will be done through my conn object. Now I’ll write a query. For this lesson, I’ll be solely using the text query object.
02:55 This allows you to write raw SQL directly to the engine.
There you go. The .execute() method on the connection object runs a query. The text object takes a string as an argument containing raw SQL. And in this case, I’m running a select statement. The method returns a result object. The muted logger info here is because I initialized the engine with echo=True.
You can see that it created a transaction—that’s the BEGIN—then did a select. The database was able to run the query in 0.00024 seconds. Pretty snappy. Let’s look at the results.
result.all() returns a list of tuples, and the database I’m using was populated with two authors in the previous lesson, good old Isaac and Pearl. All right, let’s make some changes now. To do an insert, I’ll need to execute a query.
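The select-and-fetch sequence so far can be sketched like this. The authors table and its columns are my reconstruction of the lesson's books.db schema, created inline so the snippet stands alone (echo is left off here to keep the output quiet):

```python
from sqlalchemy import create_engine, text

# Self-contained stand-in for the lesson's pre-populated database.
engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (first_name TEXT, last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Isaac', 'Asimov'), ('Pearl', 'Buck')"))

# text() wraps raw SQL; .execute() runs it and returns a result object.
result = conn.execute(text("SELECT first_name, last_name FROM authors"))
rows = result.all()   # a list of tuple-like Row objects
```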
Like before, I’m using a text object, so I can put in raw SQL. There’s something new here though. Notice that in the VALUES clause of the INSERT, I’m using :ln. The colon indicates that these are placeholder values. SQLAlchemy sees these and knows to populate them with the parameters that you pass to .execute(). This next line is those parameters.
I’m passing in a list of dictionaries where each dictionary has an "ln" key. These get mapped to the :ln placeholders in the query.
By passing in a list of dictionaries, I am able to insert more than one author at a time. Let me just close the call … and you see a few things. First off, you get the INSERT debug info showing "Stephen"—I speak as if we’re close friends—being inserted. Next, because I didn’t capture the result of the .execute(), the REPL shows the returned result object’s repr.
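Here's that multi-row insert as a runnable sketch, again with an assumed minimal authors table so the example stands alone:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (last_name TEXT)"))

# :ln is a placeholder; passing a list of dictionaries to .execute()
# runs the INSERT once per dictionary, inserting several rows at once.
stmt = text("INSERT INTO authors (last_name) VALUES (:ln)")
conn.execute(stmt, [{"ln": "King"}, {"ln": "Clancy"}])

inserted = conn.execute(text("SELECT COUNT(*) FROM authors")).scalar()
```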
05:25 If I were writing real code, I’d check to make sure nothing went wrong here. Let’s query for our authors again.
And there you go. There are now four authors in the database. Remember that BEGIN statement several queries ago? I’m still inside of the same transaction. These all get grouped together as part of the transaction. Calling .commit() closes the transaction off and makes all the changes permanent. So what are transactions good for? Well, if you get partway through a group of queries, you can undo all of them by rolling back the transaction. To demonstrate that, I’ll start by inserting another author.
Same as you’ve seen before … And now there’s 'Not', 'Anauthor' cluttering up my data. Calling .rollback() instead of .commit() … and everything has been undone to the point of the last commit.
06:55 I can show that by running the query again … and 'Not', 'Anauthor' is gone. Handy, that: you can undo. Let’s play with query results some more.
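The commit-then-rollback dance looks roughly like this as one self-contained snippet (schema assumed, as before):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Asimov')"))
conn.commit()        # close the transaction; the first insert is now permanent

conn.execute(text("INSERT INTO authors VALUES ('Anauthor')"))
conn.rollback()      # undo everything since the last commit

names = conn.execute(text("SELECT last_name FROM authors")).all()
```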
I’ll run the same query, but this time, instead of calling .all(), I’ll iterate over the result object. Each item in the result iterator corresponds to a row in the database—or it does this time, because that was the kind of select I did.
I can access the columns in a result by using the columns’ names: row.first_name, for example. Note that the result object is temporal. Accessing the items in it consumes them. That’s why calling .all() now shows an empty list.
08:08 This might take a little getting used to, but it’s actually a good thing because it means you can iterate partway through a result, do something in your code, and then go back and not have to remember where you were. This becomes particularly important when you start to get into things like pagination and larger sets of data.
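A short sketch of that consuming behavior, with the same assumed authors schema:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (first_name TEXT, last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Isaac', 'Asimov'), ('Pearl', 'Buck')"))

result = conn.execute(text("SELECT first_name, last_name FROM authors"))
seen = []
for row in result:                 # each item is one row of the result
    seen.append(row.first_name)    # columns are reachable by name

leftover = result.all()            # iteration consumed the result: nothing left
```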
08:35 Although I like to think of the results of a SQL query as being a row in the database, this is a faulty mental abstraction. Oftentimes that is exactly what you’re getting back, but really you’re creating a dataset with specific names.
08:50 If you use a join, you’ll end up with parts of data from different tables.
09:06 Here, I’ve been specific about the fields I was interested in, so when I iterate through the results, I can treat the columns coming back as parts of a tuple.
09:30 I can also use indexes into those same tuples to get at their values. Okay, one more time. The .mappings() method on the result object returns the rows as dictionaries instead of tuples. This can be useful if you want a dictionary format anyhow—for example, if you were going to serialize this data to JSON.
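For example, a minimal sketch of .mappings() against the assumed schema:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (first_name TEXT, last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Pearl', 'Buck')"))

result = conn.execute(text("SELECT first_name, last_name FROM authors"))
# .mappings() yields dictionary-like rows keyed by column name,
# handy when heading toward JSON serialization.
dicts = [dict(m) for m in result.mappings()]
```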
Just like with the INSERT statement, you can parameterize a SELECT. This WHERE is saying that I’m looking for authors whose last name is greater than capital "C". Because I used a parameter to define this, I could actually have passed in the dictionary {"ln": "C"} at runtime.
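Sketched out, the parameterized WHERE looks like this (table and data assumed, as before):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Asimov'), ('Buck'), ('Clancy'), ('King')"))

# :ln is bound at execution time, so the cut-off could come from user input.
stmt = text("SELECT last_name FROM authors WHERE last_name > :ln")
rows = conn.execute(stmt, {"ln": "C"}).all()
```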
There’s the result of the query. Don’t forget that this is doing string comparison. Something that starts with capital "C" and is longer than a single letter, like "Clancy", is actually greater than just capital "C".
11:02 Capital C, capital C. I’ve got a Nine Inch Nails song going through my head. Anyhow…
Since you specify the parameters to a statement as part of the .execute() method, that means you can prepare a statement ahead of time and reuse it. Here, I’ve kept the text object in a variable.
11:30 This is what it looks like. Not particularly helpful. But I can take this statement and bind parameters to it. Now I can execute …
11:50 and you can see that this was just another way of getting to the same place, but now I can reuse that statement and bind new parameters.
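A rough sketch of that reuse, using TextClause.bindparams() to bind different values to the same stored statement (schema assumed):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (last_name TEXT)"))
conn.execute(text("INSERT INTO authors VALUES ('Buck'), ('Clancy')"))

# Keep the statement in a variable, then bind fresh parameters per execution.
stmt = text("SELECT last_name FROM authors WHERE last_name > :ln")
after_c = conn.execute(stmt.bindparams(ln="C")).all()   # only 'Clancy'
after_b = conn.execute(stmt.bindparams(ln="B")).all()   # 'Buck' and 'Clancy'
```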
"B" it is. I used to stand for something. Trent Reznor’s an angry, angry dude. What was I saying? All done here. So like a good boy, I’ll clean up my connections by calling .close().
12:28 I have a very important warning about writing raw SQL. You might be tempted to think, Hey, this is all text. I can just concatenate or use f-strings! And that would be a very bad idea. Doing it this way leaves you open to what’s called an SQL injection attack.
A clever person might pass you a last_name that would break your code or, worse, do malicious things. You may have noticed that all the SQL calls have ended with a semicolon (;).
If I pass in a last_name here that has a semicolon and then, say, a DELETE statement, you’d be running both your INSERT and my DELETE, and that’s probably not what you want. Parameters solve this problem.
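Here's a small sketch of the contrast. The hostile string is a made-up example; the unsafe f-string is shown but deliberately never executed, while the parameterized version stores the whole attack string harmlessly as data:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://", future=True)
conn = engine.connect()
conn.execute(text("CREATE TABLE authors (last_name TEXT)"))

# A hostile "last name" trying to smuggle in a second statement:
last_name = "X'); DELETE FROM authors; --"

# BAD: string formatting splices the attacker's text straight into the SQL.
# Shown for illustration only -- never execute something built this way.
unsafe = f"INSERT INTO authors (last_name) VALUES ('{last_name}')"

# GOOD: a bound parameter is properly escaped, so the whole string
# is stored as an ordinary value.
conn.execute(text("INSERT INTO authors (last_name) VALUES (:ln)"), {"ln": last_name})
stored = conn.execute(text("SELECT last_name FROM authors")).all()
```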
13:08 They properly escape anything you pass in, so you should always use parameters. SQL injection has been around for a long time. I remember finding a vulnerability in a forms product in 1996.
13:21 You’d think us coders would learn, but no, the wrong way is still pretty common. The OWASP Top 10 vulnerabilities list dropped SQL injection to spot three in 2021, which is down from spot one in previous years.
13:36 I’m not sure whether that’s because the coding community is finally waking up to these kinds of exploits, or whether the issues now in spots one and two simply surpassed it.
13:46 Parameters only, no dynamic strings! Got it? Good. And if you want a laugh on this, go look up the xkcd comic Exploits of a Mom, aka Little Bobby Tables, to see an example.
SQLAlchemy’s Core has two ways of doing queries. You’ve seen the direct use of SQL through the text object. Next up, you’ll see the functions and objects that abstract this away.
knizami on July 15, 2022
Is there somewhere you can download the sample code and sample database covered in this tutorial?

Hi @knizami, just above the comment tab, and below the sub-title, is a drop-down labelled “Supporting Material”. Inside of that you’ll find the slides and sample code that go along with the course.

Christopher, the download has Markdown files of the script you’re using but doesn’t include any of the sample code or data files.

Thanks Brendan. Looks like the wrong thing got attached. I’ll get somebody on it.

Hi @Brendan, @knizami, and Mr. Trudeau, I’ve corrected the attached file and it is now the correct code.zip file. Sorry for the confusion.

I am following this course by trying the code snippets in an ipynb notebook. However, once result.all() (or any other call that consumes the result set) is invoked, it cannot be re-run because it returns a blank. Is this the expected behaviour, or can this be changed so that the result variable persists the data that was queried?

That’s expected behavior. A result set is context aware; that way, if you read part of it, you can go back and read the rest. The consequence is that once you’ve read all of it, it will be empty. Being able to replay only makes sense in the case of a SELECT; for any other query you’d be double-inserting, or trying to delete something that was already gone. You can stash the results away in a list and then work with the list if that suits your needs. Otherwise, you’ll need to re-run the query.