Relational Databases

SQLite and SQLAlchemy in Python: Moving Your Data Beyond Flat Files Christopher Trudeau 13:53

To download SQLite, click here.

00:00 In the previous lesson, I covered flat file storage. In this lesson, I’ll introduce you to databases and SQL. A relational database gets its name from the fact that it stores the relationships between things. It does this through treating data as a series of tuples. This is kind of like the row in the CSV file you saw in the last lesson, except a row can contain references to another row or in another table altogether. The relationships are identified through the use of keys.

00:31 There are two main types. Primary keys are a unique identifier for a tuple. These are usually integers and typically are auto-incremented by the database engine, so you don’t have to think about it.

00:45 The second kind of key is a foreign key—foreign, in this case, meaning from another table. This would be the primary key for another object. A tuple containing a foreign key is indicating a relationship with the tuple for which the value is its primary key.

01:04 Consider the following example. This is a table with authors. PK is short for primary key. Each row in our table is a tuple of the primary key, first name, and last name. I can add a different table of books.

01:20 Same idea: each tuple has a primary key, but this time it also has a foreign key pointing to the author table. It and Dead Zone are by author number one, the king of horror, Stephen.

01:35 Foreign key doesn’t have to be to a different table. In fact, it can be to another row in the same table. This new column allows me to indicate that Stephen and Richard are the same person.

01:47 This isn’t actually a very good design, as means an author can only have one alias, but it makes my point about foreign keys being able to be on the same table. In a future lesson, I’ll address how you would do this properly.

02:01 As far as I know, every relational database out there supports some dialect of the structured query language, or SQL to its friends. This language is a little different from what you might be used to in coding.

02:13 It’s what’s known as a declarative language. That means you write the final state of what you want, and the computer makes it so. You’re describing what you want rather than how to get it.

02:24 There’s a core set of concepts to SQL that pretty much get implemented by all databases. And then many databases add to the language features that only work with them.

02:34 This is what is known in the industry as vendor lock-in. For the most part, I’ll be sticking to the generic stuff in this course. The beauty of SQL is you don’t have to think about the underlying format.

02:46 The database engine is responsible for that. You can say create a table, and the database makes you a table. In this course, I’ll be using SQLite as my database engine.

02:59 This is a small, single-file, self-contained engine, which doesn’t require a server. According to its website, it’s the most-used engine in the world, and whether or not you know it, you’re probably using it. It gets embedded in applications all the time, and there’s a good chance it’s on your phone right now.

03:18 SQLite comes with a command-line tool that works like a Python REPL, allowing you to interact directly with your database. I’ll be starting with that before moving on SQLAlchemy in a future lesson.

03:31 Your operating system may or may not come with SQLite. You can grab either the source code or a binary at the link here if you’re planning to follow along and it doesn’t come by default in whatever you’re using.

03:45 The command-line tool that I mentioned for SQLite is called sqlite3. When you run it, you give it the name of the file that it will be using as its storage mechanism.

03:58 Once inside, you get prompted. There are two kinds of things you can do here. You can run dot commands or enter SQL directly. The dot commands are built-in utilities provided by SQLite, whereas the SQL will be the language you use to do things in the database.

04:17 And there is your first SQL lesson. The comment symbol in SQL is two dashes (--). Let’s create a table for our authors.

04:29 The CREATE command takes two arguments: what to create—a table in this case—and what to call it, author. The parenthesis is then used to give the command parameters.

04:41 When creating a table, the parameters are the columns in the table.

04:51 My first column is the primary key, which I’m calling author_id. It is an integer, which can’t be empty, and as I said, it’s the primary key.

05:03 The second column is the author’s first_name. There are several ways of storing text in a database. Remember when I mentioned SQL dialects? Well, here’s your first gotcha.

05:13 SQLite doesn’t care about string length. No matter what kind of text declaration you use, or how much space you specify, you’ll get the same underlying thing. On other database systems, you may have to specify how big this field is. Here, I’m being lazy, knowing that it’s SQLite and it doesn’t care.

05:35 The third column is the last_name and then a closing parenthesis and a semicolon to say that I’m done. The table has been created. Doesn’t tell you so, but you can ask.

05:49 .tables is a SQLite command that lists the tables in the database. So far, I’ve only created one. There it is, named author. With a table in place, let’s do something with it.

06:06 SELECT is probably the most-used SQL command, and its purpose is to select some data out of some tables. The star (*) here indicates that I want all of the columns from the table and everything to the right of the FROM is what table to look for data in.

06:21 Why didn’t it do anything? Well, there’s no data in the author table. So there’s nothing to select. Let’s fix that.

06:39 INSERT puts data into a table. Here, I’ve inserted into the author table. The contents of the first parentheses are the columns in the author table being populated. I’m specifying all of them, not counting the primary key, which is automatic.

06:56 But if your table has a column that allows empty values or has a default, you don’t necessarily have to specify it. The content to the right of the VALUES keyword matches the tuples specifying the column names.

07:10 So 'Isaac' is the first name, and 'Asimov' the last name. No response from this SQL statement? That’s good. That means it works.

07:20 Let me just insert one more … and now I’ve got two authors. I’ll rerun the SELECT.

07:36 And there’s the results. The pipe character (|) here separates the columns in the answer. Isaac has primary key 1, while Pearl has primary key 2.

07:47 The data doesn’t usually line up like this. It’s just a coincidence that Isaac and Pearl have the same number of letters. Let’s insert someone else.

08:03 And selecting again… See what I mean about the data not lining up? Oh, oops. It isn’t Tim. It’s Tom. Let’s do something about that.

08:22 The UPDATE statement allows you change a row. The SET part says what fields to change, while the WHERE clause indicates those rows that should be changed.

08:32 I’m being very specific here by using a primary key in the query. An update can apply to multiple rows by having a more generic WHERE clause.

08:41 I could do something like WHERE author_id>1 and both Pearl and Tim would become Tom. And the update worked.

08:53 Tom it is. Let’s try something more specific than SELECT *.

09:03 That’s all the first names. This gives you a fair amount of flexibility. The output doesn’t have to be in the same order as the table columns. Let me put Tim back in there.

09:27 There he is. And now I’ll fix it.

09:36 The DELETE statement allows you to remove some data. Like with the UPDATE, the WHERE clause specifies what rows are acted upon, which means both Tom and Tim are gone.

09:52 Authors alone are boring. Let’s add some books.

10:14 Up until now, I’ve been sticking to the SQL convention of using all caps. SQL doesn’t actually care. I’m lazy and a Python developer, so I probably won’t keep doing that.

10:24 The colorization here is a feature of the tool I’m using to record this, not part of SQLite, so you might want stick with capitals to make it easier to read.

10:33 The highlighting here helps identify the keywords without having to get a cramp from holding the shift key. Using .tables shows that I now have both author and book tables.

10:46 Note the foreign key in the book table specified by using the references keyword. author_id integer references author says that the book table contains a foreign key to the table author. Now I’ll add a book.

11:12 And there it is. The first one is the primary key of the book. The second one is the foreign key to the author. I had to set it to Asimov’s primary key. This establishes the relationship between this book row and the Asimov row in the author table.

11:32 This would be the relational part of the relational database. I’ll add another book.

11:48 And there they are. Now let’s do something fancier. I’ll be selecting from multiple tables. This is called a join. When you select multiple tables, you can alias the table.

12:08 This line uses a as an alias for the author table. You’ll see in a second where that gets declared. The double pipe symbol (||) is a text join.

12:17 So the first column of the results of the SELECT will be the author’s first name, then a space, then the author’s last name. The as statement here creates a name for this column.

12:34 I’m still choosing what to output at this point. This is the second column, which will be the title of the book. Now I’m telling SELECT where to get the a info—that’s the author table—and to join that with the book table, known as b in the query.

12:56 The on clause specifies how to correlate the data between the tables participating in the join. In this case, I’m saying to find all the books and match their author IDs with all the authors.

13:14 The end result is some data in tabular format with the author’s name in the first column and their book in the second column. Think back to the CSV file in the previous lesson. Ignoring the publisher part, you now have enough knowledge to create a SELECT statement with a join that could produce the data you found in the CSV. That’s enough for now.

13:37 To leave the SQLite command-line tool, you can press Control + D. You’ve had a whirlwind intro to SQL. Next up, I’ll add some Python to your Python course and use SQLAlchemy to do some SQL in your Python code.

Become a Member to join the conversation.