Reviewing SQL

Christopher Trudeau

Starting With DuckDB and Python Christopher Trudeau 07:53

Transcript
Discussion

00:00 In the previous lesson, I gave an overview of the course. In this lesson, I’ll give a quick overview of the SQL language used by relational databases.

00:09 SQL is short for Structured Query Language, and sometimes gets pronounced SQL and sometimes sequel. I’m inconsistent myself, so if one or the other bothers you, I’ll likely annoy you one way or the other.

00:21 In this course, it’s a domain-specific language for relational databases. It’s quite in-depth, but you can get fairly far with just some basics. The create, read, update, and delete, that’s CRUD operations, are used to manipulate data in the database, or can be used to manipulate the database itself, like managing tables and rules, etc.

00:46 SQL is a declarative language that makes it a little different from Python, which is mostly a procedural language. Instead of specifying the steps you want of an operation, you declare the result you’re looking for and the execution of that statement produces your desired outcome.

01:02 With simpler statements, this is kind of an academic thing. With more complex statements, it can get a little hairy.

01:08 SQL has a lot of dialects. The basic stuff I’ll be using in this course will work in pretty much every database out there, but more advanced SQL tends to have flavors, and you need to be careful.

01:19 Just because you know how to talk to SQLite doesn’t mean that Oracle will listen to you that exact same way.

01:27 The fundamental unit of a relational database is a table. The table is like a spreadsheet consisting of rows and columns of data. Unlike your spreadsheet, each column in a database table has a data type.

01:39 If you try to put the wrong kind of thing in a column, you’ll get an error. That’s actually a good thing that prevents some of the sloppiness you can encounter in a spreadsheet.

01:49 A database can contain more than one table, and the actions you perform can affect multiple tables. There are four basic operations that are common to all dialects of SQL.

02:00 These are the CRUD operations for the data in a database. SELECT is for reading data, INSERT is for creating data, UPDATE is for updating, and DELETE is for solving complex mathematical problems or deleting data.

02:15 I can’t remember which. As I mentioned in the overview, I’m going to stick with SELECT for the purpose of this course, so I’ll show you a few variations on it to get us going.

02:26 Consider the table on the screen here. It’s named _actors_, and has three columns. The first and last_name columns contain strings, while the age column contains a number.

02:35 Different databases have different ways of storing strings, and most of them don’t call it a string, but I’ll stick with that terminology for now. The _actors_ table currently has three rows specifying three different actors. Inside of a database shell or in a program, you could execute a SELECT statement to read rows from the database.

02:57 The SELECT command needs two things: the name of the table to read from, and the name of those columns you wish to get data out of. Let me show you an example.

03:10 It’s common practice in SQL to capitalize the keywords, although in most databases, this isn’t necessary. Our statement here has two keywords: SELECT to tell the database I want to read something, and the FROM keyword that specifies which table I want to read that something from.

03:27 The last_name and age values are the names of the columns that I want the query to return.

03:33 The end result is three rows of data corresponding to the three rows in the database table. Depending on the tool you’re using, the result can be formatted in different ways.

03:42 Most command-line database REPLs show comma-separated values. They may or may not show the column names that go with this. In some databases, it’s an option you can turn on to control the result.

03:54 If you were going to do the same thing in a programming language, the database API would return the result in some sort of object format, possibly a tuple of tuples or something more complex like an iterable.

04:08 When selecting data from a table, it’s quite common to want all the columns. To save you typing, SQL has a shortcut. You can use the star character in place of the column names.

04:20 Let me show you what that looks like.

04:25 Same table as before, but this time because I used the wildcard, the result contains all three columns from the table.

04:34 Most of the time you only want specific rows, not everything out of a table. You constrain which rows you want using the WHERE clause. A WHERE clause specifies a comparison operation similar to an if statement.

04:47 Only rows that meet the criteria in the WHERE get returned in the results. Here’s an example. Note that you’re still specifying the columns and table for this SELECT, but this time you’re adding a WHERE component.

05:03 The criteria for this WHERE says any row where the age is greater than 50 should be returned.

05:09 Note in all three examples so far that the SQL statement has ended in a semicolon. Some databases don’t care about this and will figure it out if you leave it off, but most can be finicky about it.

05:20 It’s best practice to remember it. You can mumble under your breath throughout the rest of this course as I ignore best practice because DuckDB doesn’t care.

05:30 I’ll only be using the SELECT statement in this course, but I’d be remiss if I didn’t show you the CUD part of CRUD. See how well-behaved I am?

05:38 I could have stuck a cow sound effect after that last bit. An INSERT statement creates a new row in a table. Like with SELECT, you have to specify what table you’re inserting into.

05:49 This statement takes three keywords: INSERT INTO and VALUES. INSERT INTO always gets used together. You see, the folks who invented SQL wanted to make it read like a natural language.

06:01 I kid you not. They thought that non-programmers would use this stuff. In a later lesson, when I show you a table join, you’ll understand why that’s funny.

06:10 Within the INSERT statement, after the table name, you specify what columns your new data will be going into. This might seem like overkill because the order of columns in the table are specific, but some columns support default values, so you could insert without specifying everything. After the VALUES keyword is another grouping.

06:29 This one contains the actual data. Now, having done it here, but there are ways of inserting multiple rows at a time as well.

06:37 To change an existing row in the database, you use the UPDATE statement. This statement also has three keywords in it. Like with other statements, you specify what table you’re updating first and then the SET keyword specifies what change you’ll be making.

06:53 In this case, it’s the spelling of Skarsgård with the correct accent over the a.

06:58 Then like with a SELECT, you use a WHERE clause to specify which rows get updated. Most of the time, this is a single row, but it doesn’t have to be.

07:08 If there were multiple Skarsgårds in our database, all of them would be affected by this statement.

07:14 And finally, the D in CRUD is for DELETE. DELETE FROM is like INSERT INTO, and the WHERE clause specifies the criteria used to determine which rows to delete.

07:25 Like with the UPDATE example, if there were multiple Lunas, they’d all be removed.

07:30 Most of the time, your table will have a sequence column. Think of it like a counter that’s automatically updated by the database. This gives each row a unique ID, making the UPDATE and DELETE clauses easier to write as the WHERE can specify the ID of a single row if that’s what you intend.

07:48 Next up, what you came here for. Let the duck puns begin.

Become a Member to join the conversation.