Many-to-Many Relationships
00:00 In the previous lesson, I showed you the ORM module in SQLAlchemy. In this lesson, I’ll introduce some more complicated relationships to your data. So far, you’ve seen foreign keys used to create one-to-many relationships.
00:16 The key on the book row in the database points to the author, allowing many books to be associated with a single author. This kind of relationship is pretty common, but not the only kind. What if you wanted a book to have multiple authors? That would require a many-to-many relationship, and this is accomplished with a secondary table.
00:38 This new table contains nothing but mappings between objects and is usually made up of only sets of foreign keys. To illustrate this concept, I’m going to introduce some new information into the database: the publisher.
00:54 Each book publisher has a relationship with multiple books and multiple authors. Before showing you a many-to-many relationship, I’m going to take a quick detour into ERDs, or entity relationship diagrams.
01:08
These are a way of visualizing the tables and intertable relationships in a database. There are many tools out there that can help you draw these diagrams, most of which will even generate the corresponding SQL CREATE
statements.
01:22
This diagram illustrates the author
and book
tables being used in the previous lesson. The author has an ID field and the first and last name fields.
01:33 The book has an ID field, a title, and a foreign key to the author table. The line between the boxes indicates where the book’s foreign key points to. Let’s use an ERD and add the publisher relationship.
01:51
On the right-hand side here, you have the same author and book tables as I just showed you, while on the left-hand side, I’ve added the publisher. The publisher
table is where the name of the publisher is stored.
02:03
There’s a single row in this table for each publisher. If you were only interested in the publisher-to-book relationship, you could add a foreign key on book
that pointed to the publisher, but then you’d not be able to easily query all the authors that belonged to a publisher.
02:21
Instead, I’ve created two secondary tables for the many-to-many relationship. The author_publisher
table represents the relationship between publishers and authors, while the book_publisher
table maps the relationship between the books and the publishers.
02:39 Both these many-to-many tables consist of just the foreign key relationships. Most ORMs will provide additional abstractions with these kinds of relationships to make it easier to go back and forth between your interrelated objects.
02:55 Let’s go see how SQLAlchemy does just that. SQLAlchemy can get a little finicky about having relationships between models in different files. It is possible, but the code needs some extra steps.
03:10 Likewise, it can handle changes to tables, and there are third-party libraries to help you manage with this, but it can get a bit messy. In order to keep the code clearer, as I just want to teach this to you, this example is going to use a new database.
03:26
I’m still going to be talking about authors and books, and the objects are going to be similar, but they’re going to be new ones. You won’t be using models.py
in this lesson, but published.py
instead.
03:38
And that’s what’s on the screen now, publish.py
. The import
lines are similar to before, and like before, I’m instantiating a base class to be used for inheritance.
03:50
The first difference is the manual creation of the author_publisher
and book_publisher
tables. Instead of using an object to abstract them, I’m just directly creating a table object.
04:02 You don’t need objects explicitly for the many-to-many relationships, as they’ll described within the ORM object themselves. But you do need to have the tables to be order to store the data. Let me just scroll down here.
04:20
This is the new version of the Author
class. The column declarations are the same as before, and you still have the back reference to the book
foreign key.
04:30 What has been added is a new relationship for the many-to-many table with the publisher. Notice that specifying this kind of relationship is slightly different.
04:40
The backwards reference for a foreign key uses backref
, while the many-to-many relationship requires two things: the argument named secondary
that specifies the table where the relationship will be stored, and the argument named back_populates
to indicate the name of the back reference on the corresponding object. In this case, it is saying there will be an .authors
attribute on the Publisher
object.
05:06 I’ll scroll down a bit more.
05:13
Here’s the Book
object, same kind of thing going on here. It has a new publisher’s relationship attribute declared that links the book_publisher
table and defines the name of the relationship for the Publisher
object. Scrolling down a little more.
05:31
And finally, this is the Publisher
object. This represents the actual publisher, and so like our other two ORM objects, it has its own key, and this time a name. It then declares a relationship to the author and book objects through the use of the secondary tables defined at the top of the file. There you have it: three ORM objects, a one-to-many relationship between authors and books, and two many-to-many relationships between the publisher and the author and the publisher and the books. Let’s go into the REPL and play with this.
06:15 This should be familiar.
06:27
And as a reminder, I’m using a brand-new database here, so the URL is specifying a different file. I’m creating published.db
. And the key part of that last sentence was create
.
06:44
SQLAlchemy will generate the necessary create
statements for your database if you want. You can do that by getting the base class object that is the parent of each of your ORM objects, where everything has been registered when you inherited from them.
07:00
I’ve imported it from the published
module here so I can create the tables.
07:10
Ooh, that was a lot. Let me just back up here. Because everything inherits from Base
, and if you remember, even the table objects took Base
’s metadata as an argument, Base
knows about all the things that need to be created. Using .create_all()
goes through all of the registered objects and issues create
statements for each one. When I scroll back down here, you’ll see the tail end of several create
statements in the debug for the book
and book_publisher
tables. Isn’t that beautiful?
07:44 If you’re careful about your object declarations, you never have to write a single line of SQL, not even to get your tables created. Okay, with the tables in place, let’s do some querying.
08:02 Like before, I still need a session.
08:15 And also like before, I need to reference the ORM objects themselves.
08:27
I’ve instantiated a new Author
object. I haven’t put it in the database yet. Calling .add()
adds it to the database.
08:41
Calling .commit()
makes the addition permanent. Like in the previous lesson, I can use the .books
relationship on king
to see his books.
08:52
It’s empty because I haven’t added any yet. You’ll probably recall that you can add a book using the .append()
method of the .books
relationship. Let me do that.
09:08 And of course it still doesn’t have IDs, because I haven’t committed yet. If I want to add a publisher, I do something similar as to how I added an author.
09:26
And like with the author relationship, the publisher’s .books
relationship is also empty. Nothing’s been added yet.
09:38
Like adding a book to king
, I add a relationship to the publisher using the .append()
method. This time, though, instead of creating a new object, like I did with Author
, I’m using the actual book that was already created and is listed inside of king
—his first book, well the first book in this database.
09:58 You get what I mean. SQLAlchemy takes care of all the keys to make sure everything points to each other correctly and there’s only one instance of the book.
10:10 Looking at the publisher’s books, you can now see a scary clown hiding down in the sewer. Running the commit … closes off the transaction, and now when I look at the list of books, the IDs have been set. Let’s add another publisher.
10:41 Turns out that King’s It was published both by Random House and by one of its subsidiaries, Penguin. I’m capturing that info here by using the many-to-many relationship.
10:53 Let’s finish this off by committing it.
11:00
And there you go. Penguin now points to the same book, and king
only has one copy of the book, as it should be. Think back to where all this got started, the CSV file. In the CSV file, this same information was captured through repetition.
11:19 An error in the author’s name would mean changing multiple lines, and there would be no way to distinguish between two authors with the same name. Here, Each thing in the real world gets a single object, whether that’s a person, a book, or a publishing house, and the relationships between them are handled correctly.
11:42
As a reference was defined on the Book
object, you can deep dive from king
to his book to that book’s list of publishers. Likewise, you can go the other direction and see the authors associated with a publisher.
11:59 Did you expect that to be empty? Unfortunately, SQL doesn’t know that we want this relationship populated and that it’s cross-referencing something. In your code, you would be responsible for creating both parts of this relationship.
12:13
Your best bet would be to have a method on the Publisher
object that took a book to add and would properly tie in the author if they weren’t already there.
12:22 I’ll leave that to you as an exercise. That’s teacher speak for I was too lazy to do it myself. Wow, you’ve covered a lot. Last up, I’ll summarize the course and point you at some places where you can learn some more.
Become a Member to join the conversation.