MongoDB is a document-oriented and NoSQL database solution that provides great scalability and flexibility along with a powerful querying system. With MongoDB and Python, you can develop many different types of database applications quickly. So if your Python application needs a database that’s just as flexible as the language itself, then MongoDB is for you.
In this tutorial, you’ll learn:
- What MongoDB is
- How to install and run MongoDB
- How to work with MongoDB databases
- How to use the low-level PyMongo driver to interface with MongoDB
- How to use the high-level MongoEngine object-document mapper (ODM)
Throughout this tutorial, you’ll write a couple of examples that will demonstrate the flexibility and power of MongoDB and its great Python support.
Using SQL vs NoSQL Databases
For decades, SQL databases were one of the few choices for developers looking to build large and scalable database systems. However, the increasing need for storing complex data structures led to the birth of NoSQL databases. This new kind of database system allows developers to store heterogeneous and unstructured data efficiently.
In general, NoSQL database systems store and retrieve data in a much different way from SQL relational database management systems (RDBMSs).
When it comes to choosing from the currently available database technologies, you might need to decide between using SQL or NoSQL systems. Both of them have specific features that you should consider when choosing one or the other. Here are some of their more substantial differences:
Property | SQL Databases | NoSQL Databases |
---|---|---|
Data model | Relational | Nonrelational |
Structure | Table-based, with columns and rows | Document-based, key-value pairs, graph, or wide-column |
Schema | A predefined and strict schema in which every record (row) is of the same nature and possesses the same properties | A dynamic schema or a schemaless design, which means that records don’t need to be of the same nature |
Query language | Structured Query Language (SQL) | Varies from database to database |
Scalability | Vertical | Horizontal |
ACID transactions | Supported | Supported, depending on the specific NoSQL database |
Ability to add new properties | Need to alter the schema first | Possible without disturbing anything |
There are many other differences between the two types of databases, but those mentioned above are some of the more important ones to know about.
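To make the last row of the table more concrete, here’s a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for a SQL database and a plain list of dictionaries as a stand-in for a document collection. Neither one is MongoDB, and the table and field names are only illustrative:

```python
import sqlite3

# SQL side: the table's columns are fixed when it's created
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tutorial (title TEXT, author TEXT)")
conn.execute("INSERT INTO tutorial VALUES (?, ?)", ("Python CSV", "Jon"))

new_row = ("Python JSON", "Lucas", "https://realpython.com/python-json/")
insert_sql = "INSERT INTO tutorial (title, author, url) VALUES (?, ?, ?)"
try:
    # Fails because the schema has no url column yet
    conn.execute(insert_sql, new_row)
except sqlite3.OperationalError as error:
    print(f"Insert rejected: {error}")

# Alter the schema first, and then the same insert succeeds
conn.execute("ALTER TABLE tutorial ADD COLUMN url TEXT")
conn.execute(insert_sql, new_row)

# Document side: each record is a dict, so a new field needs no schema change
documents = [{"title": "Python CSV", "author": "Jon"}]
documents.append(
    {"title": "Python JSON", "author": "Lucas", "url": "https://realpython.com/python-json/"}
)
```

The relational insert only succeeds after ALTER TABLE, and the existing row gets NULL (None in Python) for the new column, while the list of documents simply accepts a dictionary with an extra key.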
When choosing a database, you should consider its strengths and weaknesses carefully. You also need to consider how the database fits into your specific scenario and your application’s requirements. Sometimes the right solution is to use a combination of SQL and NoSQL databases to handle different aspects of a broader system.
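Some common examples of SQL databases include:
- SQLite
- MySQL
- PostgreSQL
- Oracle
- Microsoft SQL Server
NoSQL database examples include:
- MongoDB
- Redis
- Cassandra
- CouchDB
- DynamoDB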
In recent years, SQL and NoSQL databases have even begun to merge. For example, database systems, such as PostgreSQL, MySQL, and Microsoft SQL Server now support storing and querying JSON data, much like NoSQL databases. With this, you can now achieve many of the same results with both technologies. But you still don’t get many of the NoSQL features, such as horizontal scaling and the user-friendly interface.
With this brief background on SQL and NoSQL databases, you can focus on the main topic of this tutorial: the MongoDB database and how to use it in Python.
Managing NoSQL Databases With MongoDB
MongoDB is a document-oriented database classified as NoSQL. It’s become popular throughout the industry in recent years and integrates extremely well with Python. Unlike traditional SQL RDBMSs, MongoDB uses collections of documents instead of tables of rows to organize and store data.
MongoDB stores data in schemaless and flexible JSON-like documents. Here, schemaless means that you can have documents with a different set of fields in the same collection, without the need for satisfying a rigid table schema.
You can change the structure of your documents and data over time, which results in a flexible system that allows you to quickly adapt to requirement changes without the need for a complex process of data migration. However, the trade-off in changing the structure of new documents is that existing documents become inconsistent with the updated schema. So this is a topic that needs to be managed with care.
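Here’s a minimal sketch of that trade-off using plain Python dictionaries as stand-ins for documents. The tags field and the tags_of() and backfill() helpers are hypothetical; they just show the two usual options: tolerating a missing field when reading, or backfilling older documents so they match the new structure:

```python
import copy

# Hypothetical documents: the first one was stored before "tags" existed
tutorials = [
    {"title": "Reading and Writing CSV Files in Python", "author": "Jon"},
    {"title": "Working With JSON Data in Python", "author": "Lucas", "tags": ["json"]},
]


def tags_of(document):
    """Option 1: treat a missing "tags" field as an empty list when reading."""
    return document.get("tags", [])


def backfill(documents, field, default):
    """Option 2: add `field` to every document that lacks it.

    Each document gets its own copy of `default`. The function returns
    how many documents it changed.
    """
    changed = 0
    for document in documents:
        if field not in document:
            document[field] = copy.deepcopy(default)
            changed += 1
    return changed


print(tags_of(tutorials[0]))            # []
print(backfill(tutorials, "tags", []))  # 1
```

With PyMongo, a comparable backfill would typically be a single update_many() call that filters on {"tags": {"$exists": False}} and applies {"$set": {"tags": []}}.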
Note: JSON stands for JavaScript Object Notation. It’s a file format with a human-readable structure consisting of key-value pairs that can be nested arbitrarily deep.
MongoDB is written in C++ and actively developed by MongoDB Inc. It runs on all major platforms, such as macOS, Windows, Solaris, and most Linux distributions. In general, there are three main development goals behind the MongoDB database:
- Scale well
- Store rich data structures
- Provide a sophisticated query mechanism
MongoDB is a distributed database, so high availability, horizontal scaling, and geographic distribution are built into the system. It stores data in flexible JSON-like documents. You can model these documents to map the objects in your applications, which makes it possible to work with your data effectively.
MongoDB provides a powerful query language that supports ad hoc queries, indexing, aggregation, geospatial search, text search, and a lot more. This presents you with a powerful tool kit to access and work with your data. Finally, MongoDB is freely available and has great Python support.
Reviewing MongoDB’s Features
So far, you’ve learned what MongoDB is and what its main goals are. In this section, you’ll learn about some of MongoDB’s more important features. As for the database management side, MongoDB offers the following features:
- Query support: You can use many standard query types, such as matching (==), comparison (<, >), and regular expressions.
- Data accommodation: You can store virtually any kind of data, be it structured, partially structured, or even polymorphic.
- Scalability: It handles more queries just by adding more machines to the server cluster.
- Flexibility and agility: You can develop applications with it quickly.
- Document orientation and schemalessness: You can store all the information regarding a data model in a single document.
- Adjustable schema: You can change the schema of the database on the fly, which reduces the time needed to provide new features or fix existing problems.
- Relational database functionalities: You can perform actions common to relational databases, like indexing.
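MongoDB expresses these queries as filter documents, where a plain value means equality and operators such as $lt, $gt, and $regex express comparisons and pattern matching. The toy matches() function below evaluates a small subset of that syntax against Python dictionaries. It’s a sketch to show the shape of the filters, not how MongoDB evaluates queries, and the views field is hypothetical:

```python
import re


def matches(document, query):
    """Return True if `document` satisfies a small subset of MongoDB-style filters.

    Supports plain equality plus the $lt, $gt, and $regex operators.
    """
    for field, condition in query.items():
        value = document.get(field)
        if not isinstance(condition, dict):
            # A plain value means equality, as in {"author": "Jon"}
            if value != condition:
                return False
            continue
        for operator, operand in condition.items():
            if operator == "$lt":
                ok = value is not None and value < operand
            elif operator == "$gt":
                ok = value is not None and value > operand
            elif operator == "$regex":
                ok = isinstance(value, str) and re.search(operand, value) is not None
            else:
                raise ValueError(f"Unsupported operator: {operator}")
            if not ok:
                return False
    return True


docs = [
    {"title": "Python CSV", "author": "Jon", "views": 120},
    {"title": "Python JSON", "author": "Lucas", "views": 80},
]
print([d["author"] for d in docs if matches(d, {"views": {"$gt": 100}})])  # ['Jon']
print([d["author"] for d in docs if matches(d, {"title": {"$regex": "JSON$"}})])  # ['Lucas']
```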
As for the operations side, MongoDB provides a few tools and features that not every database system offers:
- Scalability: Whether you need a stand-alone server or complete clusters of independent servers, you can scale MongoDB to whatever size you need it to be.
- Load-balancing support: MongoDB will automatically move data across various shards.
- Automatic failover support: If your primary server goes down, then a new primary will be up and running automatically.
- Management tools: You can track your machines using the cloud-based MongoDB Management Service (MMS).
- Memory efficiency: Thanks to memory-mapped files, MongoDB can be more efficient than relational databases for some workloads.
All these features are quite useful. For example, if you take advantage of the indexing feature, then much of your data will be kept in memory for quick retrieval. Even without indexing specific document keys, MongoDB caches quite a bit of data using a least recently used (LRU) strategy.
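The least recently used idea itself is easy to sketch. The toy cache below keeps a fixed number of entries and, when it runs out of room, evicts the one that hasn’t been accessed for the longest time. MongoDB’s real caching happens inside its storage engine and is far more elaborate; this only illustrates the eviction policy:

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when it's full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used entry
cache.put("c", 3)      # evicts "b", the least recently used entry
print(cache.get("b"))  # None
```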
Installing and Running MongoDB
Now that you’re familiar with MongoDB, it’s time to get your hands dirty and start using it. But first, you need to install it on your machine. MongoDB’s official site provides two editions of the database server:
- The community edition offers the flexible document model along with ad hoc queries, indexing, and real-time aggregation to provide powerful ways to access and analyze your data. This edition is freely available.
- The enterprise edition offers the same features as the community edition, plus other advanced features related to security and monitoring. This is the commercial edition, but you can use it free of charge for an unlimited time for evaluation and development purposes.
If you’re on Windows, then you can read through the installation tutorial for complete instructions. In general, you can go to the download page, select the Windows platform in the Available Downloads box, choose the .msi installer that fits your current system, and click Download.
Run the installer and follow the on-screen instructions in the installation wizard. This page also provides information on how to run MongoDB as a Windows service.
If you’re on macOS, then you can use Homebrew to install MongoDB on your system. See the installation tutorial to get the complete guide. Also, make sure to follow the instructions to run MongoDB as a macOS service.
If you’re on Linux, then the installation process will depend on your specific distribution. For a detailed guide on how to install MongoDB on different Linux systems, go to the installation tutorial page and select the tutorial that matches your current operating system. Make sure you run the MongoDB daemon, mongod, at the end of the installation.
Finally, you can also install MongoDB using Docker. This is handy if you don’t want to clutter your system with another installation. If you prefer this installation option, then you can read through the official tutorial and follow its directions. Note that this option assumes some prior knowledge of how to use Docker.
With the MongoDB database installed and running on your system, you can start working with real databases using the mongo shell.
Creating MongoDB Databases With the mongo Shell
If you’ve followed the installation and running instructions, then you should already have an instance of MongoDB running on your system. Now you can start creating and testing your own databases. In this section, you’ll learn how to use the mongo shell to create, read, update, and delete documents on a database.
Running the mongo Shell
The mongo shell is an interactive JavaScript interface to MongoDB. You can use this tool to query and manipulate your data as well as to perform administrative operations. Since it’s a JavaScript interface, you won’t use SQL to query the database. Instead, you’ll use JavaScript code.
To launch the mongo shell, open your terminal or command line and run the following command:
$ mongo
This command takes you to the mongo shell. At this point, you’ll probably see a bunch of messages with information on the shell’s version and on the server address and port. Finally, you’ll be presented with the shell prompt (>) to enter queries and commands.
You can pass the database address as an argument to the mongo command. You can also use several options, such as specifying the host and port to access a remote database. For more details on how to use the mongo command, you can run mongo --help.
Establishing a Connection
When you run the mongo command without arguments, it launches the shell and connects to the default local server provided by the mongod process at mongodb://127.0.0.1:27017. This means you’re connected to the local host through port 27017.
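That address is a standard MongoDB connection string: a mongodb:// scheme followed by a host and a port. As a quick illustration, Python’s urllib.parse.urlsplit() can pull those pieces apart for a single-host string like this one. The describe_uri() helper is hypothetical, and multi-host connection strings, such as those used for replica sets, would need a real MongoDB URI parser:

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a single-host mongodb:// URI into (scheme, host, port)."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError(f"Not a mongodb:// URI: {uri!r}")
    # MongoDB's default port applies when the URI doesn't specify one
    return parts.scheme, parts.hostname, parts.port or 27017


print(describe_uri("mongodb://127.0.0.1:27017"))  # ('mongodb', '127.0.0.1', 27017)
print(describe_uri("mongodb://localhost"))        # ('mongodb', 'localhost', 27017)
```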
By default, the mongo shell starts the session by establishing a connection to the test database. You can access the current database through the db object:
> db
test
>
In this case, db holds a reference to test, which is the default database. To switch databases, run the use command with a database name as an argument.
For example, say you want to create a website to publish Python content, and you’re planning to use MongoDB to store your tutorials and articles. In that case, you can switch to the site’s database with the following command:
> use rptutorials
switched to db rptutorials
This command switches your connection to the rptutorials database. MongoDB doesn’t create the physical database file on the file system until you insert real data into the database. So in this case, rptutorials won’t show up in your current database list:
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
>
The mongo shell provides a lot of features and options. It allows you to query and manipulate your data and also to manage the database server itself.
Instead of using a standardized query language such as SQL, the mongo shell uses the JavaScript programming language and a user-friendly API. This API allows you to play around with your data, which is the topic for the next section.
Creating Collections and Documents
A MongoDB database is a physical container for collections of documents. Each database gets its own set of files on the file system. These files are managed by the MongoDB server, which can handle several databases.
In MongoDB, a collection is a group of documents. Collections are somewhat analogous to tables in a traditional RDBMS, but without imposing a rigid schema. In theory, each document in a collection can have a completely different structure or set of fields.
In practice, documents in a collection commonly share a similar structure to allow uniform retrieval, insertion, and update processes. You can enforce a uniform document structure by using document validation rules during updates and insertions.
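Validation rules are typically written as a $jsonSchema document. The dictionary below is a sketch of what a validator for the tutorial documents used throughout this tutorial might look like; the specific rules are assumptions for illustration. With PyMongo and a running server, you could pass it when creating the collection, for example with db.create_collection("tutorial", validator=tutorial_validator), and MongoDB would then reject inserts and updates that don’t conform:

```python
# Hypothetical validation rules for the tutorial collection
tutorial_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["title", "author", "url"],
        "properties": {
            "title": {"bsonType": "string", "description": "must be a string"},
            "author": {"bsonType": "string", "description": "must be a string"},
            "contributors": {
                "bsonType": "array",
                "items": {"bsonType": "string"},
            },
            "url": {"bsonType": "string", "description": "must be a string"},
        },
    }
}

# With PyMongo installed and a server running (not executed here):
# db.create_collection("tutorial", validator=tutorial_validator)
```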
Allowing different document structures is a key feature of MongoDB collections. This feature provides flexibility and allows adding new fields to documents without the need for modifying a formal table schema.
To create a collection using the mongo shell, you need to point db to your target database and then create the collections using the dot notation:
> use rptutorials
switched to db rptutorials
> db
rptutorials
> db.tutorial
rptutorials.tutorial
In this example, you use the dot notation to create tutorial as a collection in rptutorials, which is your current database. It’s important to note that MongoDB creates databases and collections lazily. In other words, they’re physically created only after you insert the first document.
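The toy classes below model that lazy behavior in plain Python: dot notation on LazyDatabase hands back a collection handle, but nothing is recorded until the first insert. LazyDatabase, LazyCollection, and collection_names() are hypothetical names that only mimic the idea, not MongoDB’s or PyMongo’s internals:

```python
class LazyCollection:
    """Handle to a collection that may not exist yet."""

    def __init__(self, database, name):
        self._database = database
        self.name = name

    def insert_one(self, document):
        # The first insert is what actually creates the collection
        self._database._collections.setdefault(self.name, []).append(document)


class LazyDatabase:
    """Toy database whose collections only exist after they receive data."""

    def __init__(self, name):
        self.name = name
        self._collections = {}  # collection name -> list of documents

    def __getattr__(self, collection_name):
        # Python calls this only for unknown attributes, so db.tutorial lands here
        if collection_name.startswith("_"):
            raise AttributeError(collection_name)
        return LazyCollection(self, collection_name)

    def collection_names(self):
        return sorted(self._collections)


db = LazyDatabase("rptutorials")
tutorial = db.tutorial        # just a handle; nothing is created yet
print(db.collection_names())  # []
tutorial.insert_one({"title": "Reading and Writing CSV Files in Python"})
print(db.collection_names())  # ['tutorial']
```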
Once you have a database and a collection, you can start inserting documents. Documents are the unit of storage in MongoDB. In an RDBMS, this would be equivalent to a table row. However, MongoDB’s documents are way more versatile than rows because they can store complex information, such as arrays, embedded documents, and even arrays of documents.
MongoDB stores documents in a format called Binary JSON (BSON), which is a binary representation of JSON. MongoDB’s documents are composed of field-and-value pairs and have the following structure:
{
field1 → value1,
field2 → value2,
field3 → value3,
...
fieldN → valueN
}
The value of a field can be any BSON data type, including other documents, arrays, and arrays of documents. In practice, you’ll specify your documents using the JSON format.
When you’re building a MongoDB database application, probably your most important decision is about the structure of documents. In other words, you’ll have to decide which fields and values your documents will have.
In the case of the tutorials for your Python site, your documents might be structured like this:
{
"title": "Reading and Writing CSV Files in Python",
"author": "Jon",
"contributors": [
"Aldren",
"Geir Arne",
"Joanna",
"Jason"
],
"url": "https://realpython.com/python-csv/"
}
A document is essentially a set of property names and their values. The values can be simple data types, such as strings and numbers, but they can also be arrays, such as contributors in the above example.
MongoDB’s document-oriented data model naturally represents complex data as a single object. This allows you to work with data objects holistically, without the need for looking at several places or tables.
If you were using a traditional RDBMS to store your tutorials, then you would probably have a table to store your tutorials and another table to store your contributors. Then you’d have to set up a relationship between both tables so you could retrieve the data later on.
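To see the difference, here’s a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for a traditional RDBMS. The two-table layout is an assumption about how you might model the tutorials relationally. Rebuilding one tutorial’s contributor list takes a join, while the document version is a single dictionary:

```python
import sqlite3

# Relational layout (an assumption): tutorials and contributors live in separate tables
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tutorial (id INTEGER PRIMARY KEY, title TEXT, author TEXT, url TEXT);
    CREATE TABLE contributor (tutorial_id INTEGER REFERENCES tutorial (id), name TEXT);
    """
)
conn.execute(
    "INSERT INTO tutorial VALUES (1, ?, ?, ?)",
    ("Reading and Writing CSV Files in Python", "Jon", "https://realpython.com/python-csv/"),
)
conn.executemany(
    "INSERT INTO contributor VALUES (1, ?)",
    [("Aldren",), ("Geir Arne",), ("Joanna",), ("Jason",)],
)

# Rebuilding the contributor list requires a join across both tables
rows = conn.execute(
    """
    SELECT c.name FROM tutorial AS t
    JOIN contributor AS c ON c.tutorial_id = t.id
    WHERE t.id = 1
    ORDER BY c.rowid
    """
).fetchall()
contributors = [name for (name,) in rows]

# Document model: the same data lives together in one object
document = {
    "title": "Reading and Writing CSV Files in Python",
    "author": "Jon",
    "contributors": ["Aldren", "Geir Arne", "Joanna", "Jason"],
    "url": "https://realpython.com/python-csv/",
}
print(contributors == document["contributors"])  # True
```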
Working With Collections and Documents
So far, you know the basics of how to run and use the mongo shell. You also know how to create your own documents using the JSON format. Now it’s time to learn how to insert documents into your MongoDB database.
To insert a document into a database using the mongo shell, you first need to choose a collection and then call .insertOne() on the collection with your document as an argument:
> use rptutorials
switched to db rptutorials
> db.tutorial.insertOne({
... "title": "Reading and Writing CSV Files in Python",
... "author": "Jon",
... "contributors": [
... "Aldren",
... "Geir Arne",
... "Joanna",
... "Jason"
... ],
... "url": "https://realpython.com/python-csv/"
... })
{
"acknowledged" : true,
"insertedId" : ObjectId("600747355e6ea8d224f754ba")
}
With the first command, you switch to the database you want to use. The second command is a JavaScript method call that inserts a simple document into the selected collection, tutorial. Once you hit Enter, you get a message on your screen that informs you about the newly inserted document and its insertedId.
Just like relational databases need a primary key to uniquely identify every row in a table, MongoDB documents need to have an _id field that uniquely identifies the document. MongoDB allows you to enter a custom _id as long as you guarantee its uniqueness. However, a widely accepted practice is to allow MongoDB to automatically insert an _id for you.
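An automatically generated ObjectId is a 12-byte value whose first four bytes store its creation time as seconds since the Unix epoch. The hypothetical objectid_created_at() helper below is a stdlib-only sketch that decodes that timestamp from the hex string returned by .insertOne() above. If you have PyMongo installed, bson.ObjectId(...).generation_time gives you the same information:

```python
from datetime import datetime, timezone


def objectid_created_at(object_id_hex):
    """Return the UTC creation time encoded in an ObjectId's 24-character hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("An ObjectId is 12 bytes, so its hex form has 24 characters")
    # The first 4 bytes (8 hex digits) are big-endian seconds since the Unix epoch
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_created_at("600747355e6ea8d224f754ba").isoformat())
# 2021-01-19T20:55:17+00:00
```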
Similarly, you can add several documents in one go using .insertMany():
> tutorial1 = {
... "title": "How to Iterate Through a Dictionary in Python",
... "author": "Leodanis",
... "contributors": [
... "Aldren",
... "Jim",
... "Joanna"
... ],
... "url": "https://realpython.com/iterate-through-dictionary-python/"
... }
> tutorial2 = {
... "title": "Python 3's f-Strings: An Improved String Formatting Syntax",
... "author": "Joanna",
... "contributors": [
... "Adriana",
... "David",
... "Dan",
... "Jim",
... "Pavel"
... ],
... "url": "https://realpython.com/python-f-strings/"
... }
> db.tutorial.insertMany([tutorial1, tutorial2])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("60074ff05e6ea8d224f754bb"),
ObjectId("60074ff05e6ea8d224f754bc")
]
}
Here, the call to .insertMany() takes a list of tutorials and inserts them into the database. Again, the shell output shows information about the newly inserted documents and their automatically added _id fields.
The mongo shell also provides methods to perform read, update, and delete operations on the database. For example, you can use .find() to retrieve the documents in a collection:
> db.tutorial.find()
{ "_id" : ObjectId("600747355e6ea8d224f754ba"),
"title" : "Reading and Writing CSV Files in Python",
"author" : "Jon",
"contributors" : [ "Aldren", "Geir Arne", "Joanna", "Jason" ],
"url" : "https://realpython.com/python-csv/" }
...
> db.tutorial.find({author: "Joanna"})
{ "_id" : ObjectId("60074ff05e6ea8d224f754bc"),
"title" : "Python 3's f-Strings: An Improved String Formatting Syntax",
"author" : "Joanna",
"contributors" : [ "Adriana", "David", "Dan", "Jim", "Pavel" ],
"url" : "https://realpython.com/python-f-strings/" }
The first call to .find() retrieves all the documents in the tutorial collection. On the other hand, the second call to .find() retrieves those tutorials that are authored by Joanna.
With this background knowledge on how to use MongoDB through its mongo shell, you’re ready to start using MongoDB with Python. The next few sections will walk you through different options for using MongoDB databases in your Python applications.
Using MongoDB With Python and PyMongo
Now that you know what MongoDB is and how to create and manage databases using the mongo shell, you can start using MongoDB, but this time with Python. MongoDB provides an official Python driver called PyMongo.
In this section, you’ll go through some examples that’ll help you get a feel for how to use PyMongo to create your own database applications with MongoDB and Python.
Each module within PyMongo is responsible for a set of operations on the database. You’ll have modules for at least the following tasks:
- Establishing database connections
- Working with databases
- Working with collections and documents
- Manipulating the cursor
- Working with data encryption
In general, PyMongo provides a rich set of tools that you can use to communicate with a MongoDB server. It provides functionality to query, retrieve results, write and delete data, and run database commands.
Installing PyMongo
To start using PyMongo, you first need to install it in your Python environment. You can use a virtual environment, or you can use your system-wide Python installation, although the first option is preferred. PyMongo is available on PyPI, so the quickest way to install it is with pip. Fire up your terminal and run the following command:
$ pip install pymongo==3.11.2
After a few downloads and other related steps, this command installs PyMongo in your Python environment. Note that if you don’t supply a specific version number, then pip will install the latest available version.
Note: For a complete guide on how to install PyMongo, check out the Installing/Upgrading page of its official documentation.
Once you’re done with the installation, you can start a Python interactive session and run the following import:
>>> import pymongo
If this runs without raising an exception in the Python shell, then your installation works just fine. If not, then carefully perform the steps again.
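If you’d rather check for PyMongo without risking an ImportError, for example in a setup script, the standard library can tell you whether a package is importable and which version is installed. The installed_version() helper below is a hypothetical sketch built on importlib.util.find_spec() and importlib.metadata.version(), which requires Python 3.8 or later:

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it isn't available."""
    # find_spec() locates the module without importing it
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        # Importable, but without distribution metadata (for example, a stdlib module)
        return None


print(installed_version("pymongo"))  # A version string such as '3.11.2', or None
```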
Establishing a Connection
To establish a connection to a database, you need to create a MongoClient instance. This class provides a client for a MongoDB instance or server. Each client object has a built-in connection pool, which by default handles up to a hundred connections to the server.
Get back to your Python interactive session and import MongoClient from pymongo. Then create a client object to communicate with your currently running MongoDB instance:
>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client
MongoClient(host=['localhost:27017'], ..., connect=True)
The code above establishes a connection to the default host (localhost) and port (27017). MongoClient takes a set of arguments that allows you to specify custom host, port, and other connection parameters. For example, to provide a custom host and port, you can use the following code:
>>> client = MongoClient(host="localhost", port=27017)
This is handy when you need to provide a host and port that differ from MongoDB’s default setup. You can also use the MongoDB URI format:
>>> client = MongoClient("mongodb://localhost:27017")
All these instances of MongoClient provide the same client setup to connect to your current MongoDB instance. Which one you should use just depends on how explicit you want to be in your code.
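If you build the URI form from settings, keep in mind that usernames and passwords containing characters such as @, :, or / must be percent-escaped, and PyMongo’s documentation recommends urllib.parse.quote_plus() for this. The build_mongo_uri() helper below is a hypothetical single-host sketch of that idea:

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a single-host mongodb:// URI, percent-escaping any credentials."""
    credentials = ""
    if username is not None:
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}:{port}"


print(build_mongo_uri())  # mongodb://localhost:27017
print(build_mongo_uri(username="jon", password="p@ss:word"))
# mongodb://jon:p%40ss%3Aword@localhost:27017
```

You’d then pass the resulting string to MongoClient(), just like the "mongodb://localhost:27017" string above.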
Once you’ve instantiated MongoClient, you can use its instance to refer to that specific database connection, just like you did with the mongo shell’s db object in the above section.
Working With Databases, Collections, and Documents
Once you have a connected instance of MongoClient, you can access any database managed by the specified MongoDB server. To define which database you want to use, you can use the dot notation just like you did in the mongo shell:
>>> db = client.rptutorials
>>> db
Database(MongoClient(host=['localhost:27017'], ..., connect=True), 'rptutorials')
In this case, rptutorials is the name of the database you’ll be working with. If the database doesn’t exist, then MongoDB creates it for you, but only when you perform the first operation on the database.
You can also use dictionary-style access:
>>> db = client["rptutorials"]
This statement is handy when the name of your database isn’t a valid Python identifier. For example, if your database is called rp-tutorials, then you need to use dictionary-style access.
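The rule behind this is Python’s own: dot notation only works for names that are valid identifiers and aren’t reserved keywords. The hypothetical needs_item_access() helper below checks a name with str.isidentifier() and keyword.iskeyword():

```python
import keyword


def needs_item_access(name):
    """Return True if `name` can't be written as client.<name> in Python code."""
    return not name.isidentifier() or keyword.iskeyword(name)


for name in ["rptutorials", "rp-tutorials", "2021_logs", "class"]:
    # Show which access style each database name would need
    style = f"client[{name!r}]" if needs_item_access(name) else f"client.{name}"
    print(style)
```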
Note: When you use the mongo shell, you have access to the database through the db global object. When you use PyMongo, you can assign the database to a variable called db to get similar behavior.
Storing data in your database using PyMongo is similar to what you did with the mongo shell in the above sections. But first, you need to create your documents. In Python, you use dictionaries to create documents:
>>> tutorial1 = {
... "title": "Working With JSON Data in Python",
... "author": "Lucas",
... "contributors": [
... "Aldren",
... "Dan",
... "Joanna"
... ],
... "url": "https://realpython.com/python-json/"
... }
Once you’ve created the document as a dictionary, you need to specify which collection you want to use. To do that, you can use the dot notation on the database object:
>>> tutorial = db.tutorial
>>> tutorial
Collection(Database(MongoClient(host=['localhost:27017'], ..., connect=True), 'rptutorials'), 'tutorial')
In this case, tutorial is an instance of Collection and represents a physical collection of documents in your database. You can insert documents into tutorial by calling .insert_one() on it with a document as an argument:
>>> result = tutorial.insert_one(tutorial1)
>>> result
<pymongo.results.InsertOneResult object at 0x7fa854f506c0>
>>> print(f"One tutorial: {result.inserted_id}")
One tutorial: 60084b7d87eb0fbf73dbf71d
Here, .insert_one() takes tutorial1, inserts it into the tutorial collection, and returns an InsertOneResult object. This object provides feedback on the inserted document. Note that since MongoDB generates the ObjectId dynamically, your output won’t match the ObjectId shown above.
If you have many documents to add to the database, then you can use .insert_many() to insert them in one go:
>>> tutorial2 = {
... "title": "Python's Requests Library (Guide)",
... "author": "Alex",
... "contributors": [
... "Aldren",
... "Brad",
... "Joanna"
... ],
... "url": "https://realpython.com/python-requests/"
... }
>>> tutorial3 = {
... "title": "Object-Oriented Programming (OOP) in Python 3",
... "author": "David",
... "contributors": [
... "Aldren",
... "Joanna",
... "Jacob"
... ],
... "url": "https://realpython.com/python3-object-oriented-programming/"
... }
>>> new_result = tutorial.insert_many([tutorial2, tutorial3])
>>> print(f"Multiple tutorials: {new_result.inserted_ids}")
Multiple tutorials: [
ObjectId('6008511c87eb0fbf73dbf71e'),
ObjectId('6008511c87eb0fbf73dbf71f')
]
This is faster and more straightforward than calling .insert_one() multiple times. The call to .insert_many() takes an iterable of documents and inserts them into the tutorial collection in your rptutorials database. The method returns an instance of InsertManyResult, which provides information on the inserted documents.
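When your documents come from a very large source, such as a file you read lazily, a common pattern is to pass them to .insert_many() in fixed-size batches so you never hold everything in memory at once. The chunked() helper below is a hypothetical, stdlib-only sketch of that batching; the batch size and the read_tutorials_lazily() source in the comment are placeholders:

```python
from itertools import islice


def chunked(documents, size):
    """Yield lists of at most `size` items from any iterable, consuming it lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(documents)
    while batch := list(islice(iterator, size)):
        yield batch


# Hypothetical usage with PyMongo (requires a running server, so not executed here):
# for batch in chunked(read_tutorials_lazily(), 1000):
#     tutorial.insert_many(batch)

print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```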
To retrieve documents from a collection, you can use .find(). Without arguments, .find() returns a Cursor object that yields the documents in the collection on demand:
>>> import pprint
>>> for doc in tutorial.find():
... pprint.pprint(doc)
...
{'_id': ObjectId('600747355e6ea8d224f754ba'),
'author': 'Jon',
'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
'title': 'Reading and Writing CSV Files in Python',
'url': 'https://realpython.com/python-csv/'}
...
{'_id': ObjectId('6008511c87eb0fbf73dbf71f'),
'author': 'David',
'contributors': ['Aldren', 'Joanna', 'Jacob'],
'title': 'Object-Oriented Programming (OOP) in Python 3',
'url': 'https://realpython.com/python3-object-oriented-programming/'}
Here, you run a loop on the object that .find() returns and print successive results, using pprint.pprint() to provide a user-friendly output format.
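The on-demand part is what makes a cursor efficient: documents are produced as you iterate rather than all up front. PyMongo’s real Cursor fetches results from the server in batches; the toy find() generator below only mimics the iteration behavior over a Python list, with simple equality filters:

```python
def find(documents, query=None):
    """Toy stand-in for Collection.find(): yield matching documents one at a time."""
    query = query or {}
    for document in documents:
        # Plain equality on every field in the filter, like {"author": "Jon"}
        if all(document.get(key) == value for key, value in query.items()):
            yield document


tutorials = [
    {"title": "Reading and Writing CSV Files in Python", "author": "Jon"},
    {"title": "Python's Requests Library (Guide)", "author": "Alex"},
    {"title": "Object-Oriented Programming (OOP) in Python 3", "author": "David"},
]

cursor = find(tutorials)  # creating the generator scans nothing yet
first = next(cursor)      # scanning stops after the first document
print(first["author"])    # Jon
for doc in find(tutorials, {"author": "David"}):
    print(doc["title"])   # Object-Oriented Programming (OOP) in Python 3
```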
You can also use .find_one() to retrieve a single document. In this case, you can use a dictionary that contains fields to match. For example, if you want to retrieve the first tutorial by Jon, then you can do something like this:
>>> import pprint
>>> jon_tutorial = tutorial.find_one({"author": "Jon"})
>>> pprint.pprint(jon_tutorial)
{'_id': ObjectId('600747355e6ea8d224f754ba'),
'author': 'Jon',
'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
'title': 'Reading and Writing CSV Files in Python',
'url': 'https://realpython.com/python-csv/'}
Note that the tutorial’s ObjectId is set under the _id key, which is the unique document identifier that MongoDB automatically adds when you insert a document into your database.
PyMongo also provides methods to replace, update, and delete documents from a database. If you want to dive deeper into these features, then take a look at the documentation for Collection.
Closing Connections
Establishing a connection to a MongoDB database is typically an expensive operation. If you have an application that constantly retrieves and manipulates data in a MongoDB database, then you probably don’t want to be opening and closing the connection all the time since this might affect your application’s performance.
In this kind of situation, you should keep your connection alive and only close it before exiting the application to clear all the acquired resources. You can close the connection by calling .close() on the MongoClient instance:
>>> client.close()
Another situation is when you have an application that occasionally uses a MongoDB database. In this case, you might want to open the connection when needed and close it immediately after use to free the acquired resources. A consistent approach to this problem would be to use the with statement. Yes, MongoClient implements the context manager protocol:
>>> import pprint
>>> from pymongo import MongoClient
>>> with MongoClient() as client:
... db = client.rptutorials
... for doc in db.tutorial.find():
... pprint.pprint(doc)
...
{'_id': ObjectId('600747355e6ea8d224f754ba'),
'author': 'Jon',
'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
'title': 'Reading and Writing CSV Files in Python',
'url': 'https://realpython.com/python-csv/'}
...
{'_id': ObjectId('6008511c87eb0fbf73dbf71f'),
'author': 'David',
'contributors': ['Aldren', 'Joanna', 'Jacob'],
'title': 'Object-Oriented Programming (OOP) in Python 3',
'url': 'https://realpython.com/python3-object-oriented-programming/'}
If you use the with statement to handle your MongoDB client, then at the end of the with code block, the client’s .__exit__() method gets called, which in turn closes the connection by calling .close().
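The toy FakeClient class below mimics that protocol so you can see the moving parts without a running server. It’s a hypothetical stand-in, not MongoClient’s source code: .__enter__() returns the client itself, and .__exit__() calls .close() whether or not the block raised an exception:

```python
class FakeClient:
    """Hypothetical stand-in that mimics how MongoClient ties a with block to .close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        print("Closing connection")
        self.closed = True

    def __enter__(self):
        # This is the value bound by `with ... as client`
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when the block raises
        self.close()
        return False  # don't suppress exceptions


with FakeClient() as client:
    print(f"Inside the block, closed={client.closed}")
print(f"After the block, closed={client.closed}")
```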
Using MongoDB With Python and MongoEngine
While PyMongo is a great and powerful Python driver for interfacing with MongoDB, it’s probably a bit too low-level for many of your projects. With PyMongo, you’ll have to write a lot of code to consistently insert, retrieve, update, and delete documents.
One library that provides a higher abstraction on top of PyMongo is MongoEngine. MongoEngine is an object-document mapper (ODM), which is roughly equivalent to an SQL-based object-relational mapper (ORM). MongoEngine provides a class-based abstraction, so all the models you create are classes.
Installing MongoEngine
There are a handful of Python libraries to help you work with MongoDB. MongoEngine, however, is a popular one that provides a nice set of features, flexibility, and community support. MongoEngine is available on PyPI. You can install it using the following pip command:
$ pip install mongoengine==0.22.1
Once you’ve installed MongoEngine into your Python environment, you’re ready to start working with MongoDB databases using Python’s object-oriented features. The next step is to connect to your running MongoDB instance.
Establishing a Connection
To establish a connection with your database, you need to use mongoengine.connect(). This function takes several arguments. However, in this tutorial, you’ll use only three of them. Within your Python interactive session, type the following code:
>>> from mongoengine import connect
>>> connect(db="rptutorials", host="localhost", port=27017)
MongoClient(host=['localhost:27017'], ..., read_preference=Primary())
Here, you first set db, the database name, to "rptutorials", which is the database you want to work in. Then you provide a host and a port to connect to your current MongoDB instance. Since you’re using the default host and port (localhost and 27017), you can omit these two parameters and just use connect("rptutorials").
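MongoEngine’s connect() also accepts a full MongoDB connection string through its host argument, for example connect(host="mongodb://localhost:27017/rptutorials"). As a rough sketch of how the three arguments above fit into that string, here’s a hypothetical helper. It does plain string formatting only and doesn’t touch the network:

```python
# Hypothetical helper: build the mongodb:// connection string that corresponds
# to connect(db=..., host=..., port=...). It only formats a string; no I/O.
def build_mongodb_uri(db: str, host: str = "localhost", port: int = 27017) -> str:
    """Return a mongodb:// URI for the given database, host, and port."""
    return f"mongodb://{host}:{port}/{db}"


print(build_mongodb_uri("rptutorials"))  # mongodb://localhost:27017/rptutorials
```

The defaults in the signature mirror the default host and port, which is why connect("rptutorials") alone is enough for a local server.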
Working With Collections and Documents
To create documents with MongoEngine, you first need to define what data you want the documents to have. In other words, you need to define a document schema. MongoEngine encourages you to define a document schema to help you reduce coding errors and to allow you to define utility or helper methods.
Similar to ORMs, ODMs like MongoEngine provide a base or model class for you to define a document schema. In ORMs, that class is equivalent to a table, and its instances are equivalent to rows. In MongoEngine, the class is equivalent to a collection, and its instances are equivalent to documents.
To create a model, you need to subclass Document and provide the required fields as class attributes. To continue with the blog example, here’s how you can create a model for your tutorials:
>>> from mongoengine import Document, ListField, StringField, URLField
>>> class Tutorial(Document):
...     title = StringField(required=True, max_length=70)
...     author = StringField(required=True, max_length=20)
...     contributors = ListField(StringField(max_length=20))
...     url = URLField(required=True)
...
With this model, you tell MongoEngine that you expect a Tutorial document to have a .title, an .author, a list of .contributors, and a .url. The base class, Document, uses that information along with the field types to validate the input data for you.
Note: One of the more difficult tasks with database models is data validation. How do you make sure that the input data conforms to your format requirements? That’s one reason to have a coherent and uniform document schema.
MongoDB is said to be a schemaless database, but that doesn’t mean it’s schema-free. Having documents with different schemas within the same collection can lead to processing errors and inconsistent behavior.
For example, if you try to save a Tutorial object without a .title, then your model throws an exception and lets you know. You can take this even further and add more restrictions, such as the maximum length of the .title, which the model above sets with max_length=70.
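To make the idea of declarative constraints concrete without a running server, here’s a minimal pure-Python sketch of what checks like required and max_length do. StringSpec, TUTORIAL_SCHEMA, and validate() are hypothetical names, and the error messages are illustrative. This is not how MongoEngine implements validation:

```python
# Pure-Python sketch of declarative field validation, loosely modeled on the
# required and max_length options. StringSpec and validate() are hypothetical
# helpers for illustration; they are not part of MongoEngine.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StringSpec:
    """Constraints for one string field (a stand-in for StringField)."""
    required: bool = False
    max_length: Optional[int] = None


# Mirrors the .title and .author constraints of the Tutorial model
TUTORIAL_SCHEMA: Dict[str, StringSpec] = {
    "title": StringSpec(required=True, max_length=70),
    "author": StringSpec(required=True, max_length=20),
}


def validate(document: dict, schema: Dict[str, StringSpec]) -> Dict[str, str]:
    """Return {field name: error message} for every violated constraint."""
    errors: Dict[str, str] = {}
    for name, spec in schema.items():
        value = document.get(name)
        if value is None:
            if spec.required:
                errors[name] = "Field is required"
            continue  # missing value: no length to check
        if spec.max_length is not None and len(value) > spec.max_length:
            errors[name] = f"Too long (max {spec.max_length} characters)"
    return errors


print(validate({"author": "Alex"}, TUTORIAL_SCHEMA))  # {'title': 'Field is required'}
```

Declaring the constraints once as data, instead of scattering if checks through your code, is the same design idea behind MongoEngine’s field classes.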
There are a few general parameters that you can use to configure and validate fields. Here are some of the more commonly used parameters:
- db_field specifies a different name for the field in the stored document.
- required ensures that the field is provided.
- default provides a default value for a given field if no value is given.
- unique ensures that no other document in the collection has the same value for this field.
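The following sketch illustrates what db_field, default, and unique mean for the data that ends up stored. FieldSpec, to_stored(), is_duplicate(), and the status field are all hypothetical, and the uniqueness check scans an in-memory list rather than querying a database, so treat it as an illustration of the rules, not MongoEngine’s behavior:

```python
# Hypothetical sketch of three field options: db_field renames the stored key,
# default fills in a missing value, and unique rejects repeated values.
# FieldSpec, to_stored(), and is_duplicate() are illustrative names only.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FieldSpec:
    db_field: Optional[str] = None  # stored key; None means "use the attribute name"
    default: Any = None             # value used when the caller omits the field
    unique: bool = False            # whether values must differ across documents


def to_stored(values: Dict[str, Any], schema: Dict[str, FieldSpec]) -> Dict[str, Any]:
    """Build the stored document: apply defaults, then rename keys via db_field."""
    stored: Dict[str, Any] = {}
    for attr, spec in schema.items():
        value = values.get(attr, spec.default)
        if value is not None:
            stored[spec.db_field or attr] = value
    return stored


def is_duplicate(new_doc: Dict[str, Any], existing: List[Dict[str, Any]],
                 schema: Dict[str, FieldSpec]) -> bool:
    """Return True if new_doc repeats a value of any unique field."""
    for attr, spec in schema.items():
        key = spec.db_field or attr
        if not spec.unique or key not in new_doc:
            continue
        if any(doc.get(key) == new_doc[key] for doc in existing):
            return True
    return False


schema = {
    "url": FieldSpec(db_field="link", unique=True),
    "status": FieldSpec(default="draft"),  # hypothetical field
}
doc = to_stored({"url": "https://realpython.com/python-csv/"}, schema)
print(doc)  # {'link': 'https://realpython.com/python-csv/', 'status': 'draft'}
print(is_duplicate(doc, [doc], schema))  # True
```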
Each specific field type also has its own set of parameters. You can check the documentation for a complete guide to the available field types.
To save a document to your database, you need to call .save() on a document object. If the document already exists, then all the changes will be applied to the existing document. If the document doesn’t exist, then it’ll be created.
Here’s an example of creating and saving a tutorial into your sample tutorials database:
>>> tutorial1 = Tutorial(
... title="Beautiful Soup: Build a Web Scraper With Python",
... author="Martin",
... contributors=["Aldren", "Geir Arne", "Jaya", "Joanna", "Mike"],
... url="https://realpython.com/beautiful-soup-web-scraper-python/"
... )
>>> tutorial1.save() # Insert the new tutorial
<Tutorial: Tutorial object>
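The insert-or-update behavior of .save() can be modeled with a small in-memory class. InMemoryCollection is hypothetical and uses integer IDs instead of ObjectId values, so it only sketches the rule: a document without an _id gets one and is inserted, while a document with an _id replaces the stored copy:

```python
# In-memory sketch of save()'s insert-or-update behavior. InMemoryCollection
# is hypothetical; it is not how MongoEngine or MongoDB store data.
import itertools
from typing import Any, Dict


class InMemoryCollection:
    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, Any]] = {}
        self._ids = itertools.count(1)  # stand-in for MongoDB's ObjectId

    def save(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        """Insert doc if it has no _id yet; otherwise overwrite the stored copy."""
        if "_id" not in doc:
            doc["_id"] = next(self._ids)  # new document: assign an id
        self._docs[doc["_id"]] = dict(doc)  # store a copy under that id
        return doc

    def count(self) -> int:
        return len(self._docs)

    def get(self, doc_id: int) -> Dict[str, Any]:
        return self._docs[doc_id]


collection = InMemoryCollection()
tutorial = collection.save({"title": "Beautiful Soup", "author": "Martin"})
tutorial["title"] = "Beautiful Soup: Build a Web Scraper With Python"
collection.save(tutorial)  # same _id, so this updates instead of inserting
print(collection.count())  # 1
```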
By default, .save() inserts the new document into a collection named after the model class, Tutorial, converted to lowercase. Class names made of several words become snake_case, so a BlogPost model would use a collection named blog_post. In this case, the collection name is tutorial, which matches the collection you’ve been using to save your tutorials.
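Here’s a small sketch of that naming rule. default_collection_name() is a hypothetical helper written to mirror the CamelCase-to-snake_case conversion, not an import from MongoEngine:

```python
# Sketch of a CamelCase -> snake_case naming rule like the one MongoEngine
# applies to default collection names. default_collection_name() is hypothetical.
def default_collection_name(class_name: str) -> str:
    """Prefix each uppercase letter with "_", trim the edges, and lowercase."""
    return "".join(f"_{c}" if c.isupper() else c for c in class_name).strip("_").lower()


print(default_collection_name("Tutorial"))  # tutorial
print(default_collection_name("BlogPost"))  # blog_post
```

If you need a different name, you can set it explicitly in the model’s meta dictionary, for example meta = {"collection": "tutorials"} inside the class body.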
MongoEngine performs data validation when you call .save(). This means that it checks the input data against the schema you declared in the Tutorial model class. If the input data violates the schema or any of its constraints, then you get an exception, and the data isn’t saved into the database.
For example, here’s what happens if you try to save a tutorial without providing a .title:
>>> tutorial2 = Tutorial()
>>> tutorial2.author = "Alex"
>>> tutorial2.contributors = ["Aldren", "Jon", "Joanna"]
>>> tutorial2.url = "https://realpython.com/convert-python-string-to-int/"
>>> tutorial2.save()
Traceback (most recent call last):
...
mongoengine.errors.ValidationError: ... (Field is required: ['title'])
In this example, first note that you can also build a Tutorial object by assigning values to its attributes. Second, since you don’t provide a .title for the new tutorial, .save() raises a ValidationError telling you that the .title field is required. Having automatic data validation is a great feature that will save you some headaches.
Each Document subclass has an .objects attribute that you can use to access the documents in the associated collection. For example, here’s how you can print the .title of all your current tutorials:
>>> for doc in Tutorial.objects:
...     print(doc.title)
...
Reading and Writing CSV Files in Python
How to Iterate Through a Dictionary in Python
Python 3's f-Strings: An Improved String Formatting Syntax (Guide)
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Beautiful Soup: Build a Web Scraper With Python
The for loop iterates over all your tutorials and prints their .title data to the screen. You can also use .objects to filter your documents. For example, say you want to retrieve the tutorials authored by Alex. In that case, you can do something like this:
>>> for doc in Tutorial.objects(author="Alex"):
...     print(doc.title)
...
Python's Requests Library (Guide)
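Conceptually, a keyword filter like Tutorial.objects(author="Alex") selects the documents whose fields equal every value you pass. With PyMongo, you’d express the same filter as the document {"author": "Alex"} passed to .find(). The sketch below mimics that matching rule on plain dictionaries. find_matching() is a hypothetical helper, and a real query runs on the MongoDB server rather than in Python:

```python
# In-memory sketch of equality filtering like Tutorial.objects(author="Alex").
# find_matching() is a hypothetical helper; a real query runs on the server.
from typing import Any, Dict, Iterator, List


def find_matching(docs: List[Dict[str, Any]], **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield documents whose fields equal every keyword argument given."""
    for doc in docs:
        if all(doc.get(field) == value for field, value in criteria.items()):
            yield doc


tutorials = [
    {"title": "Beautiful Soup: Build a Web Scraper With Python", "author": "Martin"},
    {"title": "Object-Oriented Programming (OOP) in Python 3", "author": "David"},
    {"title": "Python's Requests Library (Guide)", "author": "Alex"},
]

for doc in find_matching(tutorials, author="Alex"):
    print(doc["title"])  # Python's Requests Library (Guide)
```

Calling find_matching() with no criteria yields every document, which parallels iterating over Tutorial.objects directly.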
MongoEngine is well suited to managing MongoDB databases for just about any type of application. Its class-based models, automatic data validation, and query interface let you work with MongoDB at a higher level than raw PyMongo calls. If you’re looking for more information about MongoEngine, be sure to check out its user guide.
Conclusion
If you need a robust, scalable, and flexible database solution, then MongoDB might be a good option for you. MongoDB is a mature and popular NoSQL database with great Python support. With a good understanding of how to access MongoDB with Python, you’ll be ready to create database applications that scale well and provide excellent performance.
With MongoDB, you also have the benefit of a human-readable and highly flexible data model, so you can adapt to requirement changes quickly.
In this tutorial, you learned:
- What MongoDB and NoSQL databases are
- How to install and run MongoDB on your system
- How to create and work with MongoDB databases
- How to interface with MongoDB in Python using the PyMongo driver
- How to use the MongoEngine object-document mapper to work with MongoDB
The examples you coded in this tutorial are available for download. To get their source code, click the link below:
Get the Source Code: Click here to get the source code you’ll use to learn about using MongoDB with Python in this tutorial.