Introduction to MongoDB and Python

Python and MongoDB: Connecting to NoSQL Databases

by Leodanis Pozo Ramos Mar 01, 2021 databases intermediate

MongoDB is a document-oriented and NoSQL database solution that provides great scalability and flexibility along with a powerful querying system. With MongoDB and Python, you can develop many different types of database applications quickly. So if your Python application needs a database that’s just as flexible as the language itself, then MongoDB is for you.

In this tutorial, you’ll learn:

  • What MongoDB is
  • How to install and run MongoDB
  • How to work with MongoDB databases
  • How to use the low-level PyMongo driver to interface with MongoDB
  • How to use the high-level MongoEngine object-document mapper (ODM)

Throughout this tutorial, you’ll write a couple of examples that will demonstrate the flexibility and power of MongoDB and its great Python support.

Using SQL vs NoSQL Databases

For decades, SQL databases were one of the few choices for developers looking to build large and scalable database systems. However, the increasing need for storing complex data structures led to the birth of NoSQL databases. This new kind of database system allows developers to store heterogeneous and unstructured data efficiently.

In general, NoSQL database systems store and retrieve data in a much different way from SQL relational database management systems (RDBMSs).

When it comes to choosing from the currently available database technologies, you might need to decide between using SQL or NoSQL systems. Both of them have specific features that you should consider when choosing one or the other. Here are some of their more substantial differences:

Property | SQL Databases | NoSQL Databases
Data model | Relational | Nonrelational
Structure | Table-based, with columns and rows | Document-based, key-value pairs, graph, or wide-column
Schema | A predefined and strict schema in which every record (row) is of the same nature and possesses the same properties | A dynamic schema or schemaless, which means that records don’t need to be of the same nature
Query language | Structured Query Language (SQL) | Varies from database to database
Scalability | Vertical | Horizontal
ACID transactions | Supported | Supported, depending on the specific NoSQL database
Ability to add new properties | Need to alter the schema first | Possible without disturbing anything

There are many other differences between the two types of databases, but those mentioned above are some of the more important ones to know about.

When choosing a database, you should consider its strengths and weaknesses carefully. You also need to consider how the database fits into your specific scenario and your application’s requirements. Sometimes the right solution is to use a combination of SQL and NoSQL databases to handle different aspects of a broader system.

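Some common examples of SQL databases include:

  • SQLite
  • MySQL
  • Oracle
  • PostgreSQL
  • Microsoft SQL Server

NoSQL database examples include:

  • DynamoDB
  • Cassandra
  • Redis
  • CouchDB
  • MongoDB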

In recent years, SQL and NoSQL databases have even begun to merge. For example, database systems such as PostgreSQL, MySQL, and Microsoft SQL Server now support storing and querying JSON data, much like NoSQL databases. With this, you can now achieve many of the same results with both technologies. But you still don’t get many of the NoSQL features, such as horizontal scaling and the user-friendly interface.

With this brief background on SQL and NoSQL databases, you can focus on the main topic of this tutorial: the MongoDB database and how to use it in Python.

Managing NoSQL Databases With MongoDB

MongoDB is a document-oriented database classified as NoSQL. It’s become popular throughout the industry in recent years and integrates extremely well with Python. Unlike traditional SQL RDBMSs, MongoDB uses collections of documents instead of tables of rows to organize and store data.

MongoDB stores data in schemaless and flexible JSON-like documents. Here, schemaless means that you can have documents with a different set of fields in the same collection, without the need for satisfying a rigid table schema.
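
As a rough illustration of what schemaless means, the sketch below models a collection as a plain Python list of dictionaries. The `collection` variable and the `video_url` field are made up for this example, and a real MongoDB collection isn’t a Python list:

```python
import json

# A toy stand-in for a collection: documents with different fields side by side.
collection = [
    {"title": "Reading and Writing CSV Files in Python", "author": "Jon"},
    {
        "title": "Working With JSON Data in Python",
        "author": "Lucas",
        "video_url": "https://example.com/json-video",  # hypothetical extra field
    },
]

# Each document carries its own set of fields; no shared table schema is needed.
field_sets = [sorted(doc) for doc in collection]
print(field_sets)

# The documents still serialize cleanly to JSON.
print(json.dumps(collection, indent=2))
```

The second document has a field that the first one lacks, and nothing needs to be declared up front for that to work.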

You can change the structure of your documents and data over time, which results in a flexible system that allows you to quickly adapt to requirement changes without the need for a complex process of data migration. However, the trade-off in changing the structure of new documents is that existing documents become inconsistent with the updated schema. So this is a topic that needs to be managed with care.
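
One common way to live with older documents is to fill in defaults when you read them. Here’s a minimal sketch of that idea. The `DEFAULTS` mapping, the `normalize()` helper, and the `tags` field are hypothetical and not part of MongoDB or PyMongo:

```python
# Hypothetical defaults for fields that were added after some documents were stored.
DEFAULTS = {"contributors": [], "tags": []}


def normalize(doc):
    """Return a copy of doc with any missing newer fields filled with defaults."""
    normalized = dict(doc)
    for field, default in DEFAULTS.items():
        # Copy the default so documents never share the same list object.
        normalized.setdefault(field, list(default))
    return normalized


# Stored before "contributors" and "tags" existed
old_doc = {"title": "Python Basics", "author": "Jon"}

# Stored with the newer structure
new_doc = {
    "title": "Python Sets",
    "author": "Ana",
    "contributors": ["Jim"],
    "tags": ["basics"],
}

print(normalize(old_doc))
print(normalize(new_doc))
```

The original dictionaries stay untouched, so the same helper can sit in front of any code that expects the newer structure.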

MongoDB is written in C++ and actively developed by MongoDB Inc. It runs on all major platforms, such as macOS, Windows, Solaris, and most Linux distributions. In general, there are three main development goals behind the MongoDB database:

  1. Scale well
  2. Store rich data structures
  3. Provide a sophisticated query mechanism

MongoDB is a distributed database, so high availability, horizontal scaling, and geographic distribution are built into the system. It stores data in flexible JSON-like documents. You can model these documents to map the objects in your applications, which makes it possible to work with your data effectively.

MongoDB provides a powerful query language that supports ad hoc queries, indexing, aggregation, geospatial search, text search, and a lot more. This presents you with a powerful tool kit to access and work with your data. Finally, MongoDB is freely available and has great Python support.

Reviewing MongoDB’s Features

So far, you’ve learned what MongoDB is and what its main goals are. In this section, you’ll learn about some of MongoDB’s more important features. As for the database management side, MongoDB offers the following features:

  • Query support: You can use many standard query types, such as matching (==), comparison (<, >), and regular expressions.
  • Data accommodation: You can store virtually any kind of data, be it structured, partially structured, or even polymorphic.
  • Scalability: It handles more queries just by adding more machines to the server cluster.
  • Flexibility and agility: You can develop applications with it quickly.
  • Document orientation and schemalessness: You can store all the information regarding a data model in a single document.
  • Adjustable schema: You can change the schema of the database on the fly, which reduces the time needed to provide new features or fix existing problems.
  • Relational database functionalities: You can perform actions common to relational databases, like indexing.
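
To get a feel for the query types in the first bullet, here’s a toy, in-memory matcher written in plain Python. It borrows the `$gt`, `$lt`, and `$regex` operator names from MongoDB’s query language, but `matches()`, the `views` field, and the sample data are made up for illustration and say nothing about how MongoDB evaluates queries internally:

```python
import re


def matches(doc, query):
    """Return True if doc satisfies every condition in query (toy semantics)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, such as {"views": {"$gt": 100}}
            for op, operand in condition.items():
                if op == "$gt":
                    ok = value is not None and value > operand
                elif op == "$lt":
                    ok = value is not None and value < operand
                elif op == "$regex":
                    ok = isinstance(value, str) and re.search(operand, value) is not None
                else:
                    raise ValueError(f"Unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            # Plain form, such as {"author": "Jon"}, means equality
            return False
    return True


docs = [
    {"author": "Jon", "title": "Reading and Writing CSV Files in Python", "views": 150},
    {"author": "Joanna", "title": "Python 3's f-Strings", "views": 90},
]

by_author = [d["author"] for d in docs if matches(d, {"author": "Jon"})]
popular = [d["author"] for d in docs if matches(d, {"views": {"$gt": 100}})]
csv_titles = [d["author"] for d in docs if matches(d, {"title": {"$regex": "CSV"}})]
```

Each query is itself a dictionary, which is the same shape you’ll pass to `.find()` later in this tutorial.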

As for the operations side, MongoDB provides a few tools and features that you won’t find in other database systems:

  • Scalability: Whether you need a stand-alone server or complete clusters of independent servers, you can scale MongoDB to whatever size you need it to be.
  • Load-balancing support: MongoDB will automatically move data across various shards.
  • Automatic failover support: If your primary server goes down, then a new primary will be up and running automatically.
  • Management tools: You can track your machines using the cloud-based MongoDB Management Service (MMS).
  • Memory efficiency: Thanks to the memory-mapped files, MongoDB is often more efficient than relational databases.

All these features are quite useful. For example, if you take advantage of the indexing feature, then much of your data will be kept in memory for quick retrieval. Even without indexing specific document keys, MongoDB caches quite a bit of data using the least recently used technique.

Installing and Running MongoDB

Now that you’re familiar with MongoDB, it’s time to get your hands dirty and start using it. But first, you need to install it on your machine. MongoDB’s official site provides two editions of the database server:

  1. The community edition offers the flexible document model along with ad hoc queries, indexing, and real-time aggregation to provide powerful ways to access and analyze your data. This edition is freely available.
  2. The enterprise edition offers the same features as the community edition, plus other advanced features related to security and monitoring. This is the commercial edition, but you can use it free of charge for an unlimited time for evaluation and development purposes.

If you’re on Windows, then you can read through the installation tutorial for complete instructions. In general, you can go to the download page, select the Windows platform in the Available Downloads box, choose the .msi installer that fits your current system, and click Download.

Run the installer and follow the on-screen instructions in the installation wizard. The installation tutorial also provides information on how to run MongoDB as a Windows service.

If you’re on macOS, then you can use Homebrew to install MongoDB on your system. See the installation tutorial to get the complete guide. Also, make sure to follow the instructions to run MongoDB as a macOS service.

If you’re on Linux, then the installation process will depend on your specific distribution. For a detailed guide on how to install MongoDB on different Linux systems, go to the installation tutorial page and select the tutorial that matches your current operating system. Make sure you run the MongoDB daemon, mongod, at the end of the installation.

Finally, you can also install MongoDB using Docker. This is handy if you don’t want to clutter your system with another installation. If you prefer this installation option, then you can read through the official tutorial and follow its directions. Note that previous knowledge of how to use Docker is required in this case.

With the MongoDB database installed and running on your system, you can start working with real databases using the mongo shell.

Creating MongoDB Databases With the mongo Shell

If you’ve followed the installation and running instructions, then you should already have an instance of MongoDB running on your system. Now you can start creating and testing your own databases. In this section, you’ll learn how to use the mongo shell to create, read, update, and delete documents on a database.

Running the mongo Shell

The mongo shell is an interactive JavaScript interface to MongoDB. You can use this tool to query and manipulate your data as well as to perform administrative operations. Since it’s a JavaScript interface, you won’t use the familiar SQL language to query the database. Instead, you’ll use JavaScript code.

To launch the mongo shell, open your terminal or command line and run the following command:

$ mongo

This command takes you to the mongo shell. At this point, you’ll probably see a bunch of messages with information on the shell’s version and on the server address and port. Finally, you’ll be presented with the shell prompt (>) to enter queries and commands.

You can pass the database address as an argument to the mongo command. You can also use several options, such as specifying the host and port to access a remote database, and so on. For more details on how to use the mongo command, you can run mongo --help.

Establishing a Connection

When you run the mongo command without arguments, it launches the shell and connects to the default local server provided by the mongod process at mongodb://127.0.0.1:27017. This means you’re connected to the local host through port 27017.

By default, the mongo shell starts the session by establishing a connection to the test database. You can access the current database through the db object:

> db
test
>

In this case, db holds a reference to test, which is the default database. To switch databases, issue the command use, providing a database name as an argument.

For example, say you want to create a website to publish Python content, and you’re planning to use MongoDB to store your tutorials and articles. In that case, you can switch to the site’s database with the following command:

> use rptutorials
switched to db rptutorials

This command switches your connection to the rptutorials database. MongoDB doesn’t create the physical database file on the file system until you insert real data into the database. So in this case, rptutorials won’t show up in your current database list:

> show dbs
admin          0.000GB
config         0.000GB
local          0.000GB
>

The mongo shell provides a lot of features and options. It allows you to query and manipulate your data and also to manage the database server itself.

Instead of using a standardized query language such as SQL, the mongo shell uses the JavaScript programming language and a user-friendly API. This API allows you to play around with your data, which is the topic for the next section.

Creating Collections and Documents

A MongoDB database is a physical container for collections of documents. Each database gets its own set of files on the file system. These files are managed by the MongoDB server, which can handle several databases.

In MongoDB, a collection is a group of documents. Collections are somewhat analogous to tables in a traditional RDBMS, but without imposing a rigid schema. In theory, each document in a collection can have a completely different structure or set of fields.

In practice, documents in a collection commonly share a similar structure to allow uniform retrieval, insertion, and update processes. You can enforce a uniform document structure by using document validation rules during updates and insertions.

Allowing different document structures is a key feature of MongoDB collections. This feature provides flexibility and allows adding new fields to documents without the need for modifying a formal table schema.

To create a collection using the mongo shell, you need to point db to your target database and then create the collections using the dot notation:

> use rptutorials
switched to db rptutorials
> db
rptutorials
> db.tutorial
rptutorials.tutorial

In this example, you use the dot notation to create tutorial as a collection in rptutorials, which is your current database. It’s important to note that MongoDB creates databases and collections lazily. In other words, they’re physically created only after you insert the first document.
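
The lazy behavior described above can be modeled with a small toy class. This is only a sketch of the idea in plain Python. `LazyDatabase` and `LazyCollection` are made-up names, and the real server does far more than append to a list:

```python
class LazyDatabase:
    """Toy model of lazy creation: collections appear only after the first insert."""

    def __init__(self):
        self._collections = {}

    def __getattr__(self, name):
        # Called only for attributes that don't exist yet, such as db.tutorial.
        if name.startswith("_"):
            raise AttributeError(name)
        return LazyCollection(self, name)

    def list_collection_names(self):
        return sorted(self._collections)


class LazyCollection:
    def __init__(self, database, name):
        self._database = database
        self._name = name

    def insert_one(self, document):
        # The collection is materialized here, on the first write.
        self._database._collections.setdefault(self._name, []).append(document)


db = LazyDatabase()
tutorial = db.tutorial                   # Just a handle, nothing is stored yet
before = db.list_collection_names()

tutorial.insert_one({"title": "Python Basics"})
after = db.list_collection_names()       # Now the collection exists
```

Accessing `db.tutorial` only hands back a reference. The collection shows up in the listing after the first insert, which mirrors why rptutorials is missing from `show dbs` until it holds data.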

Once you have a database and a collection, you can start inserting documents. Documents are the unit of storage in MongoDB. In an RDBMS, this would be equivalent to a table row. However, MongoDB’s documents are way more versatile than rows because they can store complex information, such as arrays, embedded documents, and even arrays of documents.

MongoDB stores documents in a format called Binary JSON (BSON), which is a binary representation of JSON. MongoDB’s documents are composed of field-and-value pairs and have the following structure:

{
   field1 → value1,
   field2 → value2,
   field3 → value3,
   ...
   fieldN → valueN
}

The value of a field can be any BSON data type, including other documents, arrays, and arrays of documents. In practice, you’ll specify your documents using the JSON format.

When you’re building a MongoDB database application, probably your most important decision is about the structure of documents. In other words, you’ll have to decide which fields and values your documents will have.

In the case of the tutorials for your Python site, your documents might be structured like this:

{
    "title": "Reading and Writing CSV Files in Python",
    "author": "Jon",
    "contributors": [
        "Aldren",
        "Geir Arne",
        "Joanna",
        "Jason"
    ],
    "url": "https://realpython.com/python-csv/"
}

A document is essentially a set of property names and their values. The values can be simple data types, such as strings and numbers, but they can also be arrays such as contributors in the above example.

MongoDB’s document-oriented data model naturally represents complex data as a single object. This allows you to work with data objects holistically, without the need for looking at several places or tables.

If you were using a traditional RDBMS to store your tutorials, then you would probably have a table to store your tutorials and another table to store your contributors. Then you’d have to set up a relationship between both tables so you could retrieve the data later on.
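
To make that comparison concrete, here’s a small sketch of the relational version using Python’s built-in sqlite3 module and an in-memory database. The table and column names are assumptions chosen for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tutorial (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author TEXT NOT NULL
    );
    CREATE TABLE contributor (
        tutorial_id INTEGER NOT NULL REFERENCES tutorial(id),
        name TEXT NOT NULL
    );
    """
)

cur = conn.execute(
    "INSERT INTO tutorial (title, author) VALUES (?, ?)",
    ("Reading and Writing CSV Files in Python", "Jon"),
)
tutorial_id = cur.lastrowid
conn.executemany(
    "INSERT INTO contributor (tutorial_id, name) VALUES (?, ?)",
    [(tutorial_id, name) for name in ["Aldren", "Geir Arne", "Joanna", "Jason"]],
)

# A join is needed to reassemble what a single document holds.
rows = conn.execute(
    """
    SELECT t.title, t.author, c.name
    FROM tutorial AS t
    JOIN contributor AS c ON c.tutorial_id = t.id
    ORDER BY c.rowid
    """
).fetchall()
conn.close()

# Rebuild the document-shaped view from the joined rows.
document = {
    "title": rows[0][0],
    "author": rows[0][1],
    "contributors": [row[2] for row in rows],
}
```

The final dictionary has the same shape as the MongoDB document shown earlier, but it takes two tables, a foreign key, and a join to get there.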

Working With Collections and Documents

So far, you know the basics of how to run and use the mongo shell. You also know how to create your own documents using the JSON format. Now it’s time to learn how to insert documents into your MongoDB database.

To insert a document into a database using the mongo shell, you first need to choose a collection and then call .insertOne() on the collection with your document as an argument:

> use rptutorials
switched to db rptutorials

> db.tutorial.insertOne({
...     "title": "Reading and Writing CSV Files in Python",
...     "author": "Jon",
...     "contributors": [
...         "Aldren",
...         "Geir Arne",
...         "Joanna",
...         "Jason"
...     ],
...     "url": "https://realpython.com/python-csv/"
... })
{
    "acknowledged" : true,
    "insertedId" : ObjectId("600747355e6ea8d224f754ba")
}

With the first command, you switch to the database you want to use. The second command is a JavaScript method call that inserts a simple document into the selected collection, tutorial. Once you hit Enter, you get a message on your screen that informs you about the newly inserted document and its insertedId.

Just like relational databases need a primary key to uniquely identify every row in a table, MongoDB documents need to have an _id field that uniquely identifies the document. MongoDB allows you to enter a custom _id as long as you guarantee its uniqueness. However, a widely accepted practice is to allow MongoDB to automatically insert an _id for you.
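
The uniqueness rule can be sketched with a toy, in-memory collection. `TinyCollection` is a made-up class, and `uuid4` is only a stand-in for the ObjectId that MongoDB generates. A real server reports its own duplicate key error rather than a `ValueError`:

```python
import uuid


class TinyCollection:
    """Toy collection that enforces unique _id values (not MongoDB's implementation)."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, document):
        doc = dict(document)
        # Generate an identifier when the caller doesn't supply one.
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise ValueError(f"Duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


tutorials = TinyCollection()
auto_id = tutorials.insert_one({"title": "Python Basics"})
custom_id = tutorials.insert_one({"_id": "csv-guide", "title": "CSV Files"})

try:
    # Reusing a custom _id breaks the uniqueness guarantee.
    tutorials.insert_one({"_id": "csv-guide", "title": "Another CSV Guide"})
except ValueError as error:
    duplicate_error = str(error)
```

Letting the collection generate the identifier removes the burden of tracking which values are already taken.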

Similarly, you can add several documents in one go using .insertMany():

> tutorial1 = {
...     "title": "How to Iterate Through a Dictionary in Python",
...     "author": "Leodanis",
...     "contributors": [
...         "Aldren",
...         "Jim",
...         "Joanna"
...     ],
...     "url": "https://realpython.com/iterate-through-dictionary-python/"
... }

> tutorial2 = {
...      "title": "Python 3's f-Strings: An Improved String Formatting Syntax",
...      "author": "Joanna",
...      "contributors": [
...          "Adriana",
...          "David",
...          "Dan",
...          "Jim",
...          "Pavel"
...      ],
...      "url": "https://realpython.com/python-f-strings/"
... }

> db.tutorial.insertMany([tutorial1, tutorial2])
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("60074ff05e6ea8d224f754bb"),
        ObjectId("60074ff05e6ea8d224f754bc")
    ]
}

Here, the call to .insertMany() takes a list of tutorials and inserts them into the database. Again, the shell output shows information about the newly inserted documents and their automatically added _id fields.

The mongo shell also provides methods to perform read, update, and delete operations on the database. For example, you can use .find() to retrieve the documents in a collection:

> db.tutorial.find()
{ "_id" : ObjectId("600747355e6ea8d224f754ba"),
"title" : "Reading and Writing CSV Files in Python",
"author" : "Jon",
"contributors" : [ "Aldren", "Geir Arne", "Joanna", "Jason" ],
"url" : "https://realpython.com/python-csv/" }
    ...

> db.tutorial.find({author: "Joanna"})
{ "_id" : ObjectId("60074ff05e6ea8d224f754bc"),
"title" : "Python 3's f-Strings: An Improved String Formatting Syntax",
"author" : "Joanna",
"contributors" : [ "Adriana", "David", "Dan", "Jim", "Pavel" ],
"url" : "https://realpython.com/python-f-strings/" }

The first call to .find() retrieves all the documents in the tutorial collection. On the other hand, the second call to .find() retrieves those tutorials that are authored by Joanna.

With this background knowledge on how to use MongoDB through its mongo shell, you’re ready to start using MongoDB with Python. The next few sections will walk you through different options for using MongoDB databases in your Python applications.

Using MongoDB With Python and PyMongo

Now that you know what MongoDB is and how to create and manage databases using the mongo shell, you can start using MongoDB, but this time with Python. MongoDB provides an official Python driver called PyMongo.

In this section, you’ll go through some examples that’ll help you get a feeling of how to use PyMongo to create your own database applications with MongoDB and Python.

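Each module within PyMongo is responsible for a set of operations on the database. You’ll have modules for at least the following tasks:

  • Establishing database connections
  • Working with databases
  • Working with collections and documents
  • Manipulating cursors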

In general, PyMongo provides a rich set of tools that you can use to communicate with a MongoDB server. It provides functionality to query, retrieve results, write and delete data, and run database commands.

Installing PyMongo

To start using PyMongo, you first need to install it in your Python environment. You can use a virtual environment, or you can use your system-wide Python installation, although the first option is preferred. PyMongo is available on PyPI, so the quickest way to install it is with pip. Fire up your terminal and run the following command:

$ pip install pymongo==3.11.2

After a few downloads and other related steps, this command installs PyMongo on your Python environment. Note that if you don’t supply a specific version number, then pip will install the latest available version.

Once you’re done with the installation, you can start a Python interactive session and run the following import:

>>> import pymongo

If this runs without raising an exception in the Python shell, then your installation works just fine. If not, then carefully perform the steps again.

Establishing a Connection

To establish a connection to a database, you need to create a MongoClient instance. This class provides a client for a MongoDB instance or server. Each client object has a built-in connection pool, which by default handles up to a hundred connections to the server.

Get back to your Python interactive session and import MongoClient from pymongo. Then create a client object to communicate with your currently running MongoDB instance:

>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client
MongoClient(host=['localhost:27017'], ..., connect=True)

The code above establishes a connection to the default host (localhost) and port (27017). MongoClient takes a set of arguments that allows you to specify custom host, port, and other connection parameters. For example, to provide a custom host and port, you can use the following code:

>>> client = MongoClient(host="localhost", port=27017)

This is handy when you need to provide a host and port that differ from MongoDB’s default setup. You can also use the MongoDB URI format:

>>> client = MongoClient("mongodb://localhost:27017")

All these instances of MongoClient provide the same client setup to connect your current MongoDB instance. Which one you should use just depends on how explicit you want to be in your code.
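
To see how the URI form relates to the keyword form, you can split the URI with the standard library. This is only a sketch for the simple single-host URI used above. Real MongoDB URIs can also carry credentials, several hosts, and options, which this snippet doesn’t attempt to handle:

```python
from urllib.parse import urlsplit

uri = "mongodb://localhost:27017"
parts = urlsplit(uri)

# These are the same values you'd pass as host= and port= keyword arguments.
host = parts.hostname
port = parts.port
print(parts.scheme, host, port)
```

Both spellings carry the same two pieces of information, so choosing between them is mostly a matter of taste and of where your configuration comes from.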

Once you’ve instantiated MongoClient, you can use its instance to refer to that specific database connection, just like you did with the mongo shell’s db object in the above section.

Working With Databases, Collections, and Documents

Once you have a connected instance of MongoClient, you can access any database managed by the specified MongoDB server. To define which database you want to use, you can use the dot notation just like you did in the mongo shell:

>>> db = client.rptutorials
>>> db
Database(MongoClient(host=['localhost:27017'], ..., connect=True), 'rptutorials')

In this case, rptutorials is the name of the database you’ll be working with. If the database doesn’t exist, then MongoDB creates it for you, but only when you perform the first operation on the database.

You can also use dictionary-style access to get a reference to the database:

>>> db = client["rptutorials"]

This statement is handy when the name of your database isn’t a valid Python identifier. For example, if your database is called rp-tutorials, then you need to use dictionary-style access.
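
If you’re unsure whether a name works with dot notation, then you can check it with plain Python. The `needs_dict_access()` helper below is a made-up convenience function, not something PyMongo provides:

```python
import keyword


def needs_dict_access(name):
    """Return True if name can't be written after a dot in Python source code."""
    return not name.isidentifier() or keyword.iskeyword(name)


names = ["rptutorials", "rp-tutorials", "class"]
results = {name: needs_dict_access(name) for name in names}
print(results)
```

Here, rp-tutorials fails because of the hyphen, and class fails because it’s a reserved keyword, so both would need the `client["..."]` form.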

Storing data in your database using PyMongo is similar to what you did with the mongo shell in the above sections. But first, you need to create your documents. In Python, you use dictionaries to create documents:

>>> tutorial1 = {
...     "title": "Working With JSON Data in Python",
...     "author": "Lucas",
...     "contributors": [
...         "Aldren",
...         "Dan",
...         "Joanna"
...     ],
...     "url": "https://realpython.com/python-json/"
... }

Once you’ve created the document as a dictionary, you need to specify which collection you want to use. To do that, you can use the dot notation on the database object:

>>> tutorial = db.tutorial
>>> tutorial
Collection(Database(MongoClient(..., connect=True), 'rptutorials'), 'tutorial')

In this case, tutorial is an instance of Collection and represents a physical collection of documents in your database. You can insert documents into tutorial by calling .insert_one() on it with a document as an argument:

>>> result = tutorial.insert_one(tutorial1)
>>> result
<pymongo.results.InsertOneResult object at 0x7fa854f506c0>

>>> print(f"One tutorial: {result.inserted_id}")
One tutorial: 60084b7d87eb0fbf73dbf71d

Here, .insert_one() takes tutorial1, inserts it into the tutorial collection and returns an InsertOneResult object. This object provides feedback on the inserted document. Note that since MongoDB generates the ObjectId dynamically, your output won’t match the ObjectId shown above.

If you have many documents to add to the database, then you can use .insert_many() to insert them in one go:

>>> tutorial2 = {
...     "title": "Python's Requests Library (Guide)",
...     "author": "Alex",
...     "contributors": [
...         "Aldren",
...         "Brad",
...         "Joanna"
...     ],
...     "url": "https://realpython.com/python-requests/"
... }

>>> tutorial3 = {
...     "title": "Object-Oriented Programming (OOP) in Python 3",
...     "author": "David",
...     "contributors": [
...         "Aldren",
...         "Joanna",
...         "Jacob"
...     ],
...     "url": "https://realpython.com/python3-object-oriented-programming/"
... }

>>> new_result = tutorial.insert_many([tutorial2, tutorial3])

>>> print(f"Multiple tutorials: {new_result.inserted_ids}")
Multiple tutorials: [
   ObjectId('6008511c87eb0fbf73dbf71e'),
   ObjectId('6008511c87eb0fbf73dbf71f')
]

This is faster and more straightforward than calling .insert_one() multiple times. The call to .insert_many() takes an iterable of documents and inserts them into the tutorial collection in your rptutorials database. The method returns an instance of InsertManyResult, which provides information on the inserted documents.

To retrieve documents from a collection, you can use .find(). Without arguments, .find() returns a Cursor object that yields the documents in the collection on demand:

>>> import pprint

>>> for doc in tutorial.find():
...     pprint.pprint(doc)
...
{'_id': ObjectId('600747355e6ea8d224f754ba'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}
    ...
{'_id': ObjectId('6008511c87eb0fbf73dbf71f'),
 'author': 'David',
 'contributors': ['Aldren', 'Joanna', 'Jacob'],
 'title': 'Object-Oriented Programming (OOP) in Python 3',
 'url': 'https://realpython.com/python3-object-oriented-programming/'}

Here, you run a loop on the object that .find() returns and print successive results, using pprint.pprint() to provide a user-friendly output format.
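
The on-demand behavior is similar to how a Python generator works. The sketch below models it with a made-up `toy_cursor()` function. A real Cursor talks to the server and fetches documents in batches, so treat this only as an illustration of lazy iteration:

```python
def toy_cursor(documents, log):
    """Yield documents one at a time, recording when each one is fetched."""
    for doc in documents:
        log.append(f"fetched {doc['title']}")
        yield doc


stored = [{"title": "CSV"}, {"title": "JSON"}, {"title": "OOP"}]
log = []

cursor = toy_cursor(stored, log)
fetched_before_loop = list(log)    # Nothing has been fetched yet

first = next(cursor)               # Only now is the first document produced
fetched_after_first = list(log)

remaining = [doc["title"] for doc in cursor]  # Consumes the rest
```

Creating the cursor does no work by itself. Documents are produced only as the loop asks for them, and a consumed cursor doesn’t start over.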

You can also use .find_one() to retrieve a single document. In this case, you can use a dictionary that contains fields to match. For example, if you want to retrieve the first tutorial by Jon, then you can do something like this:

>>> import pprint

>>> jon_tutorial = tutorial.find_one({"author": "Jon"})

>>> pprint.pprint(jon_tutorial)
{'_id': ObjectId('600747355e6ea8d224f754ba'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}

Note that the tutorial’s ObjectId is set under the _id key, which is the unique document identifier that MongoDB automatically adds when you insert a document into your database.

PyMongo also provides methods to replace, update, and delete documents from a database. If you want to dive deeper into these features, then take a look at the documentation for Collection.
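
As a pointer in that direction, the sketch below wraps three of those methods: `.update_one()`, `.replace_one()`, and `.delete_one()`. The helper names are made up, and each function assumes that `collection` is a PyMongo Collection, such as `tutorial` from the examples above. Nothing here talks to a server until you call a helper with a real collection:

```python
def rename_author(collection, old_name, new_name):
    """Update the first tutorial by old_name, using the $set update operator."""
    result = collection.update_one(
        {"author": old_name},
        {"$set": {"author": new_name}},
    )
    return result.modified_count


def replace_tutorial(collection, url, new_document):
    """Swap the whole document that has the given url for new_document."""
    result = collection.replace_one({"url": url}, new_document)
    return result.modified_count


def delete_tutorial(collection, url):
    """Delete the first tutorial that has the given url."""
    result = collection.delete_one({"url": url})
    return result.deleted_count
```

With a running server, a call such as `rename_author(tutorial, "Jon", "Jonathan")` should return 1 if a matching document was changed and 0 otherwise.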

Closing Connections

Establishing a connection to a MongoDB database is typically an expensive operation. If you have an application that constantly retrieves and manipulates data in a MongoDB database, then you probably don’t want to be opening and closing the connection all the time since this might affect your application’s performance.

In this kind of situation, you should keep your connection alive and only close it before exiting the application to clear all the acquired resources. You can close the connection by calling .close() on the MongoClient instance:

>>> client.close()

Another situation is when you have an application that only occasionally uses a MongoDB database. In this case, you might want to open the connection when needed and close it immediately after use to free the acquired resources. A consistent approach to this problem is to use the with statement, because MongoClient implements the context manager protocol:

>>> import pprint
>>> from pymongo import MongoClient

>>> with MongoClient() as client:
...     db = client.rptutorials
...     for doc in db.tutorial.find():
...         pprint.pprint(doc)
...
{'_id': ObjectId('600747355e6ea8d224f754ba'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}
    ...
{'_id': ObjectId('6008511c87eb0fbf73dbf71f'),
 'author': 'David',
 'contributors': ['Aldren', 'Joanna', 'Jacob'],
 'title': 'Object-Oriented Programming (OOP) in Python 3',
 'url': 'https://realpython.com/python3-object-oriented-programming/'}

If you use the with statement to handle your MongoDB client, then at the end of the with code block, the client’s .__exit__() method gets called, which in turn closes the connection by calling .close().
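
The mechanism is the regular context manager protocol. Here’s a minimal stand-in class that follows the same pattern. `ToyClient` is made up for this illustration and doesn’t open any network connection:

```python
class ToyClient:
    """Minimal stand-in that mimics the open/close life cycle of a client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if the block raised an exception.
        self.close()
        return False  # Don't swallow exceptions from the with block


with ToyClient() as client:
    closed_inside = client.closed   # Still open inside the block

closed_after = client.closed        # Closed once the block has ended
```

Because `.__exit__()` also runs when the block raises an exception, the cleanup happens on both the success path and the error path.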

Using MongoDB With Python and MongoEngine

While PyMongo is a great and powerful Python driver for interfacing with MongoDB, it’s probably a bit too low-level for many of your projects. With PyMongo, you’ll have to write a lot of code to consistently insert, retrieve, update, and delete documents.

One library that provides a higher abstraction on top of PyMongo is MongoEngine. MongoEngine is an object-document mapper (ODM), which is roughly equivalent to an SQL-based object-relational mapper (ORM). MongoEngine provides a class-based abstraction, so all the models you create are classes.

Installing MongoEngine

There are a handful of Python libraries to help you work with MongoDB. MongoEngine, however, is a popular one that provides a nice set of features, flexibility, and community support. MongoEngine is available on PyPI. You can install it using the following pip command:

$ pip install mongoengine==0.22.1

Once you’ve installed MongoEngine into your Python environment, you’re ready to start working with MongoDB databases using Python’s object-oriented features. The next step is to connect to your running MongoDB instance.

Establishing a Connection

To establish a connection with your database, you need to use mongoengine.connect(). This function takes several arguments. However, in this tutorial, you’ll use only three of them. Within your Python interactive session, type the following code:

>>> from mongoengine import connect
>>> connect(db="rptutorials", host="localhost", port=27017)
MongoClient(host=['localhost:27017'], ..., read_preference=Primary())

Here, you first set the database name db to "rptutorials", which is the name of the database you want to work in. Then you provide a host and a port to connect to your current MongoDB instance. Since you’re using the default host and port, you can omit these two parameters and just use connect("rptutorials").

Working With Collections and Documents

To create documents with MongoEngine, you first need to define what data you want the documents to have. In other words, you need to define a document schema. MongoEngine encourages you to define a document schema to help you reduce coding errors and to allow you to define utility or helper methods.

Similar to ORMs, ODMs like MongoEngine provide a base or model class for you to define a document schema. In ORMs, that class is equivalent to a table, and its instances are equivalent to rows. In MongoEngine, the class is equivalent to a collection, and its instances are equivalent to documents.

To create a model, you need to subclass Document and provide the required fields as class attributes. To continue with the blog example, here’s how you can create a model for your tutorials:

>>> from mongoengine import Document, ListField, StringField, URLField

>>> class Tutorial(Document):
...     title = StringField(required=True, max_length=70)
...     author = StringField(required=True, max_length=20)
...     contributors = ListField(StringField(max_length=20))
...     url = URLField(required=True)

With this model, you tell MongoEngine that you expect a Tutorial document to have a .title, an .author, a list of .contributors, and a .url. The base class, Document, uses that information along with the field types to validate the input data for you.

For example, if you try to save a Tutorial object without a .title, then MongoEngine raises an exception that tells you which field is missing. You can take this even further and add more restrictions, such as a maximum length for the .title.

There are a few general parameters that you can use to configure and validate fields. Here are some of the more commonly used parameters:

  • db_field specifies a different name for the field in the stored document.
  • required ensures that the field is provided.
  • default provides a default value for a given field if no value is given.
  • unique ensures that no other document in the collection has the same value for this field.

Each specific field type also has its own set of parameters. You can check the documentation for a complete guide to the available field types.

To save a document to your database, you need to call .save() on a document object. If the document already exists, then all the changes will be applied to the existing document. If the document doesn’t exist, then it’ll be created.

Here’s an example of creating and saving a tutorial into your sample tutorials database:

>>> tutorial1 = Tutorial(
...     title="Beautiful Soup: Build a Web Scraper With Python",
...     author="Martin",
...     contributors=["Aldren", "Geir Arne", "Jaya", "Joanna", "Mike"],
...     url="https://realpython.com/beautiful-soup-web-scraper-python/"
... )

>>> tutorial1.save()  # Insert the new tutorial
<Tutorial: Tutorial object>

By default, .save() inserts the new document into a collection named after the model class, Tutorial, but in lowercase letters. In this case, the collection name is tutorial, which matches the collection you’ve been using to save your tutorials.
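
As a rough illustration of that naming rule, the helper below approximates how a default collection name could be derived from a class name. default_collection_name() is a hypothetical function written for this sketch, and the underscore handling for multiword class names is an assumption about MongoEngine’s default that this tutorial doesn’t confirm:

```python
def default_collection_name(class_name):
    """Approximate a default collection name (hypothetical helper)."""
    # Prefix each uppercase letter with an underscore, then lowercase everything.
    pieces = ("_" + char if char.isupper() else char for char in class_name)
    return "".join(pieces).strip("_").lower()


print(default_collection_name("Tutorial"))  # tutorial
print(default_collection_name("BlogPost"))  # blog_post
```

If you’d rather not depend on the default, then you can name the collection explicitly by adding meta = {"collection": "tutorial"} to the model class.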

MongoEngine performs data validation when you call .save(). This means that it checks the input data against the schema you declared in the Tutorial model class. If the input data violates the schema or any of its constraints, then you get an exception, and the data isn’t saved into the database.

For example, here’s what happens if you try to save a tutorial without providing a .title:

>>> tutorial2 = Tutorial()
>>> tutorial2.author = "Alex"
>>> tutorial2.contributors = ["Aldren", "Jon", "Joanna"]
>>> tutorial2.url = "https://realpython.com/convert-python-string-to-int/"
>>> tutorial2.save()
Traceback (most recent call last):
  ...
mongoengine.errors.ValidationError: ... (Field is required: ['title'])

In this example, first note that you can also build a Tutorial object by assigning values to its attributes. Second, since you don’t provide a .title for the new tutorial, .save() raises a ValidationError telling you that the .title field is required. Having automatic data validation is a great feature that will save you some headaches.
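
To see the idea of validating before writing in isolation, here’s a minimal sketch in plain Python. It’s a simplified stand-in rather than MongoEngine’s actual implementation. SimpleTutorial, REQUIRED_FIELDS, the local ValidationError class, and the in-memory saved list are all hypothetical names invented for this illustration:

```python
REQUIRED_FIELDS = ("title", "author", "url")  # Mirrors required=True in the model
saved = []  # Hypothetical in-memory stand-in for a collection


class ValidationError(Exception):
    """Local stand-in for mongoengine.errors.ValidationError."""


class SimpleTutorial:
    def __init__(self, **fields):
        self.fields = fields

    def validate(self):
        # Collect every required field that is missing or empty.
        missing = [name for name in REQUIRED_FIELDS if not self.fields.get(name)]
        if missing:
            raise ValidationError(f"Field is required: {missing}")

    def save(self):
        # Validate first, so invalid data never reaches the collection.
        self.validate()
        saved.append(dict(self.fields))
        return self


try:
    SimpleTutorial(
        author="Alex",
        url="https://realpython.com/convert-python-string-to-int/",
    ).save()
except ValidationError as error:
    message = str(error)
    print(message)  # Field is required: ['title']

print(len(saved))  # 0
```

The key design point is the order of operations inside .save(): the check runs before the write, so a failed validation leaves the stored data untouched.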

Each Document subclass has an .objects attribute that you can use to access the documents in the associated collection. For example, here’s how you can print the .title of all your current tutorials:

>>> for doc in Tutorial.objects:
...     print(doc.title)
...
Reading and Writing CSV Files in Python
How to Iterate Through a Dictionary in Python
Python 3's f-Strings: An Improved String Formatting Syntax (Guide)
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Beautiful Soup: Build a Web Scraper With Python

The for loop iterates over all your tutorials and prints their .title data to the screen. You can also use .objects to filter your documents. For example, say you want to retrieve the tutorials authored by Alex. In that case, you can do something like this:

>>> for doc in Tutorial.objects(author="Alex"):
...     print(doc.title)
...
Python's Requests Library (Guide)
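
Conceptually, keyword arguments passed to .objects() act as equality conditions on the named fields. The sketch below imitates that behavior over a plain list of dictionaries. filter_documents() and the trimmed sample data are hypothetical stand-ins, and no database is involved:

```python
tutorials = [
    {"title": "Python's Requests Library (Guide)", "author": "Alex"},
    {"title": "Object-Oriented Programming (OOP) in Python 3", "author": "David"},
    {"title": "Beautiful Soup: Build a Web Scraper With Python", "author": "Martin"},
]


def filter_documents(documents, **conditions):
    """Yield the documents whose fields equal every keyword condition."""
    for doc in documents:
        if all(doc.get(field) == value for field, value in conditions.items()):
            yield doc


titles = [doc["title"] for doc in filter_documents(tutorials, author="Alex")]
print(titles)
```

In the real library, the filtering happens in MongoDB rather than in Python, so only the matching documents travel over the connection.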

MongoEngine is well suited to manage your MongoDB databases for just about any type of application. Its features make it ideal for creating efficient and scalable programs using a high-level approach. If you’re looking for more information about MongoEngine, be sure to check out its user guide.

Conclusion

If you need a robust, scalable, and flexible database solution, then MongoDB might be a good option for you. MongoDB is a mature and popular NoSQL database with great Python support. With a good understanding of how to access MongoDB with Python, you’ll be ready to create database applications that scale well and provide excellent performance.

With MongoDB, you also have the benefit of a human-readable and highly flexible data model, so you can adapt to requirement changes quickly.

In this tutorial, you learned:

  • What MongoDB and NoSQL databases are
  • How to install and run MongoDB on your system
  • How to create and work with MongoDB databases
  • How to interface with MongoDB in Python using the PyMongo driver
  • How to use the MongoEngine object-document mapper to work with MongoDB

The examples you coded in this tutorial are available for download.
