The following is a collaboration piece between Cam Linke, co-founder of Startup Edmonton, and the folks at Real Python.
Updated 02/22/2015: Added Python 3 support.
In this section we’re going to get our database set up to store the results of our word counts. Along the way we’ll set up a Postgres database, add SQLAlchemy to our app for use as an ORM, and use Alembic for our data migrations.
Remember, here’s what we’re building: A Flask app that calculates word-frequency pairs based on the text from a given URL. This is a full-stack tutorial.
- Part One: Setup a local development environment and then deploy both a staging environment and a production environment on Heroku.
- Part Two: Setup a PostgreSQL database along with SQLAlchemy and Alembic to handle migrations. (current)
- Part Three: Add in the back-end logic to scrape and then process the counting of words from a webpage using the requests, BeautifulSoup, and Natural Language Toolkit (NLTK) libraries.
- Part Four: Implement a Redis task queue to handle the text processing.
- Part Five: Setup Angular on the front-end to continuously poll the back-end to see if the request is done.
- Part Six: Push to the staging server on Heroku – setting up Redis, detailing how to run two processes (web and worker) on a single Dyno.
- Part Seven: Update the front-end to make it more user-friendly.
- Part Eight: Add the D3 library into the mix to graph a frequency distribution and histogram.
Need the code? Grab it from the repo.
Tools we’ll use in this part:
- Postgres – http://www.postgresql.org/
- Psycopg2 – http://initd.org/psycopg/
- SQLAlchemy – http://www.sqlalchemy.org/
- Alembic – http://alembic.readthedocs.org/en/latest/
- Flask-Migrate – http://flask-migrate.readthedocs.org/en/latest/
To get started install Postgres on your local computer if you don’t have it already. Since Heroku uses Postgres it will be good for us to develop locally on the same database. If you don’t have Postgres installed, Postgres.app is an easy way to get up and running quick for Mac OSX users. Once you have Postgres installed and running, create a database called wordcount_dev to use as our local development database. In order to use our newly created database in the Flask app we’re going to need to install a few things:
1 2 3
Psycopg is is a Python adapter for Postgres, SQLAlchemy is an awesome Python ORM, and Flask-Migrate will install both that extension and Alembic which we’ll use for our database migrations.
If you’re on Mavericks and having trouble installing psycopg2 check out this Stack Overflow article.
Add the following line to the
Config() class in your config.py file to set your app to use the newly created database in development (local), staging, and production:
Also make sure to add the following import:
Your config file should now look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Now when our config is loaded into our app the appropriate database will be connected to it as well.
Similar to how we added an environment variable in the last post, we are going to add a
DATABASE_URL variable to our postactivate file. Using VIM you can do this in the following way:
Open your file in VIM:
Press ‘i’ on your keyboard to insert text and add the following line to your file:
Now hit escape, type ‘:’ then ‘wq’, and press enter to save and close VIM.
Restart your environment:
In your app.py file import SQLAlchemy and connect to the database:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Set up a basic model to hold the results of the wordcount by adding a models.py file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
What we’re doing here is creating a table to store the results of our word counts. We first import the database connection that we created in our app.py file as well as JSON from SQLAlchemy’s PostgreSQL dialects. JSON columns are fairly new to Postgres and are not available in every database supported by SQLAlchemy so we need to import it specifically.
Next we create a
Result() class and assign it a table name of
results. We then set the attributes that we want to store for a result – the ‘id’ of the result we stored, the ‘url’ that we counted the words from, a full list of words that we counted, and a list of words that we counted minus stop words (more on this later).
We then create an
__init__() method that will run the first time we create a new result and, finally, a
__repr__() method to represent the object when we query for it.
We are going to use Alembic and Flask-Migrate to migrate our database to the latest version. Alembic is migration library for SQLAlchemy and could be used without Flask-Migrate if you want. However Flask-Migrate does help with some of the setup and makes things easier.
Create a new file called manage.py:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
In order to use Flask-Migrate we need to import
Manager as well as
MigrateCommand to our manage.py file. We also import
db so we have access to them within the script.
First we set our config to get our environment – based on the environment variable – and create a migrate instance with
db as the arguments and set up a
manager command to initialize a
Manager instance for our app. Finally we add the
db command to our manager so that we can run our migrations from the command line.
In order to run our migrations initialize Alembic:
1 2 3 4 5 6 7 8 9
After you run the database initialization you will see a new folder called “migrations” in the project. This holds the setup necessary for Alembic to run migrations on the project. Inside of “migrations” you will see that it has a folder called “versions”, containing the migration scripts as they are created.
Let’s create our first migration by running the
1 2 3 4 5
Now you’ll notice in your “versions” folder there is a migration file. This file is autogenerated by Alembic based on the model. You could generate (or edit) this file yourself; however, for a lot of cases the autogenerated file will do.
Now we’ll apply our upgrades to our database using the
db upgrade command:
1 2 3 4
Our database is now ready for us to use in our app.
Finally, let’s apply the migrations to our Heroku databases. First, though, we need to add the details of our staging and production databases to our config.py file. To check if you have a database set up on your staging server run:
1 2 3
Make sure to replace
wordcount-stagewith the name of your staging app.
Since we don’t see anything about a database, we need to add the Postgres addon to the staging server. To do so, run the following command to add the Postgres addon to your Heroku app:
1 2 3 4 5 6 7
Now when we run Heroku config again we should see the connection settings for our URL.
1 2 3
Next we need to commit the changes that you’ve made to git and push to your staging server:
Run the migrations that we created to migrate our staging database. We do this by using the
heroku run command to run python scripts within our Heroku app. We will use this to run the same
db upgrade command from our manage.py file.
1 2 3 4 5
Notice how we only ran the upgrade, not the
migratecommands like before. We already have our migration setup and ready to go, we only need to run it on our Heroku database.
Let’s now do the same for our production site. Set up a database for your production app on Heroku, just like you did for staging. Push your changes to your production site. Notice how you don’t have to make any changes to the config file – it’s setting the database based on the newly created
DATABASE_URL environment variable.
1 2 3
Now both our staging and production sites have their databases set up and are migrated – and ready to go!
When you apply a new migration to your production database, there could be down time. If this is an issue, you can setup database replication by adding a “follower” (commonly known as a slave) database. For more on this, check out the official Heroku documentation.
Remember in Part 1, when we tested the environment variables to make sure the right environment was being detected by adding a print statement to app.py –
print(os.environ['APP_SETTINGS'])? Well, let’s do the same thing, but test the Database URIs by adding a
Now let’s test.
Commit and push again to staging and production. Now let’s test it out…
1 2 3
1 2 3
The URIs for the staging and production should match the URIs displayed when we ran the
heroku config commands. Test this out again:
That’s it for part 2. I hope database migrations make better sense now. Please comment below with questions. In Part 3 we’re going to build the word counting functionality and have it sent to a request queue to deal with the longer running wordcount processing. See you next time. Cheers!
Don’t forget to remove the print statement from the config file when done, commit, and then push back up to your various environments.