OCR (Optical Character Recognition) has become a common task in Python. With the advent of libraries such as Tesseract and Ocrad, more and more developers are building libraries and bots that use OCR in novel, interesting ways. A trivial example is a basic OCR tool used to extract text from screenshots so you don’t have to re-type the text later on.
We’ll start by developing the Flask back-end layer to serve the results of the OCR engine. From there you can just hit the endpoint and serve the results to the end user in whatever manner suits you. All of this is covered in detail in this tutorial. We’ll also add a bit of back-end code to generate an HTML form, as well as the front-end code to consume the API. That part is not covered by the tutorial, but you will have access to the code.
Let’s get to it.
First, we have to install some dependencies. As always, configuring your environment is 90% of the fun.
This post has been tested on Ubuntu 14.04, but it should work for 12.x and 13.x versions as well. If you’re running OS X, you can use VirtualBox, Docker (a Dockerfile along with an install guide is included), or a droplet on DigitalOcean (recommended!) to create the appropriate environment.
NOTE: You can also use the _run.sh shell script to quickly install the dependencies along with Leptonica and Tesseract. If you go this route, skip down to the Web-server time! section. But please consider manually building these libraries if you have not done so before (for learning purposes).
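A rough sketch of the installs, going off the description below (exact package names are a best guess and may differ slightly between Ubuntu releases):

```sh
sudo apt-get update
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev
sudo apt-get install python2.7-dev
sudo apt-get install python-imaging
sudo apt-get install imagemagick
```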
sudo apt-get update is short for “make sure we have the latest package listings”. We then grab a number of libraries that allow us to toy with images – i.e., libpng, etc. Beyond that, we grab Python 2.7, our programming language of choice, along with the python-imaging library for interaction with all these pieces. Speaking of images, we also need ImageMagick if we want to toy with (edit) the images programmatically before we throw them in.
Building Leptonica and Tesseract
Again, if you ran the shell script, these are already installed, so proceed to the Web-server time! section.
Now, time for Leptonica, finally!
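A sketch of the build steps; the version number and download URL here are illustrative, so grab whatever the current Leptonica release is:

```sh
# version and URL are illustrative -- use the current release
wget http://www.leptonica.org/source/leptonica-1.70.tar.gz
tar -zxvf leptonica-1.70.tar.gz
cd leptonica-1.70/
./configure
make
sudo make install
sudo ldconfig
```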
If this is your first time playing with tar, here’s what’s happening:
- Grab the binary for Leptonica (via wget)
- Unzip the tarball
- cd into the newly unpacked directory
- Run the configure bash script to set up the application
- Run make to build it
- Install it with make install after the build
- Create the necessary links with ldconfig
Boom! Now we have Leptonica. On to Tesseract!
And now to download and build Tesseract…
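Roughly the same dance, sketched here with an illustrative version number and download location:

```sh
# version and URL are illustrative -- use the release that matches your setup
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
tar -zxvf tesseract-ocr-3.02.02.tar.gz
cd tesseract-ocr/
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
```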
The process here mirrors the Leptonica one almost perfectly. So to keep this DRY, see the Leptonica explanation for more information.
We need to set up an environment variable to source our Tesseract data:
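TESSDATA_PREFIX is the variable Tesseract reads; with a default make install the language data lives under /usr/local/share/, though your path may differ:

```sh
export TESSDATA_PREFIX=/usr/local/share/
```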
Finally, let’s get the relevant Tesseract English language packages:
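Something along these lines (the filename is illustrative):

```sh
# filename is illustrative -- match it to your Tesseract version
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
tar -zxvf tesseract-ocr-3.02.eng.tar.gz
sudo cp -r tesseract-ocr/tessdata $TESSDATA_PREFIX
```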
BOOM! We now have Tesseract. We can use the CLI to test. Feel free to read the docs if you want to play. However, we need a Python wrapper to truly achieve our end goal. So the next step is to set up a Flask server along with a basic API that accepts POST requests:
- Accept an image URL
- Run the character recognition on the image
Now, on to the fun stuff. First, we need to build a way to interface with Tesseract via Python. We COULD use popen, but that just feels wrong/un-Pythonic. Instead, we can use a very minimal but functional Python package wrapping Tesseract – pytesseract.
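A plausible set of pip installs for the server side (unpinned here; the original likely pinned specific versions):

```sh
pip install Flask pytesseract Pillow requests
```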
NOTE: The Flask Boilerplate (maintained by Real Python) is a wonderful library for getting a simple, Pythonic server running. We customized this for our base application. Check out the Flask Boilerplate repository for more info.
Let’s make an OCR Engine
Now, we need to make a module that uses pytesseract to take in and read images. Create a new file called ocr.py in the “flask_server” directory and add the following code:
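A minimal sketch of what ocr.py can look like, assuming process_image() takes an image URL; the helper name _get_image() is my own:

```python
# ocr.py -- a minimal sketch; helper names are illustrative
import requests
import pytesseract
from PIL import Image, ImageFilter
from StringIO import StringIO  # Python 2.7


def _get_image(url):
    # download the image and open it as a PIL/Pillow Image
    return Image.open(StringIO(requests.get(url).content))


def process_image(url):
    # sharpen the image to crisp up the text, then hand it to Tesseract
    image = _get_image(url)
    image = image.filter(ImageFilter.SHARPEN)
    return pytesseract.image_to_string(image)
```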
So, in our main method, process_image(), we sharpen the image to crisp up the text.
Sweet! A working module to toy with.
Optional: Building a CLI tool for your new OCR Engine
Making a CLI is a great proof of concept, and a fun breather after doing so much configuration. So let’s take a stab at making one. Create a new file within "flask_server" called cli.py and add the following code:
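One possible shape for cli.py, taking image URLs as command-line arguments and printing the recognized text (the argument handling is an assumption):

```python
# cli.py -- one possible shape for the CLI; argument handling is illustrative
import sys

from ocr import process_image


def main():
    urls = sys.argv[1:]
    if not urls:
        print("Usage: python flask_server/cli.py <image_url> [<image_url> ...]")
        sys.exit(1)
    for url in urls:
        try:
            text = process_image(url)
        except Exception as err:
            print("Could not process {0}: {1}".format(url, err))
            continue
        # write the engine's output to STDOUT, line by line
        for line in text.splitlines():
            print(line)


if __name__ == '__main__':
    main()
```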
This is really quite simple. Line by line, we look at the text output from our engine and write it to STDOUT. Test it out (python flask_server/cli.py) with a few image URLs, or play with your own ASCII art for a good time.
Back to the server
Now that we have an engine, we need to get ourselves some output! Add the following route handler and view function to app.py:
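A sketch of the handler, assuming a /v<version>/ocr endpoint that takes a JSON body with an image_url key; the endpoint name and error messages are assumptions:

```python
# sketch of the route handler for app.py; endpoint name and messages are assumptions
@app.route('/v{}/ocr'.format(_VERSION), methods=["POST"])
def ocr():
    try:
        url = request.json['image_url']
        if 'jpg' in url:
            output = process_image(url)
            return jsonify({"output": output})
        return jsonify({"error": "only .jpg images, please"})
    except Exception:
        return jsonify(
            {"error": "Did you mean to send: {'image_url': 'some_jpg_url'}"}
        )
```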
Make sure to update the imports:
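Something along these lines, on top of whatever the Flask Boilerplate already pulls in:

```python
from flask import request, jsonify
from ocr import process_image
```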
Also, add the API version number:
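The route sketch above uses a module-level constant for this; the name _VERSION is an assumption:

```python
_VERSION = 1  # API version, used to build the /v1/ocr route
```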
Now, as you can see, we just add in the JSON response of the Engine’s process_image() method, which builds a file object from the image URL and opens it with Image from PIL. And, yes – for the time being, this only works with .jpg images.
NOTE: You will not have PIL itself installed; this runs off of Pillow and allows us to do the same thing. This is because the PIL library was at one time forked, and turned into Pillow. The community has strong opinions on this matter. Consult Google for insight – and drama.
Run your app:
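Assuming the layout used above, something like:

```sh
python flask_server/app.py
```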
Then in another terminal tab run:
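A curl call against the sketched endpoint (the port and route follow the assumptions above; the image URL is a placeholder):

```sh
curl -X POST http://localhost:5000/v1/ocr \
  -H "Content-Type: application/json" \
  -d '{"image_url": "http://example.com/sample.jpg"}'
```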
With the back-end API done along with the OCR Engine, we can now add a basic front-end to consume the API and add the results to the DOM via AJAX and jQuery. Again, this is not covered by this tutorial, but you can grab the code from the repository.
Test this out with some sample images.
Conclusion and next steps