You’ll dive into the history of PyPDF2 and consider another PDF module for Python.
History of PyPDF2
The last official release for pyPDF was in 2010. Then, after approximately one year had passed, a company called Phasit sponsored a fork of pyPDF called, you can guess, PyPDF2. The code for PyPDF2 was written to be backwards compatible with pyPDF, and it worked quite well. Its final release was in 2016.
PyPDF3 was then created, but after only a short time and a few releases, it was renamed to PyPDF4. All of these packages do much the same thing. The biggest difference is that PyPDF2 and the later versions include support for Python 3.
There’s a different Python 3 fork of the original pyPDF, but it has not been maintained for a number of years now. While PyPDF2 was recently abandoned, there is not yet full backwards compatibility between PyPDF2 and PyPDF4.
So while most of the examples you will encounter throughout this course will work with PyPDF4, some will not, which is why PyPDF4 is not more heavily featured within the course. With this in mind, I do encourage you to swap out the PyPDF2 imports for PyPDF4 and just see what happens.
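If you want to try that swap without editing every import by hand, a small helper can pick whichever package is available. This is only a sketch: the function name load_pdf_module and the order of the candidates are hypothetical choices made for illustration. It only imports the module, so any API differences between the two packages will still show up when you run the examples.

```python
import importlib


def load_pdf_module(candidates=("PyPDF2", "PyPDF4")):
    """Return the first importable module from candidates, or None.

    The helper name and the default candidate order are illustrative
    assumptions, not part of either package.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This package is not installed; try the next candidate.
            continue
    return None


pdf_module = load_pdf_module()
if pdf_module is None:
    print("Neither PyPDF2 nor PyPDF4 is installed.")
else:
    print(f"Using {pdf_module.__name__}")
```

Reversing the tuple to ("PyPDF4", "PyPDF2") makes the helper prefer PyPDF4, which is a quick way to see which course examples still run against it.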
PyPDF2 is not the only package available for working with PDFs in Python. pdfrw was created by Patrick Maupin, and it is capable of many of the manipulations that PyPDF2 can achieve, including most of the examples that this course covers. The notable exception, though, is PDF encryption. The biggest difference with pdfrw is that it integrates nicely with the ReportLab package, so you can take a pre-existing PDF and build a new one with ReportLab using some or all of the original PDF. There are links below the video if you wish to check out pdfrw and ReportLab for yourself.
Now, how do we install the PyPDF2 package? Well, we do so by running the pip install command from a system shell (a terminal or command prompt) rather than at the Python interpreter prompt: pip install PyPDF2.
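If you're not sure whether the install worked, or which Python interpreter pip installed into, a quick check from Python can help. The following is a minimal sketch: the helper names is_installed and pip_install_command are hypothetical. It only builds and prints the command, using the common python -m pip form so that pip targets the interpreter that is running the script. It does not install anything itself.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if is_installed("PyPDF2"):
    print("PyPDF2 is already available.")
else:
    # Print the command to run in a system shell; nothing is installed here.
    print("Run this in a terminal:", " ".join(pip_install_command("PyPDF2")))
```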
A side note: if you happen to be using Anaconda rather than regular Python, you can use the conda install command instead of pip install. However, if you are like me, and as you can see, I like using the Thonny IDE, there is a better way to do that: Thonny's built-in package manager.
Using the package manager is also covered in the Real Python tutorial for Thonny. A link to this tutorial is available below the video.
Now that you have managed to install the PyPDF2 package, it is time to extract some information from a PDF. In order to do so, however, you’ll have to join me in the next part of this course. See you there.