History of PyPDF2

You’ll dive into the history of PyPDF2 and consider another PDF module for Python.

00:00 Welcome back to working with PDFs in Python. Let’s explore PyPDF2 and its history. Back in 2005, pyPDF was initially released.

00:11 The last official release for pyPDF was in 2010. About a year later, a company called Phaseit sponsored a fork of pyPDF called, you can guess, PyPDF2. The code for PyPDF2 was written to be backwards compatible with pyPDF, and it worked quite well.

00:33 Its final release was in 2016. PyPDF3 was then created, but after only a short time and a few releases, it was renamed to PyPDF4. All of these packages do much the same thing. The biggest difference between pyPDF and the later packages, from PyPDF2 onward, is that the later ones include support for Python 3.

00:56 There’s a different Python 3 fork of the original pyPDF called pyPDF for Python 3, but this has not been maintained for a number of years now.

01:08 There is not yet full backwards compatibility between PyPDF2 and PyPDF4. So while most of the examples you will encounter throughout this course will work with PyPDF4, some will not, which is why PyPDF4 is not more heavily featured within the course. With this in mind, I do encourage you to swap out the PyPDF2 imports for PyPDF4 and just see what happens. PyPDF2 is not the only package available to use in order to work with PDFs in Python.
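If you want to try the PyPDF2-to-PyPDF4 swap without editing every example by hand, here is a minimal sketch of one way to do it. The helper name load_pdf_backend is hypothetical and is not part of either package. It simply tries each package name in order and hands back the first one that imports. Because the two packages are not fully compatible, code that runs against one may still fail against the other.

```python
import importlib


def load_pdf_backend(candidates=("PyPDF4", "PyPDF2")):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper for illustration only: it tries each package
    name in order and skips the ones that are not installed.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


pdf_backend = load_pdf_backend()
if pdf_backend is None:
    print("Neither PyPDF4 nor PyPDF2 is installed")
else:
    print("Using", pdf_backend.__name__)
```

Swapping the order of the names in candidates makes PyPDF2 the first choice instead.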

01:38 pdfrw was created by Patrick Maupin and it is capable of many of the manipulations that PyPDF2 can achieve, including most of the examples that this course covers. The notable exception to this, though, is PDF encryption.

01:54 The biggest difference with pdfrw is that it integrates nicely with the ReportLab package, so you can take a pre-existing PDF and build a new one with ReportLab using some or all of the original PDF. There are links below the video if you wish to check out pdfrw and ReportLab for yourself. Now, how do we install the PyPDF2 package? Well, we do so by running the pip install command in a terminal.
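As a rough sketch of what that install step looks like, the snippet below builds the pip command for the interpreter that is currently running and prints it. The helper name pip_install_command is made up for this example. Typed by hand in a terminal, the equivalent command is usually python -m pip install PyPDF2, assuming python points at the interpreter you intend to use.

```python
import sys


def pip_install_command(package):
    """Return the argument list that installs `package` with pip.

    Hypothetical helper. sys.executable is the interpreter running this
    script, so pip installs into the same environment that will later
    import the package.
    """
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("PyPDF2")
print(" ".join(command))

# To actually run the install from a script (needs network access):
# import subprocess
# subprocess.run(command, check=True)
```

Going through python -m pip rather than a bare pip command is one way to avoid installing into a different Python than the one you run your code with.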

02:21 Just like that. Just a side note:

02:29 if you happen to be using Anaconda rather than regular Python, you can use the conda install command instead of pip install. However, if you are like me (and as you can see, I like using the Thonny IDE), there is a better way to do that.

02:50 You’ll want to go to Tools > Manage packages…. In here, you can search for PyPDF2, click Find package, and then click Install.

02:57 It will go through its setup phase, and there you go.
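Whichever install route you take, one way to confirm the result from code is to ask Python's standard library for the installed version. This is only a sketch: installed_version is a hypothetical helper name, and it assumes Python 3.8 or later, where importlib.metadata is available.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("PyPDF2")
if version is None:
    print("PyPDF2 is not installed in this environment")
else:
    print("PyPDF2", version, "is installed")
```

Run it with the same interpreter your IDE uses, since each Python environment keeps its own set of installed packages.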

03:08 We now have PyPDF2 installed. So, in case you missed that, that’s Tools > Manage packages…,

03:15 and as you can see now that I’ve gone back into it, PyPDF2 is there and it says to Uninstall, so you know it’s there. Using the package manager is also covered in the Real Python tutorial for Thonny.

03:28 A link to this tutorial is available below the video. Now that you have managed to install the PyPDF2 package, it is time to extract some information from a PDF. In order to do so, however, you’ll have to join me in the next part of this course. See you there.
