History of PyPDF2
You’ll dive into the history of PyPDF2 and consider another PDF module for Python.
00:00
Welcome back to working with PDFs in Python. Let’s explore PyPDF2 and its history. Back in 2005, pyPDF was initially released.
00:11
The last official release for pyPDF was in 2010. Then, after approximately one year had passed, a company called Phaseit sponsored a fork of pyPDF called, you can guess, PyPDF2. The code for PyPDF2 was written to be backwards compatible with pyPDF, and it worked quite well.
00:33
Its final release was in 2016. PyPDF3 was then created, but after only a short time and a few releases, it was renamed to PyPDF4. All of these packages do much the same thing. The biggest difference between pyPDF and PyPDF2 and up is that the later versions include support for Python 3.
00:56
There’s a different Python 3 fork of the original pyPDF called pyPDF for Python 3, but this has not been maintained for a number of years now.
01:08
There is not yet full backwards compatibility between PyPDF2 and PyPDF4. So while most of the examples you will encounter throughout this course will work with PyPDF4, some will not, which is why PyPDF4 is not more heavily featured within the course. With this in mind, I do encourage you to swap out the PyPDF2 imports for PyPDF4 and just see what happens. PyPDF2 is not the only package available for working with PDFs in Python.
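If you want to try the import swap mentioned above, one possible approach is to attempt each package in turn and use whichever one is available. This is only a minimal sketch: the helper name `load_pdf_module` is hypothetical and not part of either package, and it makes no promise that your code will behave the same under both.

```python
import importlib


def load_pdf_module(preferred=("PyPDF4", "PyPDF2")):
    """Return the first importable module from `preferred`, or None.

    `load_pdf_module` is a hypothetical helper for this sketch; it is not
    provided by PyPDF2 or PyPDF4.
    """
    for name in preferred:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This package is not installed, so try the next candidate.
            continue
    return None


if __name__ == "__main__":
    pdf_module = load_pdf_module()
    if pdf_module is None:
        print("Neither PyPDF4 nor PyPDF2 is installed.")
    else:
        print("Using", pdf_module.__name__)
```

Code that then refers to `pdf_module` instead of a hard-coded package name lets you switch between the two and see which examples still work.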
01:38
pdfrw was created by Patrick Maupin, and it is capable of many of the manipulations that PyPDF2 can achieve, including most of the examples that this course covers. The notable exception to this, though, is PDF encryption.
01:54
The biggest difference with pdfrw is that it integrates nicely with the ReportLab package, so you can take a pre-existing PDF and build a new one with ReportLab using some or all of the original PDF. There are links below the video if you wish to check out pdfrw and ReportLab for yourself. Now, how do we install the PyPDF2 package? Well, we do so by using the pip install command from the command line.
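As a rough illustration of that install step, the sketch below checks whether a module can be found and, if not, builds the matching pip command for the interpreter that is running the script. The helper names `is_installed` and `pip_install_command` are made up for this example, and the script only prints the command rather than running it.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    if is_installed("PyPDF2"):
        print("PyPDF2 is already installed.")
    else:
        # Printing keeps the sketch free of side effects; you could pass
        # the same list to subprocess.run() to perform the install.
        print("Run this command:", " ".join(pip_install_command("PyPDF2")))
```

Using `sys.executable -m pip` rather than a bare `pip` is a common way to make sure the package lands in the same environment that your script runs in.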
02:21
Just like that. Just a side note:
02:29
if you do happen to be using Anaconda rather than regular Python, instead of using the pip install command, you can use the conda install command. However, if you are like me (and as you can see, I like using the Thonny IDE), there is a better way to do that.
02:50
You’ll want to go to Tools > Manage packages…. In here, you can search for PyPDF2, Find package, and Install.
02:57
It will go through its setup phase, and there you go.
03:08
We now have PyPDF2 installed. So, in case you missed that, that’s Tools > Manage packages…,
03:15
and as you can see now that I’ve gone back into it, PyPDF2 is there and it shows an Uninstall option, so you know it’s installed. Using the package manager is also covered in the Real Python tutorial for Thonny.
03:28
A link to this tutorial is available below the video. Now that you have managed to install the PyPDF2 package, it is time to extract some information from a PDF. In order to do so, however, you’ll have to join me in the next part of this course. See you there.