How to Work With a PDF in Python

Andrew Stephen
Andrew Stephen 6 Lessons 31m intermediate

The Portable Document Format or PDF is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.

PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.

By the end of this course, you’ll know how to:

  • Extract document information from a PDF in Python
  • Rotate pages
  • Merge PDFs
  • Split PDFs
  • Add watermarks
  • Encrypt a PDF

What’s Included:

Downloadable Resources:

About Andrew Stephen

Andrew is an avid Pythonista and creates video tutorials for Real Python. He is a qualified robotics and mechatronics engineer who works for an engineering firm as a production engineer and loves his sport, music, gaming and learning.

» More about Andrew

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Participant Comments

Avatar image for dthomas01

dthomas01 on April 14, 2020

I’m late to the party....really enjoyed this tutorial. Thought I would mention that PyPDF2 hangs in the middle of writing out the encrypted PDF file. Switching to the newer PyPDF4 you earlier mentioned solved that issue. I’m using Python 3.7 on Windows 10 Pro. The rest of the programs ran flawlessly. Very impressive and hope you keep up the good work, Andrew!

Avatar image for Alan ODannel

Alan ODannel on April 14, 2020

Very informative lesson. I’ll be able to put this to use in the near future.

Avatar image for sion

sion on March 23, 2020

Many thanks for an excellent and useful presentation. Some years ago I scraped PDF’s for this information. It was MESSY. Now, “never again” Thank you.

Avatar image for rgusaas

rgusaas on March 7, 2020

Ditto on excellent presentation. The ReportLab reference was a real eye opener. Greatly appreciated.

Perhaps another lesson on reading a PDF’s contents. I wrote a PDF reader that would split a 100+ Page invoice document into separate pages and pulled the account manager name, invoice number and job number for the output file naming convention. Seems that most of the world struggles with how to strip out contents or search the contents of PDF files.

Avatar image for mikesult

mikesult on March 1, 2020

Thank you Andrew for a great and very useful tutorial. I learned a lot about working with PDFs. I use pdf files as music charts quite a bit and these techniques will be very useful to split, merge and organize charts from pdf books. I appreciate your links to additional resources too.

← Browse All Courses