How to Work With a PDF in Python (Summary)

The PyPDF2 package is quite useful and usually pretty fast. You can use it to automate large, repetitive PDF jobs and spend less time on manual document handling!

In this course, you learned how to do the following:

  • Extract metadata from a PDF
  • Rotate pages
  • Merge and split PDFs
  • Add watermarks
  • Add encryption

Also keep an eye on the newer PyPDF4 package, which at the time looked likely to replace PyPDF2. (In the end, PyPDF2 development continued and later moved to the pypdf package.) You might also want to check out pdfrw, which can do many of the same things that PyPDF2 can do.

If you’d like to learn more about working with PDFs in Python, then you should check out some of the following resources for more information:

mikesult on March 1, 2020

Thank you, Andrew, for a great and very useful tutorial. I learned a lot about working with PDFs. I use PDF files as music charts quite a bit, and these techniques will be very useful to split, merge, and organize charts from PDF books. I appreciate your links to additional resources too.

fahmico on March 5, 2020

Thank you for the tutorial! You explain things very well. ^_^ This is really worth learning.

Andrew Stephen RP Team on March 6, 2020

Hi @mikesult. Thanks for the feedback! I’m glad you enjoyed the course and that you’ll be getting almost immediate real-world use from what you have learnt.

Andrew Stephen RP Team on March 6, 2020

Hi @fahmico, thanks for the kind words. Glad you enjoyed it!

rgusaas on March 7, 2020

Ditto on the excellent presentation. The ReportLab reference was a real eye-opener. Greatly appreciated.

Perhaps another lesson on reading a PDF’s contents? I wrote a PDF reader that split a 100+ page invoice document into separate pages and pulled the account manager name, invoice number, and job number for the output file naming convention. It seems that most of the world struggles with how to strip out or search the contents of PDF files.

sion on March 23, 2020

Many thanks for an excellent and useful presentation. Some years ago I scraped PDFs for this information. It was MESSY. Now, “never again.” Thank you.

Alan ODannel on April 14, 2020

Very informative lesson. I’ll be able to put this to use in the near future.

dthomas01 on April 14, 2020

I’m late to the party... really enjoyed this tutorial. I thought I would mention that PyPDF2 hangs in the middle of writing out the encrypted PDF file. Switching to the newer PyPDF4 you mentioned earlier solved that issue. I’m using Python 3.7 on Windows 10 Pro. The rest of the programs ran flawlessly. Very impressive, and I hope you keep up the good work, Andrew!

Felix M on May 24, 2020

Very informative course. Thank you!

andresfmesad on Sept. 14, 2021

Very well explained! Is there a way to write a pandas DataFrame to a PDF file and specify some formatting?

Hugh Tipping on Sept. 14, 2021

Very happy with this presentation. It gives a solid foundation in starting to work with PDFs with enough outside reference material to keep me busy for a long time. Many thanks.
