How to Work With a PDF in Python (Summary)
The PyPDF2 package is quite useful and is usually pretty fast. You can use PyPDF2 to automate large jobs and leverage its capabilities to help you do your job better!
In this course, you learned how to do the following:
- Extract metadata from a PDF
- Rotate pages
- Merge and split PDFs
- Add watermarks
- Add encryption
Also keep an eye on the newer PyPDF4 package, as it will likely replace PyPDF2 soon. You might also want to check out pdfrw, which can do many of the same things that PyPDF2 can do.
If you’d like to learn more about working with PDFs in Python, then you should check out some of the following resources for more information:
- The PyPDF2 website
- The GitHub page for PyPDF4
- The GitHub page for pdfrw
- The ReportLab website
- The GitHub page for PDFMiner
- Camelot: PDF Table Extraction for Humans
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00
Welcome to the sixth and final part of the Real Python course on how to work with PDFs in Python. This course covered the history of PyPDF2, an alternative PDF manipulation package called pdfrw, and the installation of the PyPDF2 module. It then covered extracting document metadata, followed by rotating pages, merging and splitting PDFs, adding watermarks, and encryption.
00:28
This was all done using the PyPDF2 module. Moving forward, you should also keep an eye out for PyPDF4, which is likely to replace PyPDF2 soon. As mentioned earlier, checking out pdfrw might also help you, as it has much the same functionality and capabilities as PyPDF2, barring encryption.
00:51
If you’d like to learn more about using Python to work with PDFs, check out the following resources, all of which are linked below the video. Again, if you download the slide presentation, each line in it is itself a link. The list starts with the PyPDF2 website, followed by the GitHub pages for PyPDF4 and pdfrw, the ReportLab website, the GitHub page for PDFMiner (which, as mentioned earlier, is a more robust option for extracting text from a PDF), and Camelot: PDF Table Extraction for Humans. Well done on completing the Real Python course on working with PDFs in Python. I’m Andrew Stephen, and thanks for joining me on this journey through PDF manipulation. See you next time.
fahmico on March 5, 2020
Thank you for the tutorial! You explain things very well. ^_^ This is really worth learning.
Andrew Stephen RP Team on March 6, 2020
Hi @mikesult. Thanks for the feedback. Glad you enjoyed the course and that you will be getting almost immediate real-world use from what you have learnt.
Andrew Stephen RP Team on March 6, 2020
Hi @fahmico, Thanks for the kind words. Glad you enjoyed it!
rgusaas on March 7, 2020
Ditto on the excellent presentation. The ReportLab reference was a real eye-opener. Greatly appreciated.
Perhaps another lesson on reading a PDF’s contents? I wrote a PDF reader that would split a 100+ page invoice document into separate pages and pull the account manager name, invoice number, and job number for the output file naming convention. It seems that most of the world struggles with how to strip out or search the contents of PDF files.
sion on March 23, 2020
Many thanks for an excellent and useful presentation. Some years ago I scraped PDFs for this information. It was MESSY. Now, “never again.” Thank you.
Alan ODannel on April 14, 2020
Very informative lesson. I’ll be able to put this to use in the near future.
dthomas01 on April 14, 2020
I’m late to the party... really enjoyed this tutorial. Thought I would mention that PyPDF2 hangs in the middle of writing out the encrypted PDF file. Switching to the newer PyPDF4 you mentioned earlier solved that issue. I’m using Python 3.7 on Windows 10 Pro. The rest of the programs ran flawlessly. Very impressive, and I hope you keep up the good work, Andrew!
Felix M on May 24, 2020
Very informative course. Thank you!
andresfmesad on Sept. 14, 2021
Very well explained! Is there a way to write a pandas dataframe to a PDF file and specify some format?
Hugh Tipping on Sept. 14, 2021
Very happy with this presentation. It gives a solid foundation in starting to work with PDFs with enough outside reference material to keep me busy for a long time. Many thanks.
mikesult on March 1, 2020
Thank you, Andrew, for a great and very useful tutorial. I learned a lot about working with PDFs. I use PDF files as music charts quite a bit, and these techniques will be very useful to split, merge, and organize charts from PDF books. I appreciate your links to additional resources too.