How to Work With a PDF in Python (Overview)
The Portable Document Format or PDF is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.
PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.
By the end of this course, you’ll know how to:
- Extract document information from a PDF in Python
- Rotate pages
- Merge PDFs
- Split PDFs
- Add watermarks
- Encrypt a PDF
00:00
Hello there. My name is Andrew from Real Python, and today I am going to take you through working with PDFs in Python using the PyPDF2 package. Through this course, you will learn a brief history of PyPDF2 and its other incarnations and be briefly introduced to a potential alternative, pdfrw. Installation of the package will then be covered, and once that is covered, you will be shown various ways of manipulating PDFs, including extracting document information from a PDF, rotating pages, merging PDFs into a single file, splitting PDFs, adding watermarks, and finally, encrypting PDFs.
00:41
Some further reading will then be recommended if you wish to dive even further into PDF manipulation. Before exploring the PyPDF2 pure Python package, what is a PDF? Well, a PDF, or Portable Document Format, is a file format that can be used to reliably exchange documents across operating systems. While it was initially invented by Adobe, it is now an open standard document format, which is maintained by the International Organization for Standardization, or ISO.
01:15
Join me in the next video, where we will cover PyPDF2’s history, pdfrw as an alternative package, and the steps to install the PyPDF2 package. I’ll see you then.