Watermarking and Encrypting PDFs
In this lesson, you’ll learn how to protect your PDFs by adding watermarks and enabling password protection.
00:00 Welcome back to the Real Python course on how to work with PDFs in Python. This is part 5, where you will learn how to add a watermark to your PDFs, as well as encrypt PDFs. Firstly, what is a watermark?
00:14 A watermark is an identifying image or pattern on printed and digital documents. Some watermarks can only be seen in special lighting conditions. Watermarking is important because it allows you to protect your intellectual property, whether that be PDFs, images, or other original works. Watermarks are also sometimes called overlays.
00:41 You can use PyPDF2 to add your watermark to a PDF file. You just need to have a PDF that only contains your watermark image or text. This is how you would use PyPDF2 in order to add a watermark to your PDF document.
00:59 Firstly, as always, you import PdfFileWriter and PdfFileReader from PyPDF2, and then you define create_watermark().
01:11 create_watermark() accepts three arguments: input_pdf, which is the file path of the PDF to be watermarked, output, which is the desired path of the watermarked PDF, and watermark, which is the path of the PDF that contains the watermark.
01:31 What you do next is open the watermark PDF and take the first page, as that is where the watermark should be. And that is achieved here in these first two lines. As you can see, it’s getting page 0 and assigning it to the watermark_page variable.
01:48 Then, you create a pdf_reader object from the input_pdf and a generic pdf_writer object for writing out the watermarked PDF, in these two lines here.
02:02 The next step is to iterate over the pages of the input PDF. For each page, you’ll need to call .mergePage(), just there, and pass it the watermark_page. Once you do that, it will overlay the watermark_page on top of the current page.
02:20 Then, you add that newly merged page to your pdf_writer object, just here.
02:28 The final step is to then write out the newly watermarked PDF to disk, and that’s it! And again, you have the if __name__ == '__main__': block to control when the create_watermark() function runs. And you have input_pdf, which is 'reportlab-sample.pdf', the output, which is 'watermarked_notebook.pdf', and watermark, which is 'watermark.pdf'.
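Taken together, the steps described so far might look roughly like the sketch below. It assumes the legacy PyPDF2 1.x API that this course uses (PdfFileReader, PdfFileWriter, .getPage(), .mergePage(), .addPage()); newer PyPDF2 releases renamed these calls. The import sits inside the function only so that the file can be loaded before PyPDF2 is installed, which is a choice made for this sketch rather than something shown in the video.

```python
def create_watermark(input_pdf, output, watermark):
    """Overlay the first page of `watermark` onto every page of `input_pdf`."""
    # Legacy PyPDF2 1.x names, as used in this course.
    # Imported here so the module loads even if PyPDF2 is not installed yet.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # The watermark PDF is expected to hold the watermark on its first page.
    watermark_page = PdfFileReader(watermark).getPage(0)

    pdf_reader = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()

    # Merge the watermark into each page, then queue the page for writing.
    for page_number in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(page_number)
        page.mergePage(watermark_page)
        pdf_writer.addPage(page)

    # Write the watermarked document to disk.
    with open(output, "wb") as out:
        pdf_writer.write(out)
```

Calling create_watermark('reportlab-sample.pdf', 'watermarked_notebook.pdf', 'watermark.pdf') from inside an if __name__ == '__main__': block would correspond to the run shown in the video.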
03:01 So if we quickly run this—
03:07 success! It ran. And you can have a look right here
03:13 that you now have the ReportLab document that we had for the extraction lesson, and you can see the semi-transparent text on the page here. Now, you can use pretty much anything you want as a watermark, but I would suggest that you use something semi-transparent so that you don’t discourage people from reading the document, while still making it clear that it’s protected.
03:40 And this is a good example of that, with the PRE-RELEASE COPY DO NOT DISTRIBUTE.
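If you don’t have a watermark PDF at hand, one way to produce a single-page file like this is with ReportLab. The lesson does not show how its watermark.pdf was made, so the following is only a sketch: make_watermark_pdf() is a hypothetical helper, and the page size, font, gray level, and rotation are all arbitrary assumptions.

```python
def make_watermark_pdf(path, text="PRE-RELEASE COPY DO NOT DISTRIBUTE",
                       gray=0.6, alpha=0.3):
    """Write a one-page PDF that holds only a diagonal line of faint text."""
    # ReportLab is a third-party package; imported here so the module
    # loads even if it is not installed yet.
    from reportlab.pdfgen import canvas

    width, height = 612, 792  # US Letter in points (an assumption)
    c = canvas.Canvas(path, pagesize=(width, height))
    c.setFont("Helvetica-Bold", 28)
    c.setFillGray(gray, alpha)  # light gray, partly transparent

    # Rotate around the page centre so the text runs diagonally.
    c.saveState()
    c.translate(width / 2, height / 2)
    c.rotate(45)
    c.drawCentredString(0, 0, text)
    c.restoreState()

    c.save()
    return path
```

The resulting file could then be passed as the watermark argument when watermarking another PDF.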
03:45 Now, let’s look at the final thing you will learn within the PyPDF2 module, which is encryption. At this stage, PyPDF2 only supports adding a user password and an owner password to a pre-existing PDF.
04:01 The difference between these two types of passwords is that a user password will only allow you to open the document, whereas an owner password will essentially give you admin privileges and allow you to set permissions on the PDF. Despite this, it would appear that PyPDF2 doesn’t allow you to actually set any document permissions, even though it allows you to add the owner password. Regardless, this example will show you how to add a password, which will inherently encrypt the PDF. Now, again, we start by importing PdfFileWriter and PdfFileReader from the PyPDF2 module, and then define the add_encryption() function, which takes the input and output PDF paths, as well as the password that you wish to add to the PDF. As you can see here, input_pdf, output_pdf, and password.
05:00 It then opens a PDF writer and a reader object, similar to earlier examples, just here.
05:09 And while we do this, we pass the input_pdf to the reader object. Seeing as you will want to encrypt the entire input PDF, you will need to loop over all of the pages and add them to the writer, which is what you do just here on lines 9 and 10.
05:27 The last step is to call the .encrypt() method, which takes the user_pwd (user password), here, the owner_pwd (owner password), here (in this case, it’s None, but you can of course change that to whatever you want), and whether or not 128-bit encryption should be used. The default for this option is True, just like you can see here. If it is set to False, then 40-bit encryption will be applied instead.
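To get a feel for what the 128-bit versus 40-bit choice means, the short calculation below compares the number of possible keys for each length. It is plain arithmetic and involves no PDF library; keyspace() is a hypothetical helper name. Key length is only an upper bound on brute-force effort and says nothing about other weaknesses in a cipher or in the password itself.

```python
def keyspace(bits):
    """Number of distinct keys for a key that is `bits` bits long."""
    return 2 ** bits


keys_40 = keyspace(40)
keys_128 = keyspace(128)

# A 128-bit key has 2**88 times as many possibilities as a 40-bit key.
ratio = keys_128 // keys_40

print(f"40-bit keys:  {keys_40:.3e}")
print(f"128-bit keys: {keys_128:.3e}")
print(f"ratio:        2**88 = {ratio:.3e}")
```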
05:56 We then have another context manager with the with open() code block, which writes out the encrypted PDF, quite similar to the watermark example.
06:07 It’s just that in this case, it’s writing a protected PDF as opposed to a watermarked PDF. And then we can take a look at the final few lines. Yet again, we have the if __name__ == '__main__': block.
06:22 And if you take a look at what is being passed to the add_encryption() function, in this case, we have input_pdf, which is the 'reportlab-sample.pdf' that we’ve been using on and off throughout this course, and output_pdf, which is 'reportlab-encrypted.pdf', the file we’re going to create with the password.
06:42 And finally, the password, which is 'twofish', just a random password.
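The encryption walkthrough can be summarized in a sketch like the one below. It assumes the legacy PyPDF2 1.x API used in this course, where PdfFileWriter.encrypt() takes user_pwd, owner_pwd, and use_128bit. The import is placed inside the function only so that the file can be loaded before PyPDF2 is installed, which is a choice made for this sketch.

```python
def add_encryption(input_pdf, output_pdf, password):
    """Copy every page of `input_pdf` into a password-protected `output_pdf`."""
    # Legacy PyPDF2 1.x names, as used in this course.
    # Imported here so the module loads even if PyPDF2 is not installed yet.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    # The whole document is encrypted, so every page goes to the writer.
    for page_number in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page_number))

    # use_128bit=True is the default; False would select 40-bit encryption.
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, use_128bit=True)

    # Write the protected document to disk.
    with open(output_pdf, "wb") as fh:
        pdf_writer.write(fh)
```

Calling add_encryption('reportlab-sample.pdf', 'reportlab-encrypted.pdf', 'twofish') would match the arguments used in the video.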
06:48 So now, let’s take a quick look once this has run,
06:55 and it’s done! So now, as mentioned earlier, the output_pdf name is 'reportlab-encrypted.pdf', so let’s take a look at that file. As you can see, the document is asking for a password in order to view it.
07:13 So, let’s try threefish,
07:18 and it doesn’t work. Some other number of fish,
07:24 and it doesn’t work. So if we try twofish,
07:30 it works. We can now view the PDF.
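The same check could also be scripted instead of typing passwords into a PDF viewer. This is a minimal sketch, assuming the legacy PyPDF2 1.x API, where PdfFileReader has an isEncrypted property and a decrypt() method that returns 0 when the password fails. try_password() is a hypothetical helper name and is not part of the lesson.

```python
def try_password(pdf_path, password):
    """Return True if `password` opens the PDF at `pdf_path`."""
    # Legacy PyPDF2 1.x name, as used in this course.
    # Imported here so the module loads even if PyPDF2 is not installed yet.
    from PyPDF2 import PdfFileReader

    reader = PdfFileReader(pdf_path)

    # An unencrypted file needs no password at all.
    if not reader.isEncrypted:
        return True

    # decrypt() returns 0 on failure, 1 for a user-password match,
    # and 2 for an owner-password match.
    return reader.decrypt(password) != 0
```

With the file created in this lesson, try_password('reportlab-encrypted.pdf', 'threefish') would be expected to return False and try_password('reportlab-encrypted.pdf', 'twofish') to return True.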
07:33 Now, just as a side note, PDF encryption uses either RC4, which is Rivest Cipher 4, or AES, which is Advanced Encryption Standard, in order to encrypt the PDF.
07:47 This is according to pdflib.com. Please keep in mind that encrypting a PDF does not necessarily mean that it is secure. There are tools that can remove passwords from PDFs.
07:59 If you would like to learn more, Carnegie Mellon University has an interesting paper on the topic. The link to this paper is below the video, and if you download the slides, the Carnegie Mellon University paper line is also a link. That concludes the tutorial portion of this course.
08:14 I hope that you will join me next time to review the content and go through some suggested further readings.
Chris Bailey RP Team on April 28, 2020
Hi @John B, The resources are now included in the “Supporting Material” drop down, just below the lesson video. The file name is “Course Documents”. Thanks for catching this.
John B on April 28, 2020
I appreciate the work that has gone into this lesson and it is very helpful. I am unable to find watermark.pdf to complete the watermark lesson. Can you point me to the file?