Set Up Multiple Replacement Rules
There are a few more replacements that you need to make to the transcript to get it into a format acceptable for independent review. You need to shorten or remove the timestamps and replace the usernames with Agent and Client. Now that you are starting to have more strings to replace, chaining .replace() calls is going to get repetitive.
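To see why chaining gets repetitive, here's a rough sketch of what the chained version might look like. The two sample lines, the usernames support_tom and johndoe, and the "***" censor string are assumptions made for illustration; the real transcript is longer.

```python
# Hypothetical two-line sample standing in for the full transcript.
transcript = (
    "[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?\n"
    "[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT\n"
)

# Every new rule means one more .replace() call on the chain.
cleaned = (
    transcript.replace("BLASTED", "***")
    .replace("2022-08-24T", "")
    .replace("[support_tom]", "Agent")
    .replace("[johndoe]", "Client")
)

print(cleaned)
```

Each call returns a new string, so the chain works, but the rules end up buried in the method calls rather than kept together as data.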
Start with a file named transcript_multiple_replace.py and add the transcript as a multiline string. Okay, now let’s think a moment. The way .replace() works is that you provide a pair of arguments.
So one idea could be to keep a list of tuples with two items in each tuple. The two items would correspond to the arguments that you need to pass into the .replace() method: the string to replace and the replacement string.
"BLASTED" is in uppercase, and "Blast" right now starts with an uppercase letter. Then you add the first part of the timestamp, so that’s "2022-08-24" followed by an uppercase "T", and an empty string ("") to replace it.
02:39 In this version of your transcript-cleaning script, you created a list of replacement tuples, which gives you a quick way to add replacements. You then iterate over the list of replacement tuples.
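As a sketch, the list of replacement tuples could look something like this. The "***" censor string and the extra "+00:00" rule are assumptions for illustration, not necessarily what the lesson's code uses.

```python
# Each tuple holds (string to replace, replacement string).
replacements = [
    ("BLASTED", "***"),   # uppercase variant
    ("Blast", "***"),     # variant with an uppercase first letter
    ("2022-08-24T", ""),  # drop the date part of the timestamp
]

# Adding a rule is a one-line change ("+00:00" is an assumed example).
replacements.append(("+00:00", ""))

# One tuple unpacks straight into the two arguments of .replace().
old, new = replacements[2]
shortened = "2022-08-24T10:02:23".replace(old, new)
print(shortened)  # prints 10:02:23
```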
In each iteration, you call .replace() on the string, populating the arguments with the old and new variables that have been unpacked from each replacement tuple. With this, you’ve made a big improvement in the overall readability of the transcript.
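Put together, the script might look roughly like the sketch below. The transcript lines, the usernames support_tom and johndoe, the "***" censor string, and the "+00:00" rule are all assumptions made so the example runs on its own.

```python
# Hypothetical sample transcript; the real one may differ.
transcript = """\
[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

# Each tuple holds (string to replace, replacement string).
replacements = [
    ("BLASTED", "***"),
    ("Blast", "***"),
    ("2022-08-24T", ""),
    ("+00:00", ""),
    ("[support_tom]", "Agent "),
    ("[johndoe]", "Client"),
]

# Unpack each tuple into old and new, then reassign the result,
# because strings are immutable and .replace() returns a new string.
for old, new in replacements:
    transcript = transcript.replace(old, new)

print(transcript)
```

The trailing space in "Agent " is a small design choice so that both speaker labels have the same width and the times line up.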
Maybe there is another agent or another client. Also, replacing the swear words won’t work if there’s another variation, for example, one using "ing" or with a different capitalization. And you are right.
03:58 These are valid concerns. But you know what? I got you covered, just as if I already knew what concerns you would have. What a coincidence. You’ll learn how to handle all of these concerns in the next lesson.
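To make the capitalization concern concrete in the meantime, here's a small sketch. The clean() helper and the "***" censor string are hypothetical names introduced only for this illustration.

```python
replacements = [("BLASTED", "***"), ("Blast", "***")]


def clean(text):
    """Apply every (old, new) pair in order. Hypothetical helper."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# Both listed variants are replaced.
listed = clean("Blast! BLASTED account")

# Lowercase variants are not in the list, so nothing changes.
unlisted = clean("blast this blasting thing")

# "Blast" matches as a substring, so the "ing" ending is left behind.
partial = clean("Stop Blasting me")

print(listed)
print(unlisted)
print(partial)
```

Because .replace() matches exact, case-sensitive substrings, every variation would need its own tuple in the list.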