Set Up Multiple Replacement Rules
00:00
There are a few more replacements that you need to make to the transcript to get it into a format acceptable for independent review. Shorten or remove the timestamps and replace the usernames with Agent and Client. Now that you are starting to have more strings to replace, chaining on .replace() is going to get repetitive.
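To make the repetition concrete, here is a minimal sketch of the chained approach. The one-line sample string and the 😤 character standing in for the huffing emoji are assumptions for illustration, not the lesson's actual file.

```python
# Hypothetical one-line sample; the lesson's real transcript is longer.
transcript = "[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT"

# Every new rule means one more .replace() call tacked onto the chain.
cleaned = (
    transcript
    .replace("BLASTED", "😤")
    .replace("2022-08-24T", "")
    .replace("+00:00", "")
    .replace("[johndoe]", "Client")
)

print(cleaned)
```

With only four rules the chain is still readable, but it grows by one call for every string you want to swap out.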
00:18 So let’s explore a better way to handle the replacements.
00:24
Start with a file named transcript_multiple_replace.py and add the transcript as a multiline string. Okay, now let’s think a moment. The way .replace() works is that you provide an argument pair.
00:38 The first argument is the string that you want to replace, and the second argument is the replacement string. That sounds a lot like a tuple, doesn’t it?
00:48
So one idea could be to keep a list of tuples with two items in each tuple. The two items would correspond to the arguments that you need to pass into the .replace() method: the string to replace and the replacement string.
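As a small sketch of that idea, a two-item tuple can be unpacked straight into the two arguments of .replace(). The names pair, old, new, and message are hypothetical, and 😤 is assumed to be the huffing emoji.

```python
# A two-item tuple mirrors the two arguments of str.replace().
pair = ("BLASTED", "😤")

# Unpack the tuple into two names ...
old, new = pair
message = "MY BLASTED ACCOUNT"
unpacked_result = message.replace(old, new)

# ... or let the * operator spread the tuple over the arguments.
starred_result = message.replace(*pair)

print(unpacked_result)
print(starred_result)
```

Both calls do the same thing, which is why a tuple is a natural container for one replacement rule.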
01:04
We make a list named replacements and add in the tuples. The first one contains "BLASTED" and the huffing emoji. The second one contains "Blast" and the huffing emoji.
01:16
The first "BLASTED" is in uppercase, and the "Blast" has only an uppercase letter at the start. Then you add the first part of the timestamp, so that’s "2022-08-24" followed by an uppercase "T", and an empty string ("") to replace it.
01:35
Then you need to add the "+00:00". You also replace this with an empty string. Next comes the "[support_tom]" username, which you replace with "Agent ".
01:46
Don’t forget the space after Agent because that will align the columns better. Then comes the tuple with "[johndoe]", which you replace with "Client".
01:55
Note that Client doesn’t have a space at the end. This way, "Client" and "Agent " with a space both have the same number of characters and will align nicely in your clean transcript.
02:07
"[support_tom]" and "[johndoe]" both need to be in square brackets. Then you have a square bracket in line 17 to close the replacements list.
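Put together, the list described so far might look like the sketch below. The exact line layout in the video may differ, and 😤 is assumed to be the huffing emoji.

```python
# Each tuple holds (string to replace, replacement string).
replacements = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
    ("2022-08-24T", ""),  # first part of the timestamp
    ("+00:00", ""),  # trailing UTC offset
    ("[support_tom]", "Agent "),  # trailing space pads to the width of "Client"
    ("[johndoe]", "Client"),
]

# "Agent " and "Client" are both six characters long.
print(len("Agent "), len("Client"))
```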
02:17
With the list of replacements in place, you can iterate over the list and call .replace() on the transcript string. So, for old, new in replacements:
02:29
transcript = transcript.replace(old, new), and then you call print(transcript) at the end.
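Assembled into one file, the steps above could look like this sketch. The four-line transcript is a hypothetical stand-in for the lesson's sample data, and 😤 is assumed to be the huffing emoji.

```python
# Hypothetical stand-in for the lesson's chat transcript.
transcript = """[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!"""

replacements = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
    ("2022-08-24T", ""),
    ("+00:00", ""),
    ("[support_tom]", "Agent "),
    ("[johndoe]", "Client"),
]

# Strings are immutable, so each pass rebinds transcript to the new result.
for old, new in replacements:
    transcript = transcript.replace(old, new)

print(transcript)
```

Because .replace() returns a new string rather than changing the original, the assignment back to transcript inside the loop is what carries each replacement forward to the next iteration.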
02:39 In this version of your transcript-cleaning script, you created a list of replacement tuples, which gives you a quick way to add replacements. You then iterate over the list of replacement tuples.
02:49
In each iteration, you call .replace() on the string, populating the arguments with the old and new variables that have been unpacked from each replacement tuple. With this, you’ve made a big improvement in the overall readability of the transcript.
03:05 It’s also easier to add replacements if you need to. So let’s run the script and see what happens.
03:13
Once you’re in the terminal, you can run python and then the name of your Python file, which is transcript_multiple_replace.py.
03:21
And that’s a pretty clean transcript. You replaced the swear words with the huffing emoji, and you replaced the usernames with Agent and Client. Also, the timestamps are much more readable now.
03:34 Well done. Your script works perfectly with the provided transcript that you’ve got, but you may ask, what if you get a different transcript later that day?
03:45
Maybe there is another agent or another client. Also, replacing the swear words won’t work if there’s another variation, for example, using "ing" or with a different capitalization. And you are right.
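A quick sketch makes these concerns visible. The username [janedoe] and the sample line are hypothetical, and 😤 is assumed to be the huffing emoji. Since .replace() only matches exact, case-sensitive substrings, none of the existing rules apply.

```python
replacements = [
    ("BLASTED", "😤"),
    ("Blast", "😤"),
    ("[support_tom]", "Agent "),
    ("[johndoe]", "Client"),
]

# Hypothetical line with a new username and a lowercase "ing" variation.
original = "[janedoe] : this blasting account is still locked"

line = original
for old, new in replacements:
    line = line.replace(old, new)

# No rule matched, so the line comes out exactly as it went in.
print(line == original)
```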
03:58 These are valid concerns. But you know what? I got you covered, just as if I already knew what concerns you would have. What a coincidence. You’ll learn how to handle all of these concerns in the next lesson.