
Investigate the Transcript

Whenever you’re working with data, it’s always a good idea to inspect that data before writing any code. In this case, you’re working with the following chat transcript:

[support_tom] 2022-08-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2022-08-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2022-08-24T10:04:03+00:00 : Blast! You're right!

Do you notice any patterns?

00:00 Again, here is the scenario for this video course. A client named John Doe has filed a complaint, and the policy of your company is that you need to sanitize and simplify the transcript of this complaint before sending it for independent evaluation.

00:15 On this slide, you’re seeing the transcript that you’ll work with. Your task is to take care of the message sanitization, in other words, to clean up this chat transcript. This transcript might be short, but it’s a perfect example of the chats that support agents handle on a regular basis. When you get data to work with, it’s a good idea to investigate the structure of the data.

00:41 What this means is that you have a look at the data and try to find patterns that the data shares. So before you continue, have a look at this transcript.

00:50 Can you spot the components that this transcript contains? If you want, you can pause this video for this. You can also find the chat transcript in the text below this video.

01:02 Every line of the transcript includes three components: a user identifier, an ISO timestamp, and a message. In particular, you have the user identifiers support_tom and johndoe, both in square brackets.
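If you'd like to confirm that three-part structure in code, here is a minimal sketch. The pattern `LINE_PATTERN` and the helper `split_line()` are hypothetical names that don't come from the course, and the sketch assumes every line follows the `[user] timestamp : message` layout seen in the transcript.

```python
import re

# Hypothetical pattern: assumes each line looks like "[user] timestamp : message".
LINE_PATTERN = re.compile(
    r"\[(?P<user>[^\]]+)\]\s+(?P<timestamp>\S+)\s+:\s+(?P<message>.*)"
)


def split_line(line):
    """Return (user, timestamp, message) for one transcript line."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"Unexpected line format: {line!r}")
    return match.group("user"), match.group("timestamp"), match.group("message")


line = "[johndoe] 2022-08-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT"
user, timestamp, message = split_line(line)
print(user)       # the user identifier without the square brackets
print(timestamp)  # the raw ISO 8601 string
print(message)    # everything after the " : " separator
```

Raising an error for a line that doesn't match is a deliberate choice here: if the data ever breaks the pattern you spotted, you'll find out right away.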

01:19 The timestamps all contain the date, the 24th of August 2022. Then there is a letter, T, and then the time and time zone info. These timestamps are in the ISO 8601 format.

01:34 That’s the international standard for date- and time-related data. We’ll deal with this format in a bit. For now, let’s just be happy that all the timestamps have the same format.
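As a quick check that these timestamps really are machine-readable, you could try the standard library's `datetime.fromisoformat()`. This is only a sketch of one possible way to read the format, not necessarily the approach the course takes later; the variable names are illustrative.

```python
from datetime import datetime

# One timestamp copied from the transcript.
raw_timestamp = "2022-08-24T10:03:15+00:00"

# fromisoformat() understands the date, the "T" separator, the time,
# and the "+00:00" UTC offset.
stamp = datetime.fromisoformat(raw_timestamp)

print(stamp.date())       # the date part
print(stamp.time())       # the time part
print(stamp.utcoffset())  # the time zone offset as a timedelta
```

Because all four timestamps share the same format, the same call should work for every line of the transcript.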

01:45 And then there are the messages. One message is in uppercase, and two of them contain a swear word. To sanitize the transcript, the first thing we’ll do is to take care of any swear words, and in the next lesson, you’ll learn how Python can help you with this.
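Before moving on, you could verify those two observations with a few string methods. This sketch only detects the uppercase message and the word "blast"; it doesn't sanitize anything yet, and the list name `messages` is just an assumption for illustration.

```python
# The four messages from the transcript, without user and timestamp.
messages = [
    "What can I help you with?",
    "I CAN'T CONNECT TO MY BLASTED ACCOUNT",
    "Are you sure it's not your caps lock?",
    "Blast! You're right!",
]

# str.isupper() is True when every cased character is uppercase.
shouting = [message for message in messages if message.isupper()]

# Lowercasing first makes the check case-insensitive,
# so it catches both "BLASTED" and "Blast!".
swearing = [message for message in messages if "blast" in message.lower()]

print(len(shouting))
print(len(swearing))
```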
