Choosing the Line Ending
00:00 In this lesson, you’ll learn when and how to choose the line ending for the files that you open in Python. When you open a file in text mode, Python gives you the option to read or write the file contents a few characters at a time, all at once into a Python string, or line by line.
00:21 Python knows where each line ends thanks to the so-called newline character, which is present at the end of each line. You don’t normally get to see this character because it’s a special kind of control character with no visual representation, although some text editors make it possible to mark line endings and other whitespace characters such as indentation. Under the surface, the whole text file is stored as one long sequence of characters, some of which are those special characters that indicate where to break the line.
00:56 When you ask Python to load the next line of text from a file, then you’ll get a string object that includes the newline character at the end. The last character in the string will be denoted with \n, which is a special sequence of characters representing the newline character.
01:13 Even though that line ending is a non-printable character, meaning you won’t see it when printing the line, it actually counts toward the total length of the string. Therefore, you can strip the newline from the right end of the string if you’d like to disregard it, which is probably what you’ll want to do in most cases anyway.
01:32 Forgetting to strip the newline character is a common mistake when reading text files in Python, which can lead to surprising results, so keep that in mind.
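To make the point about length and stripping concrete, here is a minimal sketch. The file name sample.txt and its contents are made up for illustration, and the file is created in a temporary directory so the snippet is self-contained.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created in a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text("apple\nbanana\n", encoding="utf-8")

    with path.open(encoding="utf-8") as file:
        first_line = file.readline()

raw_length = len(first_line)          # The trailing "\n" counts as one character
stripped = first_line.rstrip("\n")    # Remove only the trailing newline

print(repr(first_line), raw_length)   # 'apple\n' 6
print(repr(stripped), len(stripped))  # 'apple' 5
```

Calling repr() is a handy way to make the otherwise invisible newline show up when you debug a comparison that unexpectedly fails.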
01:42 So far, so good. Unfortunately, different operating systems use different newline characters to denote the end of the line in text files. This can cause problems when you share your file with someone else who uses a different operating system.
02:00 The three major types of newline characters that you can find in the wild are the carriage return, the line feed, and a carriage return immediately followed by a line feed.
02:10 The first line ending used to be common in Apple’s classic Mac OS, which predates Mac OS X, but that was a really long time ago. Today, macOS uses the same newline character as Unix or Linux distributions, such as Ubuntu.
02:26 The third line ending is specific to Microsoft Windows or DOS. Interestingly enough, the Windows-style newline is also used by some network protocols like HTTP to delimit parts of the message. For historical reasons, people keep using the names carriage return and line feed, which refer to control characters that were originally sent to a physical typewriter or a printer, instructing the device to move the paper up one line and return the carriage to the beginning of the line.
02:59 Carriage return has the ASCII code 13, which is often represented with a special sequence \r, while the line feed has the ASCII code 10, represented with a \n.
03:12 As you can see, the line ending can sometimes be represented with two special characters together, that is, \r\n.
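You can confirm these codes in Python itself. The sketch below uses only the built-in ord() function and string literals, and the variable names are chosen for illustration.

```python
carriage_return = "\r"
line_feed = "\n"
windows_newline = "\r\n"

print(ord(carriage_return))             # 13
print(ord(line_feed))                   # 10
print(len(windows_newline))             # 2, because it's two separate characters
print(windows_newline.encode("ascii"))  # b'\r\n'
```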
03:22 Historically, the existence of these three different types of newline characters has caused compatibility problems when transferring files across systems.
03:32 Fortunately, Python does a great job of handling different newline characters for you by automatically converting between the platform-specific and platform-agnostic representations of the newline character.
03:47 By default, Python enables the so-called universal newline handling mechanism to cope with this issue. So when you open a text file for reading, Python will recognize different newline characters and translate them to a line feed in the background.
04:04 Note that the universal newline in Python is always represented with a line feed character denoted with \n.
04:12 The last line in the file may or may not be terminated by a newline character, depending on if it’s followed by a blank line or not. In this case, there was no blank line in the file, so the third line has no newline character at the end.
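Here is a small sketch of universal newline handling on reading. It's not the example from the video: the file name mixed.txt and its mixed line endings are assumptions made for illustration, and the file is written in binary mode so that the exact bytes are known.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mixed.txt"  # Hypothetical file name
    # Windows-style ending, then a lone carriage return, then no ending at all.
    path.write_bytes(b"first\r\nsecond\rthird")

    # newline is left at its default of None, so universal newlines are enabled.
    with path.open(encoding="utf-8") as file:
        lines = file.readlines()

print(lines)  # ['first\n', 'second\n', 'third']
```

Both line endings come back as \n, and the last line has no newline because the file doesn't end with one.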
04:29 Conversely, you can stick to using only the line feed or a \n everywhere in your string literals before writing them to a file, and Python will translate those line feeds to a suitable newline character of your platform. Thanks to the universal newline mechanism in Python, you can run essentially the same code on different platforms without worrying about the newline characters again. That being said, you can disable the universal newline and take control over the line ending if you absolutely need to.
05:01 The way to override the universal newline is by providing one of the few predefined values for the optional newline parameter when you call the built-in open() function or the Path.open() method. In this case, you ask Python to write the Windows-style newline character regardless of which operating system this code will be running on.
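A minimal sketch of such an override might look like the following. The file name windows.txt and the text written to it are assumptions for illustration, and the file is read back in binary mode to reveal the bytes that actually ended up on disk.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "windows.txt"  # Hypothetical file name

    # Request Windows-style line endings regardless of the current platform.
    with path.open(mode="w", encoding="utf-8", newline="\r\n") as file:
        file.write("first\nsecond\n")

    raw = path.read_bytes()

print(raw)  # b'first\r\nsecond\r\n'
```

The strings in the code still use only \n, and Python replaces each one with \r\n on the way out.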
05:25 Here’s a quick summary of the possible values. When you open a text file for reading, the newline argument has a default value of None.
05:34 This indicates that the universal newline mechanism is enabled, in which case Python will translate any type of newline it finds in the file to a line feed or \n.
05:44 If you specify an empty string literal as the value for the newline parameter, then the universal newline mechanism will still be enabled, but Python won’t do the translation for you anymore.
05:55 You’ll be able to determine where each line ends, but it’ll retain the original line ending instead of being replaced with the universal line feed. This can be convenient when you want to edit an existing file while preserving its original control characters. Finally, you can explicitly set one of the three newline characters to manually handle line endings. This will disable the universal newline mechanism, and Python will read the file as is without translating the newline characters.
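To compare the reading behaviors described above side by side, you could try a sketch like this one. The helper read_lines(), the file name mixed.txt, and the sample bytes are all hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path

def read_lines(path, newline):
    """Return the file's lines as read with the given newline argument."""
    with open(path, encoding="utf-8", newline=newline) as file:
        return file.readlines()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mixed.txt"  # Hypothetical file with mixed endings
    path.write_bytes(b"first\r\nsecond\rthird\n")

    translated = read_lines(path, None)  # Universal newlines with translation
    preserved = read_lines(path, "")     # Universal newlines, no translation
    only_cr = read_lines(path, "\r")     # Only "\r" terminates a line

print(translated)  # ['first\n', 'second\n', 'third\n']
print(preserved)   # ['first\r\n', 'second\r', 'third\n']
print(only_cr)     # ['first\r', '\nsecond\r', 'third\n']
```

Notice how the explicit "\r" splits the Windows-style ending in half, which is why an explicit newline only makes sense when you know how the file was written.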
06:28 The meaning of the newline parameter is slightly different when you write content to a text file in Python. You can use it to control the translation process of the line feed character in your Python strings before saving them to the file.
06:42 The default value of None will make Python translate \n characters to your operating system’s default line separator. If the newline parameter is either an empty string or \n, then no translation takes place, which means that whatever newline your strings contain will be preserved.
07:01 Any other legal value, that is, \r or the Windows-style newline, will result in translating the line feed in your strings into the requested newline. In conclusion, the newline parameter is a powerful tool that you can use to control the way Python handles line endings in text files.
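The writing rules can be checked with a short sketch that writes the same text once for each legal value and inspects the resulting bytes. The file name out.txt and the results dictionary are assumptions for illustration. The outcome for None depends on your platform, so it's compared against os.linesep rather than shown as fixed output.

```python
import os
import tempfile
from pathlib import Path

text = "first\nsecond\n"
results = {}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"  # Hypothetical file name
    for newline in (None, "", "\n", "\r", "\r\n"):
        with open(path, mode="w", encoding="utf-8", newline=newline) as file:
            file.write(text)
        results[newline] = path.read_bytes()  # Bytes actually stored on disk

# None translates "\n" to the platform's default line separator.
print(results[None] == text.replace("\n", os.linesep).encode("utf-8"))  # True
print(results[""])      # b'first\nsecond\n'
print(results["\n"])    # b'first\nsecond\n'
print(results["\r"])    # b'first\rsecond\r'
print(results["\r\n"])  # b'first\r\nsecond\r\n'
```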
07:20 The default value of None is usually your best choice, but you can override it to preserve the original line endings of an existing file or to control the translation process when you write content to a text file. Next up, you’ll explore the different modes you can read and write files in Python.