Choosing the Line Ending
00:00 In this lesson, you’ll learn when and how to choose the line ending for the files that you open in Python. When you open a file in text mode, Python gives you the option to read or write the file contents a few characters at a time, all at once into a Python string, or line by line.
00:21 Python knows where each line ends thanks to the so-called newline character, which is present at the end of each line. You don’t normally get to see this character because it’s a special kind of control character with no visual representation, although some text editors make it possible to mark line endings and other whitespace characters such as indentation. Under the surface, the whole text file is stored as one long sequence of characters, some of which are those special characters that indicate where to break the line.
When you ask Python to load the next line of text from a file, you’ll get a string object that includes the newline character at the end. The last character in the string will be denoted with \n, which is an escape sequence representing the newline character.
01:13 Even though that line ending is a non-printable character, meaning you won’t see it when printing the line, it actually counts toward the total length of the string. Therefore, you can strip the newline from the right end of the string if you’d like to disregard it, which is probably what you’ll want to do in most cases anyway.
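Here's a minimal sketch of that behavior. The file name sample.txt and its contents are made up for illustration, and the file lives in a temporary folder so nothing is left behind:

```python
import tempfile
from pathlib import Path

# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "sample.txt"
    path.write_text("first\nsecond\n", encoding="utf-8")

    with path.open(encoding="utf-8") as file:
        line = file.readline()

# The newline is invisible when printed, but it's part of the string.
print(repr(line))  # 'first\n'
print(len(line))   # 6

# Strip the newline from the right end to disregard it.
stripped = line.rstrip("\n")
print(len(stripped))  # 5
```

Passing "\n" to .rstrip() removes only trailing newlines. Calling .rstrip() without arguments would also remove trailing spaces and tabs, which may or may not be what you want.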
01:42 So far, so good. Unfortunately, different operating systems use different newline characters to denote the end of the line in text files. This can cause problems when you share your file with someone else who uses a different operating system.
02:10 The first line ending, the carriage return (\r), used to be common in Apple’s classic Mac OS, the predecessor of Mac OS X, but that was a really long time ago. Today, macOS uses the same newline character as Unix or Linux distributions, such as Ubuntu, which is the line feed (\n).
02:26 The third line ending, a carriage return followed by a line feed (\r\n), is specific to Microsoft Windows or DOS. Interestingly enough, the Windows-style newline is also used by some network protocols like HTTP to delimit parts of the message. For historical reasons, people keep using the names carriage return and line feed, which refer to control characters that were originally sent to a teletypewriter or a printer, instructing the device to return the carriage to the beginning of the line and to move the paper up one line, respectively.
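To make the three line endings concrete, here's a small sketch. The constant names CR, LF, and CRLF are just labels chosen for this example:

```python
CR = "\r"       # Carriage return, used by classic Mac OS
LF = "\n"       # Line feed, used by Unix, Linux, and macOS
CRLF = CR + LF  # Carriage return plus line feed, used by Windows and DOS

# Each control character is a single code point, so CRLF has two.
print(ord(CR), ord(LF), len(CRLF))  # 13 10 2

# str.splitlines() recognizes all three line endings.
text = "one" + CR + "two" + LF + "three" + CRLF + "four"
lines = text.splitlines()
print(lines)  # ['one', 'two', 'three', 'four']
```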
03:32 Fortunately, Python does a great job of handling different newline characters for you by automatically converting between the platform-specific and platform-agnostic representations of the newline character.
03:47 By default, Python enables the so-called universal newline handling mechanism to cope with this issue. So when you open a text file for reading, Python will recognize different newline characters and translate them to a line feed in the background.
04:12 The last line in the file may or may not be terminated by a newline character, depending on whether it’s followed by a blank line. In this case, there was no blank line in the file, so the third line has no newline character at the end.
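You can try the universal newline mechanism yourself with a sketch like the one below. It isn't the file from the video. The name mixed.txt and its contents are assumptions, and the bytes are written directly so that every line uses a different ending:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "mixed.txt"

    # Write raw bytes so that no translation happens on the way in.
    path.write_bytes(b"mac\rlinux\nwindows\r\nlast")

    # The default newline=None enables universal newlines when reading.
    with path.open(encoding="utf-8") as file:
        lines = file.readlines()

# Every line ending became \n, and the last line has none at all.
print(lines)  # ['mac\n', 'linux\n', 'windows\n', 'last']
```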
Conversely, you can stick to using only the line feed or a \n everywhere in your string literals before writing them to a file, and Python will translate those line feeds to a suitable newline character of your platform. Thanks to the universal newline mechanism in Python, you can run essentially the same code on different platforms without worrying about the newline characters again. That being said, you can disable the universal newline and take control over the line ending if you absolutely need to.
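As a rough check of the writing side, the sketch below writes \n in text mode and then compares the raw bytes on disk against os.linesep, which holds your platform's line separator. The file name out.txt is made up:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "out.txt"

    # Use only \n in the strings and let Python pick the line ending.
    with path.open(mode="w", encoding="utf-8") as file:
        file.write("first\nsecond\n")

    raw = path.read_bytes()

# What the platform's own line separator would produce.
expected = ("first" + os.linesep + "second" + os.linesep).encode("utf-8")
print(raw == expected)  # True
```

On Linux and macOS, the bytes on disk will contain \n, while on Windows they'll contain \r\n, yet the code stays the same.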
The way to override the universal newline is by providing one of the few predefined values for the optional newline parameter when you call the built-in open() function or the Path.open() method. In this case, you ask Python to write the Windows-style newline character regardless of which operating system this code will be running on.
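A call along these lines would do that. This is a sketch rather than the exact code from the video, and the file name windows.txt is an assumption:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "windows.txt"

    # Request the Windows-style newline no matter where the code runs.
    with path.open(mode="w", encoding="utf-8", newline="\r\n") as file:
        file.write("first\nsecond\n")

    raw = path.read_bytes()

# Each \n in the string was written as \r\n.
print(raw)  # b'first\r\nsecond\r\n'
```

The built-in open() function accepts the same newline keyword argument, so open(path, mode="w", newline="\r\n") would behave the same way.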
05:55 If you pass an empty string as the newline argument when reading a file, then you’ll still be able to determine where each line ends, but each line will retain its original line ending instead of having it replaced with the universal line feed. This can be convenient when you want to edit an existing file while preserving its original control characters. Finally, you can explicitly set one of the three newline characters to manually handle line endings. This will disable the universal newline mechanism, and Python will read the file as is, splitting lines only on the newline you chose and without translating the newline characters.
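The sketch below contrasts the two read modes just described, using a made-up file named mixed.txt with a different ending on each line:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "mixed.txt"
    path.write_bytes(b"mac\rlinux\nwindows\r\n")

    # Empty string: all line endings are recognized but left untouched.
    with path.open(encoding="utf-8", newline="") as file:
        preserved = file.readlines()

    # Explicit "\n": only the line feed ends a line, nothing is translated.
    with path.open(encoding="utf-8", newline="\n") as file:
        split_on_lf = file.readlines()

print(preserved)    # ['mac\r', 'linux\n', 'windows\r\n']
print(split_on_lf)  # ['mac\rlinux\n', 'windows\r\n']
```

Notice that in the second case, the lone carriage return no longer separates lines, so the first two lines are read as one.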
The meaning of the newline parameter is slightly different when you write content to a text file in Python. You can use it to control the translation process of the line feed character in your Python strings before saving them to the file.
The default value of None will make Python translate \n characters to your operating system’s default line separator. If the newline parameter is either an empty string or \n, then no translation takes place, which means that whatever newline your strings contain will be preserved.
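To see the no-translation case in action, you could write a string that mixes all three line endings and inspect the bytes on disk. The file name untouched.txt and the results dictionary are only illustrative:

```python
import tempfile
from pathlib import Path

content = "unix\nwindows\r\nmac\r"
results = {}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "untouched.txt"

    # Neither an empty string nor "\n" triggers any translation on write.
    for newline in ("", "\n"):
        with path.open(mode="w", encoding="utf-8", newline=newline) as file:
            file.write(content)
        results[newline] = path.read_bytes()

# Both files hold exactly the characters that were in the string.
print(results[""] == content.encode("utf-8"))    # True
print(results["\n"] == content.encode("utf-8"))  # True
```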
Any other legal value (that is, \r or the Windows-style newline) will result in translating the line feed in your strings into the requested newline.
In conclusion, the newline parameter is a powerful tool that you can use to control the way Python handles line endings in text files. The default value of None is usually your best choice, but you can override it to preserve the original line endings of an existing file or to control the translation process when you write content to a text file. Next up, you’ll explore the different modes you can read and write files in Python.