Reading and Writing CSV Data
00:00 In this lesson, you’ll learn how to read and write data using the comma-separated values file format in Python. The CSV file format is a common way to store tabular data, such as a database table or a spreadsheet like the one here, using a plain text file.
00:41 If that’s the case, then you’ll need to know up front what the individual columns in your file mean. Feel free to download and use this example CSV file. To do so, you can choose the Sample Code link from the Supporting Materials dropdown that you’ll find below this video. Now, because this CSV file format is relatively straightforward, it may feel tempting to try and read it using the file object’s methods that you learned in this course. However, it’s too easy to get it wrong because of the little edge cases and the fact that the CSV format had no unified standard for a number of years, so there are actually a couple of different dialects in use today.
You’re much better off using Python’s csv module from the standard library, which can take care of all that and more. Let’s head over to IDLE to see it in action. First, you’re going to need to import the csv module and the Path object from pathlib, although you could also use the built-in open() function if you prefer. However, because you’re going to open the same file multiple times, using a separate Path object will let you reuse it without having to type the filename again each time.
Then you’ll use the usual with statement followed by a call to path.open(). You want to keep the default read-only mode, set the encoding to UTF-8, and in this case, you also want to set the newline to an empty string in order to disable the universal newline translation.
On the other hand, if you don’t specify the newline parameter with an empty string when you open a file, then on some systems, particularly Windows, you’ll end up with mixed-up newline characters, or you won’t be able to process values that contain newlines themselves.
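Here is a minimal sketch of opening a CSV file this way. The file name, the temporary folder, and the sample content are made up for illustration and aren't the example file from the course. The sample deliberately includes a quoted value that contains a newline, so you can see it survive when newline is set to an empty string:

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample file, created in a temporary folder for this sketch.
path = Path(tempfile.mkdtemp()) / "notes.csv"
path.write_bytes(b'name,note\r\nAlice,"line one\nline two"\r\n')

# The default mode is read-only text. Passing newline="" disables the
# universal newline translation, so the csv module sees the raw line endings.
with path.open(encoding="utf-8", newline="") as file:
    reader = csv.reader(file)
    rows = list(reader)

print(rows)
```

The second row keeps the newline inside its quoted value instead of being split into two separate rows.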
02:55 The resulting CSV reader is iterable, which means that you can loop over it just like you would with a regular file object. However, instead of yielding lines or strings, the reader provides a sequence of rows, which are lists of strings.
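As a quick sketch of that loop, the snippet below uses an in-memory io.StringIO object as a stand-in for an opened file, with invented sample data. It should behave the same way with a real file object:

```python
import csv
import io

# In-memory stand-in for an opened CSV file (sample data is made up).
file = io.StringIO("name,age\nAlice,30\nBob,25\n")

reader = csv.reader(file)
rows = []
for row in reader:
    # Each row is a list, and every field in it is a string.
    rows.append(row)
    print(type(row).__name__, row)
```

Note that even the numeric-looking values, such as the ages, come back as strings.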
03:28 This isn’t something you would easily implement yourself or likely even be aware of. The first row returned by the CSV reader contains column names, so with this approach, you have to recognize or know whether the file starts with a header or not.
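If you know that your file starts with a header, then one way to handle it is to pull the first row off the reader before looping over the rest. This is only a sketch with invented sample data, and it assumes that the second column always holds whole numbers:

```python
import csv
import io

# In-memory stand-in for an opened CSV file (sample data is made up).
file = io.StringIO("name,age\nAlice,30\nBob,25\n")

reader = csv.reader(file)
header = next(reader)  # Consume the first row, which holds the column names.

# The remaining rows are data. Fields are strings, so convert them by hand.
people = [[name, int(age)] for name, age in reader]

print(header)
print(people)
```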
So you have to do this conversion yourself. But the csv module offers a slightly more convenient way of reading those data rows. Let’s open the same file again for reading, but without starting a new with code block.
As long as your file starts with a header, DictReader will detect the column names and let you reveal them through the .fieldnames attribute. If the file doesn’t have a header, then you must pass an appropriate list of column names through the fieldnames argument of the DictReader initializer method right after the file object reference.
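The two cases might look roughly like this. Both inputs are in-memory stand-ins for opened files, and the column names are made up for the sake of the example:

```python
import csv
import io

# Case 1: the data starts with a header row, which DictReader picks up.
with_header = io.StringIO("name,age\nAlice,30\n")
reader = csv.DictReader(with_header)
detected = reader.fieldnames

# Case 2: no header row, so the column names are supplied explicitly.
without_header = io.StringIO("Alice,30\nBob,25\n")
reader = csv.DictReader(without_header, fieldnames=["name", "age"])
rows = list(reader)

print(detected)
print(rows)
```

In the second case, the first line is treated as data rather than as column names.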
04:57 Moreover, every row is now a Python dictionary, whose keys are the column names, and values are the corresponding data points from the file. Therefore, instead of using a list index to access an element, you can refer to a field by the column name.
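For instance, a loop over such rows could look like the sketch below, again with an in-memory stand-in for the file and invented column names:

```python
import csv
import io

# In-memory stand-in for an opened CSV file (sample data is made up).
file = io.StringIO("name,age\nAlice,30\nBob,25\n")

names = []
for row in csv.DictReader(file):
    # Look up fields by column name instead of by list index.
    names.append(row["name"])
    print(row["name"], "is", row["age"])
```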
06:06 If the provided values are not strings, then Python will get their string representations before writing them to the file. Note that you could use tuples or other sequence types instead of lists if you’d like to.
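A small sketch of writing rows with csv.writer() follows. The file name, the temporary folder, and the values are hypothetical. The rows mix lists and tuples, and they contain numbers as well as strings:

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical output file in a temporary folder.
path = Path(tempfile.mkdtemp()) / "people.csv"

with path.open(mode="w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["name", "age", "height"])  # A list of strings
    writer.writerow(("Alice", 30, 1.75))  # A tuple with non-string values
    writer.writerows([["Bob", 25, 1.8], ("Carol", 41, 1.62)])  # Several rows

lines = path.read_text(encoding="utf-8").splitlines()
print(lines)
```

The numbers end up in the file as plain text, just like the names do.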
These names are mandatory, even though it’s up to you whether to write the header row or not. If you do want it in the resulting file, then call .writeheader(). Otherwise, the column names won’t be visible in the file. Like before, the writer object provides the two
07:56 This code may look a bit more cumbersome when you write it by hand, but these rows will typically be returned by some helper function or an object, so it’s not a big deal in practice. Also, the rows might come from another data source, such as a JSON document, which resembles a Python dict.
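To illustrate, here is a sketch that takes rows from a small JSON document and writes them with DictWriter. The JSON content and the column names are invented, and an in-memory io.StringIO object stands in for a file opened for writing:

```python
import csv
import io
import json

# Hypothetical JSON document that decodes to a list of Python dicts.
json_text = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'
rows = json.loads(json_text)

# In-memory stand-in for a file opened with mode="w" and newline="".
buffer = io.StringIO(newline="")

# The column names are required, even if you decide to skip the header row.
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
print(lines)
```

If you leave out the call to .writeheader(), then only the two data rows are written.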
Then you can wrap the opened file with a csv.reader() object, which allows you to loop over the file’s rows. In this case, the CSV reader will yield a sequence of string values corresponding to the individual fields in every row of the file.
Alternatively, if you know that your file comes with a header containing the column names, then you can use a more specialized DictReader, which will turn each row into a Python dict with those column names as keys.
10:03 There’s so much more you can do with CSV files that wouldn’t fit in this short Python Basics video course. However, if you’re interested in this topic, then feel free to check out other video courses and tutorials about CSV files on Real Python. You can find the relevant links below this video.
10:21 The last lesson is the summary, which will give you a few little exercises to practice the knowledge you’ve gained and a couple of additional resources to continue your exploration of reading and writing files in Python.