Reading and Writing CSV Data

Python Basics: Reading and Writing Files, Bartosz Zaczyński (10:33)

00:00 In this lesson, you’ll learn how to read and write data using the comma-separated values file format in Python. The CSV file format is a common way to store tabular data, such as a database table or a spreadsheet like the one here, using a plain text file.

00:18 Most office programs will let you import and export data using CSV files, where each table row becomes a separate line in the file, while the individual columns are delimited with a comma.

00:31 The first line in a CSV file may represent the consecutive column names, which is helpful. However, it’s not always necessary for a CSV file to contain the header row.

00:41 If that’s the case, then you’ll need to know up front what the individual columns in your file mean. Feel free to download and use this example CSV file. To do so, you can choose the Sample Code link from the Supporting Materials dropdown that you’ll find below this video. Now, because this CSV file format is relatively straightforward, it may feel tempting to try and read it using the file object’s methods that you learned in this course. However, it’s too easy to get it wrong because of the little edge cases and the fact that the CSV format had no unified standard for a number of years, so there are actually a couple of different dialects in use today.

01:21 You’re much better off using Python’s csv module from the standard library, which can take care of all that and more. Let’s head over to IDLE to see it in action. First, you’re going to need to import the csv module and the Path object from pathlib, although you could also use the built-in open() function if you prefer. However, because you’re going to open the same file multiple times, using a separate Path object will let you reuse it without having to type the filename again each time.

01:51 Assuming that you’ve downloaded the sample CSV file and placed it in the current working directory, you can specify the path object with a corresponding filename.

02:03 Then you’ll use the usual with statement followed by a call to path.open(). You want to keep the default read-only mode, set the encoding to UTF-8, and in this case, you also want to set the newline to an empty string in order to disable the universal newline translation.

02:23 This is the recommended practice when you work with the csv module in Python, and that’s because the csv module does its own newline conversion.

02:31 On the other hand, if you don’t specify the newline parameter with an empty string when you open a file, then on some systems, particularly Windows, you’ll end up with mixed-up newline characters, or you won’t be able to process values that contain newlines themselves.
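To see what the newline parameter changes, here is a minimal sketch that reads the same bytes twice. The newlines.csv file and its contents are made up for illustration and aren't part of the course materials.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file with Windows-style line endings (\r\n).
path = Path(tempfile.mkdtemp()) / "newlines.csv"
path.write_bytes(b"name,city\r\nAlice,Portland\r\n")

# Default text mode: universal newline translation turns \r\n into \n.
with path.open(encoding="utf-8") as file:
    translated = file.read()

# newline="": the line endings arrive untouched, which leaves the
# newline handling to the csv module.
with path.open(encoding="utf-8", newline="") as file:
    untouched = file.read()

print(repr(translated))
print(repr(untouched))
```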

02:46 Okay. With that out of the way, you can now create a CSV reader instance by wrapping your file object.

02:55 The resulting CSV reader is iterable, which means that you can loop over it just like you would with a regular file object. However, instead of yielding lines or strings, the reader provides a sequence of rows, which are lists of strings.
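The steps so far might look roughly like this. The contents written to people.csv below are an assumed stand-in, since the actual sample file from the course isn't reproduced in this transcript.

```python
import csv
import tempfile
from pathlib import Path

# Made-up stand-in for the sample file; the course's data may differ.
path = Path(tempfile.mkdtemp()) / "people.csv"
path.write_text(
    "name,city,age\nAlice,Portland,34\nBob,Chicago,27\n",
    encoding="utf-8",
)

# Default read-only mode, UTF-8 encoding, and no newline translation.
with path.open(encoding="utf-8", newline="") as file:
    reader = csv.reader(file)
    rows = [row for row in reader]  # Each row is a list of strings

for row in rows:
    print(row)
```

Reusing the same path object later saves you from typing the filename again.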

03:16 Each row consists of the values delimited with a comma. Notice that if there’s a comma in the value itself, then it gets correctly escaped and appears in the output as expected.
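As a small illustration of that edge case, compare a naive split with the csv module on a single made-up line. Passing a list of strings to csv.reader() works here because it accepts any iterable of lines, not only file objects.

```python
import csv

# Hypothetical line with a quoted value that contains a comma.
line = 'Alice,"Portland, OR",34'

# Splitting by hand breaks the quoted value into two pieces.
naive = line.split(",")

# The csv module understands the quoting and keeps the value whole.
parsed = next(csv.reader([line]))

print(naive)   # ['Alice', '"Portland', ' OR"', '34']
print(parsed)  # ['Alice', 'Portland, OR', '34']
```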

03:28 This isn’t something you would easily implement yourself or likely even be aware of. The first row returned by the CSV reader contains column names, so with this approach, you have to recognize or know whether the file starts with a header or not.

03:44 All the following rows contain data, but they’re not automatically converted to a suitable data type, such as a date object, integer, or floating-point value.
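One possible way to handle such a conversion by hand is sketched below. The column names, their order, and the ISO date format are assumptions made for this example rather than details of the course's sample file.

```python
import csv
from datetime import date

# Hypothetical data; csv.reader() accepts any iterable of lines.
lines = [
    "name,birth_date,age,height_m",
    "Alice,1990-05-17,34,1.68",
]

reader = csv.reader(lines)
header = next(reader)  # Skip the header row manually

# Every field arrives as a string, so convert each one explicitly.
converted = [
    (name, date.fromisoformat(birth_date), int(age), float(height))
    for name, birth_date, age, height in reader
]

print(converted)
```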

03:54 So you have to do this conversion yourself. But the csv module offers a slightly more convenient way of reading those data rows. Let’s open the same file again for reading, but without starting a new with code block.

04:08 This will make it easier to see how certain things work. And now, instead of creating a regular CSV reader, you’ll create a DictReader, which will wrap each row in a Python dictionary.

04:22 As long as your file starts with a header, DictReader will detect the column names and expose them through the .fieldnames attribute. If the file doesn’t have a header, then you must pass an appropriate list of column names through the fieldnames argument of the DictReader initializer, right after the file object.
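Both cases can be sketched with in-memory lines instead of a real file. The data and the column names are invented for the example.

```python
import csv

# Case 1: the data starts with a header row (hypothetical content).
with_header = ["name,city", "Alice,Portland"]
reader = csv.DictReader(with_header)
detected = reader.fieldnames  # Read from the first line
first_row = next(reader)

# Case 2: no header, so the column names are supplied explicitly.
without_header = ["Alice,Portland"]
reader = csv.DictReader(without_header, fieldnames=["name", "city"])
explicit_row = next(reader)

print(detected)
print(first_row)
print(explicit_row)
```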

04:44 The nice thing about DictReader is that it already stripped the header, so when you start iterating over it, you’ll only be concerned with the actual data.

04:57 Moreover, every row is now a Python dictionary, whose keys are the column names, and values are the corresponding data points from the file. Therefore, instead of using a list index to access an element, you can refer to a field by the column name.
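A rough sketch of this approach, opening the file without a with block and closing it by hand, could look as follows. The file contents are again made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Made-up stand-in for the sample file.
path = Path(tempfile.mkdtemp()) / "people.csv"
path.write_text("name,city\nAlice,Portland\nBob,Chicago\n", encoding="utf-8")

# No with block this time, so closing the file is up to you.
file = path.open(encoding="utf-8", newline="")
reader = csv.DictReader(file)

# The header is already consumed; each row is a dict keyed by column name.
cities = {row["name"]: row["city"] for row in reader}

file.close()
print(cities)
```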

05:13 Don’t forget to close the file when you’re done with it. How about writing CSV files in Python? Well, you can use the corresponding writer or DictWriter objects from the csv module.

05:27 Let’s open a new file in the write-only mode with UTF-8 encoding and no universal newline translation.

05:41 The first thing you’ll do is create a CSV writer instance.

05:48 Once you have it, you can start writing rows by calling .writerow() with a list of string values.

06:06 If the provided values aren’t strings, then the csv module converts them to their string representations before writing them to the file. Note that you can also use tuples or other sequence types instead of lists.

06:19 It’s also possible to write multiple rows at once with the help of the .writerows() method that takes a sequence of rows.
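Put together, the writing steps might look like this. The rows are invented sample data, and the file is read back at the end only to show what was stored.

```python
import csv
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "people.csv"

# Write-only mode, UTF-8 encoding, and no newline translation.
with path.open(mode="w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["name", "age", "height_m"])  # A list of strings
    writer.writerow(("Alice", 34, 1.68))  # A tuple with non-string values
    writer.writerows([["Bob", 27, 1.8], ["Carol", 41, 1.75]])

# Read the file back to inspect the result; everything is a string now.
with path.open(encoding="utf-8", newline="") as file:
    written = list(csv.reader(file))

print(written)
```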

06:43 Just like the file object’s .write() method, the CSV writer’s .writerow() returns the number of characters written to the file.
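You can check that return value with an in-memory text buffer, which is used here instead of a real file purely for convenience. The count includes the line terminator that the writer appends, which is \r\n in the default dialect. The sketch also calls .writerows() for comparison.

```python
import csv
import io

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# 'a,b' plus the default '\r\n' terminator makes five characters.
count = writer.writerow(["a", "b"])

# For comparison, capture what .writerows() gives back.
result = writer.writerows([["c", "d"], ["e", "f"]])

print(count)
print(result)
print(repr(buffer.getvalue()))
```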

06:51 However, .writerows() doesn’t return anything. Using DictWriter is quite similar to its DictReader counterpart. So start by opening the same people.csv file again.

07:03 Just note that using the write-only mode here will overwrite everything you’ve written to the file so far.

07:12 And create a new DictWriter instance with a file object followed by a list of column names.

07:20 These names are mandatory, even though it’s up to you whether to write the header row or not. If you do want it in the resulting file, then call .writeheader(). Otherwise, the column names won’t be visible in the file. Like before, the writer object provides both the .writerow() and .writerows() methods.

07:40 The only difference is that now, you need to provide each row as a Python dictionary with the right keys instead of a list or a tuple.
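Here is a minimal sketch of those DictWriter steps, with assumed column names and made-up rows.

```python
import csv
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "people.csv"

# Mode "w" replaces any existing content of the file.
with path.open(mode="w", encoding="utf-8", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["name", "age"])
    writer.writeheader()  # Optional: leave this out to skip the header
    writer.writerow({"name": "Alice", "age": 34})
    writer.writerows([
        {"name": "Bob", "age": 27},
        {"name": "Carol", "age": 41},
    ])

lines = path.read_text(encoding="utf-8").splitlines()
print(lines)
```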

07:56 This code may look a bit more cumbersome when you write it by hand, but these rows will typically be returned by some helper function or an object, so it’s not a big deal in practice. Also, the rows might come from another data source, such as a JSON document, which resembles a Python dict.
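For instance, rows parsed from a JSON document could be passed straight to a DictWriter. The JSON string below is invented for the example, and an in-memory buffer replaces the file to keep the sketch short.

```python
import csv
import io
import json

# Hypothetical JSON document holding a list of flat objects.
document = '[{"name": "Alice", "age": 34}, {"name": "Bob", "age": 27}]'
records = json.loads(document)  # A list of Python dicts

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(records)

print(buffer.getvalue())
```

This works as long as every dictionary uses only keys listed in fieldnames.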

08:17 To sum up, you can use Python’s csv module, available in the standard library, to read text files written in the comma-separated values file format.

08:26 One important detail you should remember is to open the file with the universal newline translation mechanism disabled. This will ensure portability across operating systems.

08:37 You can do this by specifying an empty string as a value for the newline argument, either in the open() function or the Path.open() method.

08:45 Then you can wrap the opened file with a csv.reader() object, which allows you to loop over the file’s rows. In this case, the CSV reader will yield a sequence of string values corresponding to the individual fields in every row of the file.

09:03 Alternatively, if you know that your file comes with a header containing the column names, then you can use a more specialized DictReader, which will turn each row into a Python dict with those column names as keys.

09:19 Writing a CSV file in Python is analogous. You start by creating a writer instance that wraps around the file object, and then you write either a single row or multiple rows in one go.

09:31 In both cases, you’re expected to pass a sequence of values. If a given value isn’t already a string, then Python will automatically convert it to its textual representation.

09:45 As before, you can use the specialized DictWriter, which lets you specify the column names as well as provide the rows as Python dictionaries instead of lists or tuples.

09:55 To include the header in the resulting file, you can call the .writeheader() method.

10:03 There’s so much more you can do with CSV files that wouldn’t fit in this short Python Basics video course. However, if you’re interested in this topic, then feel free to check out other video courses and tutorials about CSV files on Real Python. You can find the relevant links below this video.

10:21 The last lesson is the summary, which will give you a few little exercises to practice the knowledge you’ve gained and a couple of additional resources to continue your exploration of reading and writing files in Python.
