Working With CSV Files
You’ve already learned how to read and write CSV files. Now let’s dig a little deeper into the details. When you use .to_csv() to save your DataFrame, you can provide an argument for the parameter path_or_buf to specify the path, name, and extension of the target file. path_or_buf is the first argument .to_csv() will get.
It can be any string that represents a valid file path that includes the filename and its extension. You’ve already seen this in a previous example. However, if you omit path_or_buf, then .to_csv() won’t create any files.
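To make the difference concrete, here is a minimal sketch of both calls. The small DataFrame, the temporary directory, and the filename data.csv are assumptions made for illustration; they are not the data shown in the lesson.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical sample data, not the DataFrame used in the lesson.
df = pd.DataFrame(
    {"COUNTRY": ["China", "Russia"], "CONT": ["Asia", np.nan]},
    index=["CHN", "RUS"],
)

# With a path, .to_csv() writes the file and returns None.
target = Path(tempfile.mkdtemp()) / "data.csv"
result = df.to_csv(str(target))

# Without path_or_buf, no file is created; the CSV text is returned instead.
csv_text = df.to_csv()
print(csv_text)
```

In the second call, the return value is an ordinary Python string, so you can print it, log it, or pass it along without touching the file system.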
As you can see onscreen, you now have the string instead of a CSV file. You also have some missing values in your DataFrame object. For example, the continent for Russia and the independence days for several countries are not available. In data science and machine learning, you must handle missing values carefully, and pandas excels here.
This code produces the file new-data.csv, where the missing values are no longer empty strings. You can see the contents of that file onscreen now. Note that the string missing in the file corresponds to the nan values from the DataFrame.
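One way to get this result is the na_rep parameter of .to_csv(), which sets the text written in place of missing values. The sketch below assumes the label "(missing)", a small made-up DataFrame, and a temporary directory; the lesson's actual code may differ.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame(
    {"COUNTRY": ["China", "Russia"], "CONT": ["Asia", np.nan]},
    index=["CHN", "RUS"],
)

# na_rep replaces every missing value with the given label in the output.
out_path = Path(tempfile.mkdtemp()) / "new-data.csv"
df.to_csv(out_path, na_rep="(missing)")

written = out_path.read_text()
print(written)
```

The DataFrame itself is unchanged by this call. Only the text written to the file uses the label.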
When pandas reads files, it considers the empty string and a few others as missing values by default. If you don’t want this behavior, then you can pass keep_default_na=False to the pandas read_csv() function. To specify other labels for missing values, use the parameter na_values.
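Here is a short sketch of how these reading options behave. The CSV text and the label "unknown" are invented for illustration, and the data is read from an in-memory string rather than a file.

```python
import io

import pandas as pd

# Hypothetical CSV content: one empty field and one custom "unknown" label.
csv_text = "COUNTRY,CONT\nChina,Asia\nRussia,\nJapan,unknown\n"

# Default behavior: the empty field is read as a missing value.
default_df = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: the empty field stays an empty string.
raw_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# na_values: treat "unknown" as missing, in addition to the defaults.
custom_df = pd.read_csv(io.StringIO(csv_text), na_values=["unknown"])

print(default_df["CONT"].isna().tolist())
print(raw_df["CONT"].tolist())
print(custom_df["CONT"].isna().tolist())
```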
There are several other optional parameters that you can use with .to_csv():
sep denotes a value separator,
decimal indicates a decimal separator,
encoding sets the file encoding, and
header specifies whether you want to write column labels in the file.
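The four parameters above could be combined as in the following sketch. The sample values, the temporary directory, and the filename are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data with a float column to show the decimal separator.
df = pd.DataFrame(
    {"COUNTRY": ["China", "Russia"], "AREA": [9596.96, 17098.25]},
    index=["CHN", "RUS"],
)

out_path = Path(tempfile.mkdtemp()) / "data.csv"
df.to_csv(
    out_path,
    sep=";",           # separate values with semicolons
    decimal=",",       # write floats with a decimal comma
    encoding="utf-8",  # encoding of the output file
    header=False,      # skip the row of column labels
)

csv_text = out_path.read_text(encoding="utf-8")
print(csv_text)
```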
The data is now separated with a semicolon, and because
header=False, the data is represented without the header row of column names. The pandas
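read_csv() function has many additional options for managing missing data, working with dates and times, quoting, encoding, handling errors, and much more. For instance, if you have a file with one data column and want to get a Series object instead of a DataFrame, then you can pass squeeze=True to read_csv() in pandas versions before 2.0. That parameter was removed in pandas 2.0, where you can call .squeeze("columns") on the returned DataFrame instead.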
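As a rough illustration of getting a Series from single-column data, the sketch below reads the data as a DataFrame and then calls .squeeze("columns") on it, an approach that does not rely on any read_csv() keyword. The column name POP and its values are made up for the example.

```python
import io

import pandas as pd

# Hypothetical one-column CSV content.
csv_text = "POP\n1398.72\n146.79\n"

# Read as a DataFrame first, then collapse the single column into a Series.
df = pd.read_csv(io.StringIO(csv_text))
series = df.squeeze("columns")

print(type(series).__name__)
print(series)
```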