Working With JSON Files
00:00 Working with JSON files. JSON stands for JavaScript Object Notation. JSON files are plaintext files used for data interchange, and humans can read them easily.
00:13
They follow the standard seen onscreen and use the .json extension. Python and pandas work well with JSON files, as Python’s json library offers built-in support for them.
00:26
You can save the data from your DataFrame to a JSON file with .to_json(). This code produces the file data-columns.json. You can see its contents onscreen now. It has one large dictionary with the column labels as keys and the corresponding inner dictionaries as values.
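The default column-oriented layout can be sketched as follows. This is an illustrative snippet, not the lesson’s own data: the DataFrame contents and the file name are placeholders.

```python
import json

import pandas as pd

# A small illustrative DataFrame; the lesson uses its own data,
# so these column names and values are placeholders.
df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [95, 87]},
    index=["row1", "row2"],
)

# Default orient='columns': one outer dictionary keyed by column label,
# each value an inner dictionary mapping row labels to cell values.
# Passing a path such as "data-columns.json" would write a file instead.
json_str = df.to_json()
parsed = json.loads(json_str)
```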
00:52
You can get a different file structure if you pass an argument for the optional parameter orient, as seen onscreen now.
01:05
The orient parameter defaults to 'columns', but here it’s been set to 'index'. You should get a new file, data-index.json, whose contents you can see onscreen now. You can see that it also has one large dictionary, but this time the row labels are the keys and the inner dictionaries are the values. There are a few more options for orient.
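The index-oriented layout just described can be sketched like this, again with placeholder data rather than the lesson’s own DataFrame:

```python
import json

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [95, 87]},
    index=["row1", "row2"],
)

# orient='index' flips the nesting: row labels become the outer keys,
# and each inner dictionary maps column labels to that row's values.
json_str = df.to_json(orient="index")
parsed = json.loads(json_str)
```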
01:29
One of them is 'records', as seen onscreen.
01:37
Again, this should create a new file, data-records.json, whose contents you can see onscreen. You can see it holds a list with one dictionary for each row, and the row labels are not written.
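A minimal sketch of the records layout, using placeholder data, looks like this:

```python
import json

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [95, 87]},
    index=["row1", "row2"],
)

# orient='records' produces a list with one dictionary per row;
# the row labels ("row1", "row2") are not written at all.
json_str = df.to_json(orient="records")
parsed = json.loads(json_str)
```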
01:51
You can get another variation with orient='split'.
02:01
The resulting file is data-split.json, whose contents you can see onscreen now. data-split.json contains one dictionary that holds the following lists: the names of the columns, the labels of the rows, and the inner lists that hold the data values.
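The split layout can be sketched as follows, with the same placeholder data as above:

```python
import json

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [95, 87]},
    index=["row1", "row2"],
)

# orient='split' writes one dictionary with three entries:
# the column names, the row labels, and the rows of data values.
json_str = df.to_json(orient="split")
parsed = json.loads(json_str)
```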
02:22
If you don’t provide a value for the optional parameter path_or_buf, which defines the file path, then .to_json() will return a JSON string instead of writing the results to a file, exactly as you saw earlier with .to_csv().
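A quick sketch of this string-versus-file behavior, writing to a temporary location rather than a real project path:

```python
import os
import tempfile

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Without path_or_buf, .to_json() returns the JSON document as a string:
as_string = df.to_json()

# With a path, it writes the same document to disk and returns None:
path = os.path.join(tempfile.mkdtemp(), "data.json")
result = df.to_json(path)

with open(path) as f:
    from_file = f.read()
```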
02:37
There are other optional parameters you can use. For instance, you can set index=False to forgo saving row labels. You can control numeric precision with double_precision, and dates with date_format and date_unit.
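Here’s a small sketch of index=False and double_precision together (the date-related parameters come up below with time series). One detail worth knowing: pandas only accepts index=False when orient is 'split' or 'table', so the example uses 'split'. The data is a placeholder.

```python
import json

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame({"value": [1.123456789, 2.987654321]})

# index=False drops the row labels; pandas supports this only for
# orient='split' (and 'table'). double_precision caps the number of
# decimal places written for floating-point values.
json_str = df.to_json(orient="split", index=False, double_precision=3)
parsed = json.loads(json_str)
```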
02:52
These last two parameters are particularly important when you have time series among your data. In this example, .to_datetime() has been used to convert the values in the last column to datetime64.
03:21
You can see the results of this onscreen now. In this file, the dates are represented as large integers. That’s because the default value of the optional parameter date_format is 'epoch' whenever orient isn’t 'table'.
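The conversion and the resulting epoch integers can be sketched like this, with placeholder dates rather than the lesson’s dataset:

```python
import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame(
    {"event": ["launch", "review"], "when": ["2021-01-01", "2021-02-01"]}
)

# Convert the last column from strings to datetime64 values:
df["when"] = pd.to_datetime(df["when"])

# With the default date_format='epoch', datetimes are written as
# integer milliseconds since midnight, January 1, 1970:
json_str = df.to_json()
```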
03:38
This behavior expresses dates as milliseconds elapsed since the epoch, midnight on January 1, 1970. However, if you pass date_format='iso', then you’ll get the dates in ISO 8601 format.
03:54
In addition, date_unit sets the resolution of the time values.
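Both parameters can be sketched in one place, again with a placeholder date:

```python
import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame({"when": pd.to_datetime(["2021-01-01", "2021-02-01"])})

# date_format='iso' writes ISO 8601 strings instead of epoch integers:
iso_json = df.to_json(date_format="iso")

# date_unit sets the time resolution; 's' gives whole seconds instead
# of the default milliseconds for epoch output:
epoch_json = df.to_json(date_format="epoch", date_unit="s")
```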
04:15
You can see the contents of the JSON file produced by this code onscreen now. The dates in this file are in ISO 8601 format. You can load the data from a JSON file with read_json().
04:43
The parameter convert_dates has a similar purpose to parse_dates when you use it to read CSV files. The optional parameter orient is very important because it specifies how pandas understands the structure of the file.
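A minimal round-trip sketch of read_json(), using placeholder data; the StringIO wrapper is there because recent pandas versions deprecate passing literal JSON strings directly:

```python
import io

import pandas as pd

# Placeholder data for illustration.
df = pd.DataFrame(
    {"name": ["Alice", "Bob"], "score": [95, 87]},
    index=["row1", "row2"],
)
json_str = df.to_json(orient="index")

# read_json() needs the matching orient to rebuild the same structure;
# with a mismatched orient, pandas would misread keys as the wrong axis.
df_back = pd.read_json(io.StringIO(json_str), orient="index")
```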
05:02
There are other optional parameters you can use as well. You can set the encoding with encoding. You can use convert_dates and keep_default_dates to manipulate dates.
05:13
You can impact the precision with dtype and precise_float. In older pandas versions, you could also decode numeric data directly into NumPy arrays with numpy=True, but that parameter has since been removed.
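A sketch of dtype and convert_dates together; the JSON document, column names, and values here are invented for illustration:

```python
import io

import pandas as pd

# A JSON document in the default 'columns' layout; "created_at" holds
# epoch milliseconds. These names and values are placeholders.
json_str = (
    '{"id":{"0":"1","1":"2"},'
    '"created_at":{"0":1609459200000,"1":1612137600000}}'
)

# dtype forces column types, while convert_dates names the columns to
# parse as datetimes (keep_default_dates=True would also catch columns
# whose names end in "_at" automatically):
df = pd.read_json(
    io.StringIO(json_str),
    dtype={"id": "int64"},
    convert_dates=["created_at"],
)
```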
05:24 Note that you might lose the order of rows and columns when using the JSON format to store your data. Next up, working with HTML files.