Working With JSON Files

Reading and Writing Files With pandas Darren Jones 05:35

00:00 Working with JSON files. JSON stands for JavaScript Object Notation. JSON files are plaintext files used for data interchange, and humans can read them easily.

00:13 They follow the standard seen onscreen and use the .json extension. Python and pandas work well with JSON files, as Python’s json library offers built-in support for them.

00:26 You can save the data from your DataFrame to a JSON file with .to_json(). This code produces the file data-columns.json. You can see its contents onscreen now. It has one large dictionary with the column labels as keys and the corresponding inner dictionaries as values.
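The code shown onscreen is not part of this transcript. A minimal sketch of the call, assuming a small hypothetical DataFrame in place of the lesson's data (the column names, row labels, and values are illustrative only), might look like this:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the lesson's DataFrame.
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "AREA": [9597, 3287]},
    index=["CHN", "IND"],
)

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data-columns.json"
    df.to_json(path)  # orient defaults to 'columns'
    columns_data = json.loads(path.read_text())

# Outer keys are column labels; inner dictionaries map row labels to values.
print(columns_data)
```

In your own project you would pass a plain file name such as 'data-columns.json' instead of a temporary path.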

00:52 You can get a different file structure if you pass an argument for the optional parameter orient as seen onscreen now.

01:05 The orient parameter defaults to 'columns', but here it’s been set to 'index'. You should get a new file data-index.json, whose contents you can see onscreen now. You can see that it also has one large dictionary, but this time the row labels are the keys and the inner dictionaries are the values. There are a few more options for orient.
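As a rough illustration of the 'index' layout, the sketch below uses a hypothetical two-row DataFrame and leaves out the file path so the JSON comes back as a string that can be inspected directly:

```python
import json

import pandas as pd

# Hypothetical data; not the lesson's actual DataFrame.
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "AREA": [9597, 3287]},
    index=["CHN", "IND"],
)

# With orient='index', the row labels become the outer keys.
index_data = json.loads(df.to_json(orient="index"))
print(index_data["CHN"])
```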

01:29 One of them is 'records', as seen onscreen.

01:37 Again, this should create a new file data-records.json, whose contents you can see onscreen. You can see it holds a list with one dictionary for each row, and the row labels are not written.
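A short sketch of the 'records' layout, again with hypothetical data and with the output captured as a string instead of a file, shows that the row labels are dropped:

```python
import json

import pandas as pd

# Hypothetical data; not the lesson's actual DataFrame.
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "AREA": [9597, 3287]},
    index=["CHN", "IND"],
)

# orient='records' yields a list with one dictionary per row.
records_data = json.loads(df.to_json(orient="records"))
print(records_data)
```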

01:51 You can get another variation with orient='split'.

02:01 The resulting file is data-split.json, whose contents you can see onscreen now. data-split.json contains one dictionary that holds the following lists: the names of the columns, the labels of the rows, and the inner lists that hold the data values.
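The three parts of the 'split' layout can be seen in a small sketch like the following, which assumes a hypothetical DataFrame and parses the returned string with Python's json module:

```python
import json

import pandas as pd

# Hypothetical data; not the lesson's actual DataFrame.
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "AREA": [9597, 3287]},
    index=["CHN", "IND"],
)

split_data = json.loads(df.to_json(orient="split"))

print(split_data["columns"])  # column names
print(split_data["index"])    # row labels
print(split_data["data"])     # one inner list per row
```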

02:22 If you don’t provide the value for the optional parameter path_or_buf that defines the file path, then .to_json() will return a JSON string instead of writing the results to a file, exactly as you saw earlier on with .to_csv().
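A brief sketch of this behavior, using a hypothetical DataFrame, calls both methods without a path and checks what comes back:

```python
import json

import pandas as pd

# Hypothetical data; not the lesson's actual DataFrame.
df = pd.DataFrame(
    {"COUNTRY": ["China", "India"], "AREA": [9597, 3287]},
    index=["CHN", "IND"],
)

json_text = df.to_json()  # no path_or_buf, so a string is returned
csv_text = df.to_csv()    # same idea for CSV

print(type(json_text).__name__, type(csv_text).__name__)
print(json.loads(json_text)["AREA"])
```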

02:37 There are other optional parameters you can use. For instance, you can set index=False to forgo saving row labels, although pandas accepts this only for some values of orient, such as 'split' and 'table'. You can manipulate precision with double_precision, and dates with date_format and date_unit.
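The sketch below tries two of these parameters on a hypothetical one-column DataFrame. It pairs index=False with orient='split', one of the layouts that supports leaving out the row labels:

```python
import json

import pandas as pd

# Hypothetical data chosen to make the rounding visible.
df = pd.DataFrame({"X": [1.23456789, 2.5]}, index=["a", "b"])

# Leave out the row labels from the 'split' layout.
no_index = json.loads(df.to_json(orient="split", index=False))

# Keep at most two decimal places when encoding floats.
rounded = json.loads(df.to_json(orient="records", double_precision=2))

print(sorted(no_index))
print(rounded)
```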

02:52 These last two parameters are particularly important when you have time series amongst your data. In this example, pd.to_datetime() has been used to convert the values in the last column to datetime64.

03:21 You can see the results of this onscreen now. In this file, the dates are represented as large integers. That’s because the default value of the optional parameter date_format is 'epoch' whenever orient isn’t 'table'.

03:38 This behavior expresses dates as the number of milliseconds elapsed since midnight UTC on January 1, 1970. However, if you pass date_format='iso', then you’ll get the dates in the ISO 8601 format.
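To compare the two date formats side by side, here is a minimal sketch. The column name IND_DAY and the dates are hypothetical, and date_format='epoch' is passed explicitly, since some pandas releases may warn about relying on it as the default:

```python
import json

import pandas as pd

# Hypothetical date column, converted from strings to datetime64.
df = pd.DataFrame(
    {"IND_DAY": ["1947-08-15", "1991-12-25"]},
    index=["IND", "RUS"],
)
df["IND_DAY"] = pd.to_datetime(df["IND_DAY"])

epoch_data = json.loads(df.to_json(date_format="epoch"))
iso_data = json.loads(df.to_json(date_format="iso"))

print(epoch_data["IND_DAY"])  # integers: milliseconds since 1970-01-01
print(iso_data["IND_DAY"])    # ISO 8601 strings
```

Dates before 1970, such as the first one here, come out as negative integers in the epoch format.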

03:54 In addition, date_unit sets the time unit used to encode the dates, such as seconds or milliseconds.
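The effect of date_unit is easiest to see on a single date. This sketch assumes a hypothetical one-row DataFrame and compares seconds with milliseconds:

```python
import json

import pandas as pd

# Hypothetical single-date DataFrame.
df = pd.DataFrame(
    {"IND_DAY": pd.to_datetime(["1991-12-25"])},
    index=["RUS"],
)

# The same date encoded as epoch seconds and as epoch milliseconds.
seconds = json.loads(df.to_json(date_format="epoch", date_unit="s"))
millis = json.loads(df.to_json(date_format="epoch", date_unit="ms"))

# With ISO output, date_unit controls the precision of the timestamp string.
iso_seconds = json.loads(df.to_json(date_format="iso", date_unit="s"))

print(seconds["IND_DAY"]["RUS"], millis["IND_DAY"]["RUS"])
print(iso_seconds["IND_DAY"]["RUS"])
```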

04:15 You can see the contents of the JSON file produced by this code onscreen now. The dates in this file are in ISO 8601 format. You can load the data from a JSON file with read_json().

04:43 The parameter convert_dates serves a similar purpose to parse_dates, which you use when reading CSV files. The optional parameter orient is very important because it specifies how pandas understands the structure of the file.
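A round trip makes the role of orient and convert_dates more concrete. The sketch below is only an approximation of what the lesson shows: the data and the column name IND_DAY are hypothetical, and the JSON text is read from an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# Hypothetical DataFrame with one date column.
df = pd.DataFrame(
    {
        "COUNTRY": ["India", "Russia"],
        "IND_DAY": pd.to_datetime(["1947-08-15", "1991-12-25"]),
    },
    index=["IND", "RUS"],
)

# Write with orient='index', then read back with the same orient.
text = df.to_json(orient="index", date_format="iso")
restored = pd.read_json(
    io.StringIO(text),
    orient="index",
    convert_dates=["IND_DAY"],  # parse this column as dates
)

print(restored.dtypes)
```

Reading the same text with a different orient would make pandas interpret the outer keys as column labels instead of row labels.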

05:02 There are other optional parameters you can use as well. You can set the encoding with encoding. You can use convert_dates and keep_default_dates to manipulate dates.

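05:13 You can influence the precision with dtype and precise_float. In older versions of pandas, you could also decode numeric data directly into NumPy arrays with numpy=True, but that parameter was deprecated in pandas 1.0 and has since been removed.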
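As a small illustration of dtype and precise_float, the sketch below reads a hypothetical JSON string twice, once with the defaults and once asking for 32-bit floats:

```python
import io

import pandas as pd

# Hypothetical JSON text in the default 'columns' layout.
text = '{"AREA": {"CHN": 9596.96, "IND": 3287.26}}'

# Default behavior: pandas infers a 64-bit float column.
default = pd.read_json(io.StringIO(text))

# Request a specific dtype and the higher-precision float parser.
custom = pd.read_json(
    io.StringIO(text),
    dtype={"AREA": "float32"},
    precise_float=True,
)

print(default["AREA"].dtype, custom["AREA"].dtype)
```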

05:24 Note that you might lose the order of rows and columns when using the JSON format to store your data. Next up, working with HTML files.

Cindy on July 4, 2022

Hi Darren,

Thank you for the lecture. I am wondering what the following code does: df = pd.DataFrame(data=data).T? Thank you.

Brian V on March 1, 2023

Same question. Can anyone answer this?
