Working With JSON Data in Python
In this video, you’ll get some practice deserializing JSON data from a web API. Then, you’ll learn how to manipulate this extracted data to derive meaning from it.
If you’re following along, here’s the resource used in the video: jsonplaceholder.typicode.com/todos
00:00 Welcome back to our series on working with JSON data in Python. In this video, we’re going to work with a larger set of JSON data and we’ll see how we can manipulate the data to derive some meaning from it.
00:13 To get our JSON data I’ll use JSONPlaceholder, which is a public API that exposes large sets of JSON data for testing and prototyping purposes. Just like usual, we’ll start by importing the json module, but we’ll also need to import another module named requests.
00:33 This will allow us to get JSON data from the JSONPlaceholder API in the form of a web request. Now we need to actually make the request. We’ll create a new variable called response, which will hold the response from the web server.
00:48 We’ll get that using requests.get() and we’ll pass in this web address right here, which contains a TODO list of 200 items in JSON format. To actually obtain a Python list from this JSON data, we’ll create another variable called todos and we’ll use the loads() function from before, passing in response.text.
01:12 response.text will get us the content of our web request, which is a string containing all of the JSON data. And that’s why we’re using the loads() function: we’re reading from a string, not a file object. Just to show that this works, I’ll print out the first two items in our list with print(todos[:2]).
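As a rough offline sketch of this step, the snippet below parses a JSON string the same way the video parses response.text. The sample_text string is a hypothetical stand-in for the real response body, and its titles are made up for illustration. In the video, the string comes from requests.get() instead.

```python
import json

# Hypothetical stand-in for response.text: two items shaped like the
# API's TODO objects. The values are illustrative, not fetched.
sample_text = """
[
  {"userId": 1, "id": 1, "title": "first task", "completed": false},
  {"userId": 1, "id": 2, "title": "second task", "completed": true}
]
"""

# json.loads() parses a JSON string; json.load() would read a file object.
todos = json.loads(sample_text)

print(type(todos))  # the top-level JSON array becomes a Python list
print(todos[:2])    # each element is a Python dictionary
```

JSON's false and true become Python's False and True during deserialization, which is what makes the later if todo["completed"] check work.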
01:35 If I right-click and run the code, we’ll see that on the right side of the screen, we have the first two TODO items that are exposed by the API. Each TODO item is a Python dictionary with keys and values.
01:50 Notice that each TODO object contains a 'userId' representing the user assigned to this TODO; an 'id', which is used to label the TODO; a 'title' describing it; and finally, a 'completed' value that says whether or not the TODO is completed.
02:04 We’re going to make use of all of this data except for the 'title'. Now, we don’t want to see the data; we want to actually manipulate it. So let’s just delete this print statement here.
02:15 We want to figure out which users have completed the most TODO items. We’ll start by creating a new dictionary called todos_by_user, which will map each 'userId' to the number of TODOs that they have completed.
02:31 Now, we actually have to compute that. So on a new line, we’ll type for todo in todos:, which will iterate through our todos list.
02:41 And we’ll say if todo["completed"]:, so here we’re checking if the "completed" key has a value of True. And then we will wrap the following logic in a try block. We’ll say, “Get the user’s count in the dictionary with this code right here.” And we’ll increment it by 1.
02:59 Our dictionary is going to represent the users by their "userId", which is why we’re accessing the value of the "userId" key in this todo dictionary. But if the user is not already present in our dictionary, we’ll see a KeyError, and so we’ll need to catch that by typing except KeyError: and then we’ll create a new user in the dictionary, setting their completed TODOs count to 1. At this point, we’ve got a dictionary that maps each 'userId' to the number of items they’ve completed.
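The counting loop described above might look roughly like this. The todos list here is a small hypothetical sample rather than the 200 items from the API, so the resulting counts are only illustrative.

```python
# Hypothetical sample standing in for the deserialized API data.
todos = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 1, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 2, "id": 4, "title": "d", "completed": True},
]

# Maps each userId to the number of TODOs that user has completed.
todos_by_user = {}

for todo in todos:
    if todo["completed"]:
        try:
            # Increment the existing count for this user.
            todos_by_user[todo["userId"]] += 1
        except KeyError:
            # First completed TODO seen for this user.
            todos_by_user[todo["userId"]] = 1

print(todos_by_user)  # {1: 1, 2: 2}
```

Users with no completed items never enter the dictionary at all, because the if check skips their TODOs before the try block runs.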
03:31 Now we have to determine what the highest number of completed items is, as well as who’s completed that many items. This code I’m typing now is going to create a new list of tuples with each tuple containing the person as well as how many items they’ve completed.
03:48 The tuples will be sorted in descending order by the number of items completed.
03:53 And now all we have to do is get the second value in the first tuple in the list, and this will represent the maximum number of items completed. We’ll type max_complete = top_users[0][1], that is, first index, second item. In order to determine which users have this completed count, I’ll first define a new list called users, which will hold the users that we discover. Now, we’ll say for user, num_complete in top_users:, so now we’re iterating over the list of tuples. We’ll type if num_complete < max_complete: then break. Otherwise, append a string representation of our user ID to the users list.
04:45 Then we’ll create a new string that tells us what users have completed the most TODOs. We’ll call this max_users and we’ll take the string " and " padded with spaces and .join() it with our list of users. Finally, we’ll print the f-string f"user(s) {max_users} completed {max_complete} TODOs".
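Here is a minimal sketch of the sorting and reporting steps just described. The counts in todos_by_user are hypothetical, chosen so that two users tie for the maximum. The sorted() call follows the one quoted in the comments below this lesson.

```python
# Hypothetical counts standing in for the dictionary built from the API data.
todos_by_user = {1: 3, 2: 5, 3: 5, 4: 2}

# Sort (userId, count) pairs by count, highest first.
top_users = sorted(todos_by_user.items(), key=lambda x: x[1], reverse=True)

# The first tuple holds the highest count in its second position.
max_complete = top_users[0][1]

# Collect every user tied for the maximum. The list is sorted, so we can
# stop as soon as a smaller count appears.
users = []
for user, num_complete in top_users:
    if num_complete < max_complete:
        break
    users.append(str(user))

max_users = " and ".join(users)
print(f"user(s) {max_users} completed {max_complete} TODOs")
# prints: user(s) 2 and 3 completed 5 TODOs
```

The user IDs are converted with str() because str.join() only accepts strings. That conversion matters later in the lesson.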
05:10 So now when we run this code, we’ll see user(s) 5 and 10 completed 12 TODOs. If you’d like, you can verify that this is correct by heading to the web address at the top of our file and viewing the JSON data for yourself.
05:26 Now for our final task, let’s create a JSON file that contains all of the completed TODOs for each of these users. At this point, we have the data for all of the TODO items as well as the two users who have completed the most, so we can do this. When we’re done, we should have a JSON file containing only the TODO items completed by users 5 and 10.
05:50 We’ll start by defining a new function keep(), which will take a single todo object as its parameter. Remember: this todo object is now a Python dictionary representing a single TODO item, and we obtained it through deserialization.
06:08 This function will be used to filter out all of the TODO items that were not completed by either of the top users.
06:15 First, we have to figure out if the item is even completed, so we’ll say is_complete = todo["completed"].
06:24 Then, we’ll see if this item was assigned to one of the top two users, so we’ll write has_max_count = todo["userId"] in users. users is a list of users 5 and 10, so we’re looking to see if this item was assigned to either of those people.
06:46 Finally, we’ll return is_complete and has_max_count, since both of these Booleans need to be True in order for the todo item to stay in our list.
06:57 Now that we have our filtering function, we can write some JSON data to a file. Just like before we’ll say with open("filtered_data_file.json") but now we’ll give it the argument "w" to tell Python we want to write to this file. And we’ll give it the identifier data_file. Inside the with block, I want to obtain a list of only the items that were completed by users 5 and 10, so I’ll say filtered_todos = list() and inside it we’ll call the filter() function with our filter function keep and our todos list, which contains all of the TODO items.
07:38 Now that we’ve got this list, we can say json.dump() and we’ll give it the list of filtered_todos and the data_file we’re writing to.
07:47 And finally, an indentation level of 2, which will make it easier to read. And that’s all we need to do for this program! If we right-click here and we choose Run Code, we’ll see our expected output: user(s) 5 and 10 completed 12 TODOs.
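To isolate the mechanics of filter() and json.dump(), here is a small sketch under a few assumptions. The is_done() predicate is a hypothetical, simplified stand-in for keep() that only checks the completed flag. The todos list is sample data, and io.StringIO stands in for the file opened with open(..., "w") so that nothing is written to disk.

```python
import io
import json

# Hypothetical sample standing in for the deserialized API data.
todos = [
    {"userId": 5, "id": 1, "title": "a", "completed": True},
    {"userId": 5, "id": 2, "title": "b", "completed": False},
    {"userId": 10, "id": 3, "title": "c", "completed": True},
]

def is_done(todo):
    """Hypothetical simplified predicate: keep completed items only."""
    return todo["completed"]

# filter() is lazy, so list() materializes the results before serializing.
filtered_todos = list(filter(is_done, todos))

# io.StringIO stands in for the writable file object used in the video.
data_file = io.StringIO()
json.dump(filtered_todos, data_file, indent=2)

print(data_file.getvalue())
```

json.dump() writes to any object with a write() method, so the same call works unchanged with a real file opened in "w" mode.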
08:04 But if we head over to our new JSON file that’s been created, you’ll notice that we have a problem. It’s empty, but it’s supposed to have all of the items completed by users 5 and 10, since they’ve completed the greatest number of items.
08:19 So let’s head back to our code here and start the debugging process. I will say the bug in this program has nothing to do with the JSON serialization or deserialization, so I would encourage you to pause the video and step through this code with your favorite debugger. Personally, I like the Visual Studio debugger, so I’m going to use that.
08:39 Now we know everything up until our print statement is working because we got the right print() output in the console, so let’s safely ignore all that. I suspect the issue is in our filtering function. I think it might be filtering everything out of our TODO list, which is why no data is actually getting serialized.
08:59 I’m going to set a breakpoint on the last line in the function, and then I will start the program with debugging. Now that we’re stopped here on the return line, we could sit here and step through each item until the bug reveals itself, but I’ll save you the hassle and I’ll just tell you what it is.
09:17 If we hover over the users variable here, we can see that users 5 and 10 are represented as strings. That’s because we appended the users as strings earlier so that we could print them out in a nice formatted manner. But if I hover over todo here, notice that 'userId' is actually an int. Right now, we’re asking Python if an integer is in a list of strings, and so has_max_count is always set to False, even when we’re comparing the integer 5 to the string '5'.
09:50 There are a few different ways we can fix this, but the easiest is just to convert the 'userId' to a string before we compare it. This way, we’re comparing two strings rather than an integer and a string, which would never be equal.
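The mismatch and the fix can be reproduced in a few lines. The todo dictionary below is a hypothetical item, and users mirrors the list of strings built earlier in the lesson.

```python
# User IDs were stored as strings so they could be joined for printing.
users = ["5", "10"]

# Hypothetical deserialized item: JSON numbers arrive as Python ints.
todo = {"userId": 5, "id": 81, "title": "example", "completed": True}

print(todo["userId"] in users)       # False: the int 5 never equals the str "5"
print(str(todo["userId"]) in users)  # True: both sides are strings now

def keep(todo):
    """Keep items that are completed and belong to one of the top users."""
    is_complete = todo["completed"]
    # Convert the int to a string so it can match the entries in users.
    has_max_count = str(todo["userId"]) in users
    return is_complete and has_max_count

print(keep(todo))  # True
```

Another option would be to store the IDs as integers in users and convert them to strings only when building the output message.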
10:03 And now if I stop the debugger and I run this code again, we should see that we still have our correct console output, but if we head over to the JSON file, we have all of the items that users 5 and 10 have completed.
10:18 Fortunately for us, that wasn’t too hard of a bug to find. In the next video, we’ll take a look at how we can encode custom Python objects or otherwise non-serializable types into JSON format.
himanshuwadhwa on June 12, 2019
Serializing: converting Python data (for example, a list) into JSON data. So you are encoding the Python data into JSON format (written to a file object or to a string).
Deserializing: just doing the opposite, i.e. decoding the JSON data back into Python data.
Pulling the data from a web API will give JSON-formatted data, which has to be converted (decoded) to Python data. Based on whether you need the entire data set or not, you then apply the relevant Python techniques to filter your final data.
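To make the two directions concrete, here is a minimal round-trip sketch using a small hypothetical list. json.dumps() serializes Python data to a JSON string, and json.loads() deserializes it back.

```python
import json

# Hypothetical Python data to send through a round trip.
python_data = [{"userId": 1, "completed": True}]

# Serializing: Python objects -> JSON text.
json_text = json.dumps(python_data)
print(type(json_text))  # <class 'str'>
print(json_text)        # [{"userId": 1, "completed": true}]

# Deserializing: JSON text -> Python objects.
restored = json.loads(json_text)
print(restored == python_data)  # True
```

Note how Python's True is written as JSON's true on the way out and read back as True on the way in.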
ravipgupta12 on Nov. 12, 2019
What is the process for importing data from a JSON file into Python?
Ranit Pradhan on March 28, 2020
Todos isn’t working in Jupyter Notebook and Python IDE 3.7 but it’s working in Spyder IDE. Can you tell me please, how it will work in Jupyter Notebook?
maniviswa on April 24, 2020
It’s hard to understand. You could have taken AWS as an example; it would be easier to understand.
J on May 7, 2020
How did you get the output screen on the right? What is that part in VS Code? Is it an extension?
J on May 7, 2020
So when I go to print the filtered object before converting it, it just returns the object at some memory location, not the actual filtered object. So it seems that in order to view it, you actually have to convert it to some type of container. Am I correct?
rajss2494 on Aug. 9, 2020
Very difficult to follow what is being explained. It would be worth giving an overview of what we are trying to achieve and how before any code is written
yaost on July 15, 2021
For the keep function, why use “and” for the return values instead of “,”? I tried return is_complete, has_max_count, but it doesn’t seem to filter anything.
Bartosz Zaczyński RP Team on July 16, 2021
@yaost The built-in filter() function expects a predicate that returns either True or False. When you replace the and keyword with a comma, then your keep() function will return a tuple instead. A non-empty tuple, which has two elements in this case, will always evaluate to True, keeping all elements or not filtering anything as you observed.
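A short sketch of the difference, using hypothetical sample data and two illustrative predicates named keep_and() and keep_tuple():

```python
# Hypothetical sample data; users holds ints here to keep the focus on
# the return value rather than on type conversion.
todos = [
    {"userId": 5, "completed": True},
    {"userId": 1, "completed": False},
]
users = [5, 10]

def keep_and(todo):
    # Returns a single Boolean: True only if both conditions hold.
    return todo["completed"] and todo["userId"] in users

def keep_tuple(todo):
    # Returns a 2-tuple, which is truthy even when both values are False.
    return todo["completed"], todo["userId"] in users

print(len(list(filter(keep_and, todos))))    # 1
print(len(list(filter(keep_tuple, todos))))  # 2
print(bool((False, False)))                  # True
```

Only an empty tuple is falsy, so the tuple-returning version lets every item through.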
Martin Breuss RP Team on July 16, 2021
Hi everyone, here are a couple of notes on the comments. @rajss2494 and @maniviswa, sorry you found this lesson hard to follow, and thanks for the suggestions! Maybe I can help to clear up some of the confusing bits:
Importing JSON from a File
@ravipgupta12 you can also do this with the json module, similar to how Austin shows in the video, only you’ll have to open a file object first:
import json

with open("your-json-file.json", "r") as f_in:
    data = json.load(f_in)
Note that you’ll have to use the json.load() function (rather than json.loads()) if you’re reading from a file object.
Todo’s not working
@Ranit Pradhan I don’t quite understand your question. Are you looking for a way to apply syntax highlighting to the word “TODO” in a Jupyter Notebook?
Right-Hand VSCode Panel
@J the panel you can see on the right is the Output panel that comes built-in with VSCode. It might be at the bottom for you, Austin just moved it to the right.
Also the Debugger that he uses for a moment already comes with VSCode, so you won’t have to install any extensions to do what Austin does in this lesson.
The filter() Function
@J the filter() function returns a filter object that yields values on demand. So you can either call the next() function on it to get the next item, iterate over it, or, like you said, convert it to a collection, e.g. a list, if you want to read all of the values it yields. You can check out Python’s filter(): Extract Values From Iterables for more information on this function.
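As a quick illustration of that on-demand behavior, here is a sketch with a hypothetical list of numbers and a lambda predicate:

```python
numbers = [1, 2, 3, 4, 5, 6]

# filter() returns a lazy iterator, not a list.
evens = filter(lambda n: n % 2 == 0, numbers)

print(evens)        # a filter object, shown with its memory address
print(next(evens))  # 2
print(list(evens))  # [4, 6], the values that were not consumed yet
print(list(evens))  # [], the iterator is now exhausted
```

Because the iterator can only be consumed once, converting it to a list right away is the simplest choice when the results are needed more than once.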
String Output
@yaost the " and " string that Austin uses for the output is just one possible example of how to format it. It doesn’t have any specific meaning, and you could have used a comma (,) instead just as well.
techsukenik on Sept. 1, 2021
I re-coded “Working with JSON Data in Python” to use several features I learnt last week. The goal I have is to use python features to make the code more concise and easier to code. Feel free to comment and offer suggestions that will help me better understand python. (FYI… I did not put in any error checking in the code) Thanks for your feedback.
import json
import requests
from collections import Counter
response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos = json.loads(response.text)
# User IDs of every completed TODO (one entry per completed item)
completed_users = [todo["userId"] for todo in todos if todo["completed"]]
# Counter tallies how many TODOs each user completed
completed_users_counter = Counter(completed_users)
# most_common() sorts by count, so the first pair holds the maximum
max_completed = completed_users_counter.most_common()[0][1]
# Keep every user whose count equals the maximum: [5, 10] for this data
max_users = [user for user, count in completed_users_counter.items() if count == max_completed]
max_users_print = " and ".join(str(n) for n in max_users)
print(f"user(s) {max_users_print} completed {max_completed} TODOs")
with open("filtered_data_file.json", "w") as data_file:
    max_users_completed_todos = [todo for todo in todos if todo["userId"] in max_users and todo["completed"]]
    json.dump(max_users_completed_todos, data_file, indent=4)
PururinTora on March 2, 2022
This was awesome Guys! I learnt a lot from it. These lines were absolute highlights for me:
sorted_users = sorted(todos_by_user.items(), key=lambda x: x[1], reverse=True)
filtered_todos = list(filter(keep, todos))
jccarrillosoto on May 27, 2023
As a beginner understanding this was a bit difficult with the implementation of lambda functions and explaining the main purpose of the code instead of why we do each step of the overall for loops.
Narendrakumar Ratibhai Patel on July 22, 2024
I found an error while importing the requests module:
File "d:\pythonProject\WorkingWithJSONAndPython\jsonplace.py", line 2, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
terrymiddleton on May 11, 2019
I still don’t understand what the difference between serialized and deserialized is. Is it just that when deserializing we are matching the data to the key pairs when we read the JSON file? I’m missing something.