Manipulating the Data
00:00 Now that you can pull all the data out of an Excel workbook, let’s take a look at how you can convert that data into useful Python data structures. All the data you’ve returned so far has been in the form of tuples, which can be thought of as just immutable lists.
00:14 This should be relatively straightforward to work with. So, let’s try out an example where you need to extract product info from the spreadsheet and put it into a dictionary where the key is a product ID.
This dictionary of dictionaries will let you index our product ID and then return all the pertinent info, which is pretty cool. To get started, go ahead and grab all the header values so that you know which columns you are interested in. So,
for value in sheet.iter_rows()—make sure the capitalization’s correct—and you’ll want
min_row equal to
max_row is also equal to
1. This will just grab that first row.
And, because you’re interested in the values, say
values_only=True. And from here, go ahead and
print(value). All right. So take a look at all the header titles—so,
'customer_id'—and you’ll notice that you have
'product_category'. These are conveniently next to each other, so you can quickly pull all of these columns using the min and max column arguments that you used earlier. Let’s take a look. This will be
7, as the max column.
So, to test this out,
for value in sheet.iter_rows()—and this time, you’ll say
min_row=2, because you don’t want that header row. And then, the
min_col is going to be equal to
4, and the
max_col is going to be equal to
7, and like before, you’ll want
values_only set to
And from here, go ahead and print the value. Okay! So, that printed out a bunch of stuff here—I’ll scroll to the top—and you should see that you end up with the product ID, the product’s parent, product title—which is a pretty long one—and then the product category, which should be
'Watches', for all of these.
And to get more fancy, let’s go ahead and
import json, and then
from openpyxl import load_workbook. And also like before, let’s say,
workbook = load_workbook() and pass in the
sheet is still the
and then the
min_col=4 and the
max_col=7. Then, make sure you have the
values_only and set this equal to
True. And now, instead of printing out that
row—and I should probably have an
in up here—you’re going to make a
product_id, which is going to equal the first item in the row,
and then make the product dictionary. So, just say something like
product and then start a new dictionary, and you want to pass in the
"parent", which is
row, and then the
"title", which is
row, and then the
"category", which is that last item, so at index
And now that you have this, you can say that
products and then, at that
product_id, is going to equal
product. So, this line right here has a lot going on with it, very similar-sounding variables—and I’m not helping by not having an
So, let’s break this down.
products here is what you defined up here as an empty dictionary. And then now, you’re filling this dictionary with a key that’s equal to a
product_id, and then the value at that key is going to be the
product, which is this dictionary here and has more information on the product.
04:59 So, this is a pretty cool way to take your data and set it up. Like, if you had a product ID of something you pulled out of a database. Now, you could say the product ID, and then that would pull out all the product info in a dictionary format.
So, after this runs through every row in the worksheet, let’s take this
products dictionary and use
json to get this into a string format. We’ll just print that out, so,
json.dumps() (dump S)—to dump the dictionary into a string—and then pass in the
products dictionary. Save this. We can actually quit out of the interpreter session. And then, let’s try to run it. So say,
python parse_product.py. I got an error, so let’s see.
No [...] file or directory, so if I go up here—so, I named mine
sample. And that’s wrong. All right, let’s save that.
and then the first key is this ID, and then that ID represents this value, which is another dictionary, here. It has all the product information, with the
"category" down here. So, this is great! This might be a bit tricky to read printed out in the terminal here, but a JSON output is very useful in a lot of cases, and the dictionary that produced it would also be very useful in further Python scripts.
Become a Member to join the conversation.