Processing Data Before Loading a DataFrame
00:00 In this lesson, you’ll take the university towns dataset as it is and process it so that you can load it into a DataFrame. You now have a rough idea of how the data is structured and what columns you want. At the moment, loading the raw file into a DataFrame technically works, but it ends up as a single column with lots of rows, which isn’t structured at all.
You could have a two-dimensional list, or a dictionary with two keys representing the columns, each with a list of the values in that column. All the states have "[edit]" next to them, in square brackets, and every line without "[edit]" is a university town. So you can use this string, "[edit]", as a marker in your for loop to construct this data structure. With this information, you can build a table with states in one column and towns in the other.
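To make the two candidate structures concrete, here is a minimal sketch of each. The rows are a small illustrative sample rather than the full dataset, and the column names State and Town are assumptions chosen for this example.

```python
# Illustrative sample rows only; the real file has many more states and towns.
# Option 1: a two-dimensional list, with one [state, town] pair per row.
rows = [
    ["Alabama[edit]", "Auburn (Auburn University)"],
    ["Alabama[edit]", "Florence (University of North Alabama)"],
    ["Alaska[edit]", "Fairbanks (University of Alaska Fairbanks)"],
]

# Option 2: a dictionary with one key per column (key names are hypothetical),
# each holding the list of values for that column.
columns = {
    "State": [state for state, _town in rows],
    "Town": [town for _state, town in rows],
}

print(columns["State"])
```

Both shapes hold the same information. The list of pairs is convenient to build one row at a time inside a loop, which is the approach the rest of this lesson follows.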
Working … and yes, that looks right. It’s produced a list: one line is a state with "[edit]", and the next line is a town, so each line is a separate element. So that’s been successful.
So now you’ll want to start a for loop, which you can do directly on the file, like this. That will automatically give you all the lines in the file. You don’t have to call .readlines() on it, although you can if you want.
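As a rough illustration of that point, the sketch below loops over a file object directly and compares the result with .readlines(). The file name and its contents are hypothetical stand-ins written to a temporary directory, not the real dataset.

```python
import os
import tempfile

# Hypothetical stand-in for the dataset file, written to a temp directory.
sample = (
    "Alabama[edit]\n"
    "Auburn (Auburn University)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)\n"
)
path = os.path.join(tempfile.mkdtemp(), "university_towns_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Looping over the file object yields one line at a time.
lines_from_loop = []
with open(path, encoding="utf-8") as f:
    for line in f:
        lines_from_loop.append(line)

# .readlines() builds the same list in a single call.
with open(path, encoding="utf-8") as f:
    lines_from_readlines = f.readlines()

print(lines_from_loop == lines_from_readlines)
```

Either way, each element keeps its trailing newline, which is one reason stripping whitespace comes up later in the lesson.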
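If "[edit]" is in the line, which you check with the in keyword, then we will do something. Otherwise, we will do something else. In the first case, we’ll want to set state, a variable that acts as a sort of local state of the application. Don’t get confused by the naming here: it holds the US state that the following towns belong to.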
03:26 Remember, we’re not being asked to clean it just yet. All the same, we’re going to strip this, which means clearing out the whitespace from the start and the end. That’s pretty typical, since the whitespace isn’t going to carry any information that they’ll need, so we’ll allow ourselves this little bit of cleaning.
We won’t be removing "[edit]" from this just yet. Then, in the other branch, what you want to do is create the list that will contain the state and the line: if the line doesn’t have "[edit]", it’s a town, so you put in the state as you currently have it, and then the line.
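The loop described above could be sketched roughly as follows. This assumes the marker next to each state is the literal text [edit], strips every line once at the top of the loop, and uses io.StringIO with a few illustrative lines in place of the real file. The names MARKER and state_town_pairs are made up for this example.

```python
import io

# Illustrative sample; io.StringIO stands in for the opened dataset file.
sample_file = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (Auburn University)\n"
    "Florence (University of North Alabama)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)\n"
)

MARKER = "[edit]"  # assumed marker that identifies a state line
state_town_pairs = []
state = None  # the most recent state line seen so far

for line in sample_file:
    line = line.strip()  # drop leading/trailing whitespace, including "\n"
    if MARKER in line:
        # A state line: remember it, leaving the marker in place for now.
        state = line
    else:
        # A town line: pair it with the current state.
        state_town_pairs.append([state, line])

print(state_town_pairs)
```

Because state is only reassigned when a new state line appears, every town picks up the state that most recently preceded it in the file.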
Let’s look at it in a text editor because there’s quite a lot here. As you can see, you’ve got your outer list and your inner lists. Each inner list has two items: the state, Alabama here, as the first element, and a town as the second.
And yes, that’s working. That successfully goes into a DataFrame now. That’s done. Once you have that DataFrame, what you could do is call .to_csv() and put in the filename here. Let’s call it …
05:50 And then run this. That seems to have worked, and here it is. As you can see, it’s included the index, but for now, that looks good. Now you can pass this CSV back to your boss, and he’ll be able to load this directly into a DataFrame.
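For reference, the DataFrame and CSV steps might look something like the sketch below. The column names, the sample rows, and the output file name towns_sample.csv are all assumptions made for illustration; the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Illustrative [state, town] pairs, shaped like the nested list built earlier.
state_town_pairs = [
    ["Alabama[edit]", "Auburn (Auburn University)"],
    ["Alabama[edit]", "Florence (University of North Alabama)"],
    ["Alaska[edit]", "Fairbanks (University of Alaska Fairbanks)"],
]

# Column names are hypothetical; choose whatever suits your data.
towns_df = pd.DataFrame(state_town_pairs, columns=["State", "Town"])

# Hypothetical output path in a temp directory.
out_path = os.path.join(tempfile.mkdtemp(), "towns_sample.csv")

# By default, to_csv() also writes the index as the first column.
towns_df.to_csv(out_path)

# Reading it back, treating that first column as the index again.
reloaded = pd.read_csv(out_path, index_col=0)
print(reloaded)
```

If you would rather leave the index out of the file, passing index=False to .to_csv() does that, and the file can then be read back without index_col.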
06:08 So you’ve processed that data for your colleagues, and you’ve loaded it into a DataFrame so that they can use it. Following their instructions, you haven’t cleaned it, but in the next lesson, they’re going to come back to you and ask you to clean it now that it’s in a DataFrame.