A Quick Aside About Data
Anytime you are exploring a new visualization library, it’s a good idea to start with some data in a domain you are familiar with. The beauty of Bokeh is that nearly any idea you have should be possible. It’s just a matter of how you want to leverage the available tools to do so.
The remaining examples will use publicly available data from Kaggle, which has information about the National Basketball Association’s (NBA) 2017-18 season, specifically:
- 2017-18_playerBoxScore.csv: game-by-game snapshots of player statistics
- 2017-18_teamBoxScore.csv: game-by-game snapshots of team statistics
- 2017-18_standings.csv: daily team standings and rankings
You can download the data files from the Real Python GitHub repo.
Create a new subdirectory named data inside the Bokeh directory you created earlier, and save the files there:
Bokeh/data/
File: read_nba_data.py
import pandas as pd

# Read the csv files
player_stats = pd.read_csv('data/2017-18_playerBoxScore.csv',
                           parse_dates=['gmDate'])
team_stats = pd.read_csv('data/2017-18_teamBoxScore.csv',
                         parse_dates=['gmDate'])
standings = pd.read_csv('data/2017-18_standings.csv',
                        parse_dates=['stDate'])
00:00 For the next set of tutorials, you’re going to need some data. Anytime you’re exploring a new visualization library, it’s a good idea to start with some data that you’re familiar with.
00:08 One of the great things about Bokeh is that it’s very flexible. Any idea that you have, you should be able to implement it. It’s just a matter of how you want to leverage all the tools that are available to you.
00:18 The remaining set of examples are going to use publicly available data from Kaggle, and there you can find a large collection of public datasets that you can use for practicing your visualizations. In our case, you’re going to use data from the National Basketball Association’s 2017 to 2018 season, and this data has been already saved from the Kaggle site and put on our GitHub site, where you can find three CSV files.
00:42 All three of these files are saved in the data/ directory inside the materials/intro-to-bokeh/ folder. There’s a link in the text just below this video.
00:52 You need to download those three data files, and save them inside of a folder named data/ inside your Bokeh/ directory. Now you’re going to create a file.
01:04 This file will act as a module from which you can import your data into your visualizations. You’ll be loading those CSV files in, and you’re going to do a little bit of data manipulation. Okay.
01:16 Create that new file and name it read_nba_data.py to save it as a module.
01:27 One note: the code snippets that you can download show dashes between the words in this filename. That won’t work for naming a module and might give you some errors when importing it. That’s why I’ve changed the name to use underscores instead. Okay.
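To see why the dashes matter, here is a minimal sketch that checks whether a name could be used in a plain import statement. The helper is_importable_name() is a hypothetical name made up for this illustration and is not part of the lesson’s code. It relies on the fact that a module name in an import statement has to be a valid Python identifier, and a dash would be read as a minus sign.

```python
import keyword


def is_importable_name(name):
    """Return True if `name` could appear in a plain `import name` statement.

    Hypothetical helper for illustration only: a module name must be a
    valid identifier and must not be a reserved keyword.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_importable_name("read-nba-data"))  # False: dashes are not allowed
print(is_importable_name("read_nba_data"))  # True: underscores are fine
```

A file named with dashes can still be run as a script, but it can’t be pulled in with a regular import statement, which is what the later lessons depend on.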
01:41 What’s this going to look like? You need to read the three CSV files in first. To do that, start by importing pandas using the alias pd, and now read the CSV data in.
01:59 You’ll create a variable named player_stats, and you’ll use the pandas function read_csv() to read that CSV file into it as a DataFrame.
02:14 You’ll need to start with the subdirectory, 'data/', and then the filename in order for it to read from that subdirectory. And the source file will be the '2017-18_playerBoxScore.csv'.
02:29 Add the argument parse_dates= using the 'gmDate' column.
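As a rough illustration of what parse_dates= changes, the sketch below reads the same small CSV text twice, once without the argument and once with it. The two sample rows and the player and points column names are made up for this example. Only the gmDate column name comes from the lesson, and the real files are not needed to run it.

```python
import io

import pandas as pd

# Hypothetical sample data; only the gmDate column name matches the lesson.
csv_text = (
    "gmDate,player,points\n"
    "2017-10-17,Player A,22\n"
    "2017-10-18,Player B,15\n"
)

# Without parse_dates, the dates stay as plain text.
without_parsing = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to datetime values.
with_parsing = pd.read_csv(io.StringIO(csv_text), parse_dates=["gmDate"])

print(pd.api.types.is_datetime64_any_dtype(without_parsing["gmDate"]))  # False
print(pd.api.types.is_datetime64_any_dtype(with_parsing["gmDate"]))     # True

# Datetime columns support date-aware operations such as filtering by day.
recent = with_parsing[with_parsing["gmDate"] >= pd.Timestamp("2017-10-18")]
print(len(recent))  # 1
```

Having real datetime values is what makes date-based filtering and time axes straightforward later on.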
02:39 The next file you’ll read into the variable team_stats,
02:47 again using the 'gmDate' (game date).
02:50 And then you’ll create a DataFrame for standings. So, this one is the 'stDate' (standings date). Okay. Make sure you save the file, and then you can experiment and make sure your data is there. I’m going to step into the REPL by typing python3, and then I’m going to import everything with from read_nba_data import *.
03:16 Normally it’s considered a bad habit to import using the wildcard (*), and I agree. In this case, I’m simply going to show you a few things and we’re doing a bit of testing. In the later tutorials, you will import from read_nba_data only the items that you’re going to use in the script. Great.
03:36 So the standings here. Yep. 5040 rows, 39 columns. team_stats and player_stats.
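If you would rather check the sizes directly than read them off the printed DataFrame, something like the sketch below could work. The describe_frame() helper and the small stand-in frame are hypothetical and only there so the example runs without the CSV files. With the real module you would pass in standings, team_stats, or player_stats instead.

```python
import pandas as pd


def describe_frame(name, df):
    """Return a one-line summary of a DataFrame's size (hypothetical helper)."""
    rows, cols = df.shape  # .shape is a (row count, column count) tuple
    return f"{name}: {rows} rows, {cols} columns"


# Stand-in data for illustration; apart from stDate, the column names
# are invented and do not reflect the real standings file.
demo = pd.DataFrame(
    {
        "stDate": pd.to_datetime(["2017-10-17", "2017-10-18"]),
        "team": ["AAA", "BBB"],
        "wins": [1, 0],
    }
)

print(describe_frame("demo", demo))  # demo: 2 rows, 3 columns
```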
03:49 Now that your data is ready for importing into your scripts, it’s time for you to try out the ColumnDataSource object. And that’s up next.
Chris Bailey RP Team on May 28, 2019
Hi Sion, I have a temporary solution, and will work on something more permanent. If you use this Real Python GitHub materials link you can download the materials as a zip file, using the large green button on the right side. The path inside the unzipped folder will be: materials-master/intro-to-bokeh/data/.
Dan Bader RP Team on May 28, 2019
@sion: You can also try this link which should download a zip file that only includes the CSVs from the repo, but not the rest of the code and the other projects.
Pygator on Aug. 18, 2019
I literally had no idea you could import * from a file like that inside a REPL session and inspect the variables. I may have been using Jupyter notebooks for too long.
sion on May 27, 2019
2017-18_teamBoxScore.csv downloads correctly from GitHub. 2017-18_playerBoxScore.csv and 2017-18_teamBoxScore.csv only show the raw data on the screen, without column headings. So far I have been unable to find these files on Kaggle. Any assistance in obtaining these files would be welcome.