Linking Selections
In this lesson you will implement linked selections in your visualization, allowing a selection made on one plot to be reflected on others. To see how this works, the next visualization will contain two scatter plots: one that shows the 76ers’ two-point versus three-point field goal percentage and the other showing the 76ers’ team points versus opponent points on a game-by-game basis.
The goal is to select data points on the left scatter plot and quickly see whether the corresponding data points on the right scatter plot are wins or losses.
You will first edit the file read_nba_data.py to create a DataFrame very similar to the one from the last example.
Additional details on linking plots can be found at Linking Plots in the Bokeh User Guide.
File: read_nba_data.py
import pandas as pd
# Read the csv files
player_stats = pd.read_csv('data/2017-18_playerBoxScore.csv',
                           parse_dates=['gmDate'])
team_stats = pd.read_csv('data/2017-18_teamBoxScore.csv',
                         parse_dates=['gmDate'])
standings = pd.read_csv('data/2017-18_standings.csv',
                        parse_dates=['stDate'])
# Create west_top_2
west_top_2 = (standings[(standings['teamAbbr'] == 'HOU') |
                        (standings['teamAbbr'] == 'GS')]
              .loc[:, ['stDate', 'teamAbbr', 'gameWon']]
              .sort_values(['teamAbbr', 'stDate']))
# Find players who took at least 1 three-point shot during the season
# (.copy() avoids a SettingWithCopyWarning when the name column is added below)
three_takers = player_stats[player_stats['play3PA'] > 0].copy()
# Clean up the player names, placing them in a single column
three_takers['name'] = [f'{p["playFNm"]} {p["playLNm"]}'
                        for _, p in three_takers.iterrows()]
# Aggregate the total three-point attempts and makes for each player
# (columns are selected before summing so non-numeric columns are never summed)
three_takers = (three_takers.groupby('name')[['play3PA', 'play3PM']]
                .sum()
                .sort_values('play3PA', ascending=False))
# Filter out anyone who didn't take at least 100 three-point shots
three_takers = three_takers[three_takers['play3PA'] >= 100].reset_index()
# Add a column with a calculated three-point percentage (made/attempted)
three_takers['pct3PM'] = three_takers['play3PM'] / three_takers['play3PA']
# Philadelphia 76ers data isolated
phi_gm_stats = (team_stats[(team_stats['teamAbbr'] == 'PHI') &
                           (team_stats['seasTyp'] == 'Regular')]
                .loc[:, ['gmDate',
                         'teamPTS',
                         'teamTRB',
                         'teamAST',
                         'teamTO',
                         'opptPTS']]
                .sort_values('gmDate'))
# Add game number
phi_gm_stats['game_num'] = range(1, len(phi_gm_stats)+1)
# Derive a win_loss column
win_loss = []
for _, row in phi_gm_stats.iterrows():
    # If the 76ers score more points, it's a win
    if row['teamPTS'] > row['opptPTS']:
        win_loss.append('W')
    else:
        win_loss.append('L')
# Add the win_loss data to the DataFrame
phi_gm_stats['winLoss'] = win_loss
# Isolate relevant data for 76er Scatter Plots
phi_gm_stats_2 = (team_stats[(team_stats['teamAbbr'] == 'PHI') &
                             (team_stats['seasTyp'] == 'Regular')]
                  .loc[:, ['gmDate',
                           'team2P%',
                           'team3P%',
                           'teamPTS',
                           'opptPTS']]
                  .sort_values('gmDate'))
# Add game number
phi_gm_stats_2['game_num'] = range(1, len(phi_gm_stats_2) + 1)
# Derive a win_loss column
win_loss = []
for _, row in phi_gm_stats_2.iterrows():
    # If the 76ers score more points, it's a win
    if row['teamPTS'] > row['opptPTS']:
        win_loss.append('W')
    else:
        win_loss.append('L')
# Add the win_loss data to the DataFrame
phi_gm_stats_2['winLoss'] = win_loss
File: LinkSelection.py
# Bokeh Libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource, CategoricalColorMapper, NumeralTickFormatter
from bokeh.layouts import gridplot
# Load in Data
from read_nba_data import phi_gm_stats_2
# Out to file
output_file('phi_gm_linked_selections.html',
            title='76ers Percentages vs. Win-Loss')
# Store the data in a ColumnDataSource
gm_stats_cds = ColumnDataSource(phi_gm_stats_2)
# Create a CategoricalColorMapper that assigns a color to wins and losses
win_loss_mapper = CategoricalColorMapper(factors=['W', 'L'],
                                         palette=['green', 'red'])
# Specify the tools
toolList = ['lasso_select', 'tap', 'reset', 'save']
# Create a figure relating the percentages
# (plot_height/plot_width are the pre-3.0 Bokeh names; Bokeh 3.0+ uses height/width)
pctFig = figure(title='2PT FG % vs 3PT FG %, 2017-18 Regular Season',
                plot_height=400, plot_width=400, tools=toolList,
                x_axis_label='2PT FG%', y_axis_label='3PT FG%')
# Draw with circle markers
pctFig.circle(x='team2P%', y='team3P%', source=gm_stats_cds,
              size=12, color='black')
# Format the y-axis and x-axis tick labels as percentages
pctFig.xaxis[0].formatter = NumeralTickFormatter(format='00.0%')
pctFig.yaxis[0].formatter = NumeralTickFormatter(format='00.0%')
# Create a figure relating the totals
totFig = figure(title='Team Points vs Opponent Points, 2017-18 Regular Season',
                plot_height=400, plot_width=400, tools=toolList,
                x_axis_label='Team Points', y_axis_label='Opponent Points')
# Draw with square markers
totFig.square(x='teamPTS', y='opptPTS', source=gm_stats_cds, size=10,
              color=dict(field='winLoss', transform=win_loss_mapper))
# Create layout
grid = gridplot([[pctFig, totFig]])
# Visualize
show(grid)
00:00
For this next visualization, you’ll use some of the same data but you’re going to create two scatter plots. And as you select data in one, it’ll select the related points in the other visualization. In your editor, reopen read_nba_data.py. And moving to the bottom of that file, you’re going to add a few new lines.
00:22
You’re going to isolate relevant data for the 76ers scatter plots. Okay. It’s going to be phi_gm_stats_2, and for that, you’re going to use team_stats again. Here you go isolating a little bit of data, team_stats where team abbreviation is equal to 'PHI', for Philadelphia. And the season type is 'Regular'.
00:50
Okay. What else are you going to do? You’re going to isolate your data set to these columns: 'gmDate' for game date, the team two-point percentage, the team three-point percentage, the team’s total points, and the opponent’s points.
01:11 And lastly, sort values by the game date. Great. Add a game number.
01:22
So, you’re creating a new column. And for this, you’re going to use a range from 1 to the length of the game stats plus 1. And then similarly to before, you’re going to derive a 'winLoss' column.
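The filter, column selection, sort, and game-number steps described so far can be sketched on a tiny stand-in table. The rows below are made up for illustration (the real team_stats comes from the CSV and has many more columns), and the names toy_team_stats and toy_phi are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for team_stats; values are invented for illustration
toy_team_stats = pd.DataFrame({
    'gmDate': pd.to_datetime(['2017-10-20', '2017-10-18',
                              '2017-10-18', '2018-04-14']),
    'teamAbbr': ['PHI', 'PHI', 'WAS', 'PHI'],
    'seasTyp': ['Regular', 'Regular', 'Regular', 'Post'],
    'teamPTS': [92, 115, 120, 130],
    'opptPTS': [102, 120, 115, 103],
})

# Boolean mask: keep only 76ers rows from the regular season
is_phi_regular = ((toy_team_stats['teamAbbr'] == 'PHI') &
                  (toy_team_stats['seasTyp'] == 'Regular'))

# Filter rows, keep the columns of interest, and sort by game date
toy_phi = (toy_team_stats[is_phi_regular]
           .loc[:, ['gmDate', 'teamPTS', 'opptPTS']]
           .sort_values('gmDate'))

# Number the games 1..n in date order
toy_phi['game_num'] = range(1, len(toy_phi) + 1)
print(toy_phi)
```

Because the sort happens before the numbering, game 1 is always the earliest date regardless of the row order in the source file.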
01:40
Start it with an empty list, for _, row in phi_gm_stats_2.iterrows(): and then you’re iterating through the rows.
01:53
If the 76ers score more points, it’s a win. So in this case, you’re using the team points greater than opposing points. And here you append a win, capital 'W', else: in your win_loss list you’re appending an 'L'.
02:25
And lastly, you’re going to add that column as 'winLoss' from your list. Great! Let’s save. Now that you’ve saved, go ahead and enter into the Python REPL.
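The loop-and-append approach works fine for 82 games. As an alternative that the lesson does not use, the same column can be derived in one vectorized step with numpy.where. This is a minimal sketch on made-up scores; the games DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, invented for illustration
games = pd.DataFrame({'teamPTS': [115, 92, 104],
                      'opptPTS': [120, 102, 97]})

# 'W' where the team outscored the opponent, 'L' otherwise
games['winLoss'] = np.where(games['teamPTS'] > games['opptPTS'], 'W', 'L')
print(games)
```

Both versions label a tie as 'L', which does not matter here because NBA games cannot end in a tie.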
02:40
from read_nba_data import your new Philly game stats 2.
02:51 Since it’s in there, just look at the object.
02:59
Good, 82 rows, 7 columns. Just look at the head of it. Here you can see the game date, the two-point percentage, the three-pointer percentage, total points, opposition points, and then the winLoss column that you created, along with the game number column.
03:14
Good. It all looks ready to go for you to use in your scatter plots. To save some time and effort, make a copy of LinkAxes01.py and paste, and you’re going to rename it as LinkSelection.py.
03:32
A lot of the information is going to stay the same, so let’s go through this. From the Bokeh libraries you’re still going to be importing figure and show, and then you’re going to output to a static file again. From the models, you’re still importing the ColumnDataSource and the CategoricalColorMapper for setting up your wins and losses as red and green again.
03:51
You’re not going to use a Div this time, so in its place, you’re going to add a NumeralTickFormatter. And from the bokeh.layouts, you don’t need column, just gridplot. Great. Okay.
04:02
And this time when you read in the data, you’re bringing in the second version of it. And that all looks good! The output file is going to be named, instead of stats, linked_selections.
04:13
And for the title, it’s a '76ers Percentages vs. Win-Loss'. Great. Looking good. For the ColumnDataSource, you do need to change it to the new data that you’re importing. So again, phi_gm_stats_2 instead of the one without the number. Okay.
04:30
And you still need this win_loss_mapper going on here. And with the factors for the letters 'W' and 'L' to change to the palette of 'green' and 'red'. All right, that’s all useful. Looking good. From then on, all of this other information (let me resize the terminal here) you can delete. Okay.
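The mapper pairs factors with palette entries by position, so the first factor gets the first color. Here is a minimal sketch of that pairing, using the same mapper arguments as the lesson; the factor_to_color dictionary is only a hypothetical helper for inspecting the result and is not needed in the script.

```python
from bokeh.models import CategoricalColorMapper

# Same arguments as in LinkSelection.py
win_loss_mapper = CategoricalColorMapper(factors=['W', 'L'],
                                         palette=['green', 'red'])

# Hypothetical helper: show which color each factor lines up with
factor_to_color = dict(zip(win_loss_mapper.factors, win_loss_mapper.palette))
print(factor_to_color)

# This is the form a glyph's color argument takes when using the mapper
color_spec = dict(field='winLoss', transform=win_loss_mapper)
```

Swapping the order of either list, but not both, would flip the colors, which is an easy mistake to make when editing one argument at a time.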
04:50 So next up, you’re going to specify the tools that’ll be available.
04:57
And to make the tool list, you’re going to create a list actually. You’re going to have 'lasso_select' and 'tap', which you’ve used before.
05:05
We also have a 'reset', and then if you want, you can save an image. All right, create our first figure. This is the one that’s going to be relating to the percentages.
05:16
It’ll be called pctFig, percentage fig. It will be a figure. title will be a two-point field goal percentage versus three-point field goal percentage for the 2017-18 regular season.
05:33
Just as you had selected before. plot_height of 400 and a plot_width of the same. tools: well, that’s where you created your tool list earlier. Great.
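To see what the tool names turn into, you can build a bare figure and inspect its toolbar. This is a small sketch, assuming the four tool names from the lesson; size arguments are left out so it does not depend on which Bokeh version's keyword names are available, and demo_fig is a hypothetical name.

```python
from bokeh.plotting import figure

# The same tool names used in LinkSelection.py
tool_list = ['lasso_select', 'tap', 'reset', 'save']

# Hypothetical throwaway figure, only used to inspect the toolbar
demo_fig = figure(tools=tool_list)

# Each string is resolved to a tool object on the figure's toolbar
tool_names = [type(tool).__name__ for tool in demo_fig.toolbar.tools]
print(tool_names)
```

Passing an explicit list replaces Bokeh's default tools, so pan and zoom are not available on these plots unless you add them.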
05:46
You’re going to create an x_axis_label and that label will be two-point field goal percent. And the y-axis is going to be really similar, so let’s copy that by changing it to y and '3PT'. Great, that’s your pctFig. You’re going to use that tool you imported to format the y-axis and x-axis tick labels to be as percentages.
06:15
So that’s pctFig.xaxis() and you’re using a formatter, the NumeralTickFormatter with a format style of '0.00%'. In fact, the y-axis one is going to look almost exactly the same.
06:30
So copy that as .yaxis(). Perfect. Okay, next up, now you’re going to create another figure relating to the total scores. It will be called totFig, for total fig. It, again, is a figure. Its title
06:50
will be 'Team Points vs Opponent Points' and the same '2017-18 Regular Season', copying that down. In fact, a lot of this other information will be similar, so copy and paste that. plot_height=400, plot_width=400. tools, same from the toolList. x_axis_label is going to be different though.
07:15
It will be the 'Team Points'. And the y_axis_label will be your 'Opponent Points'.
07:25
Okay. You’ve created your two figures. Now you need to create the glyphs that are going to be on them. For the pctFig, draw them with circle markers.
07:36
So pctFig using a .circle(), and the x is going to be from the column team two-point percentages
07:47
and the y will be from the team three-point percentages. Your source will be from the ColumnDataSource you set up a moment ago, gm_stats_cds.
08:00
And then create a size for each of these of 12 and a color of 'black'. Great. So those would be the circles for all of those two-point and three-point field goals set up on the x- and y-axes. Great.
08:12 And then you’re going to draw square markers—square glyphs—for the wins and losses.
08:22
x is going to be the team points. y will be the opposing points. source is the same. size of 10, so a little bit smaller.
08:34
And then the color, you’re going to use the same technique as the last time we used the color formatter: a dictionary. field is from 'winLoss' and then transform it using the win_loss_mapper that you created. All right. Two things to do.
08:51
Create a quick layout. This layout’s going to be pretty simple. Just name it grid. It will be a gridplot, and it’s only going to be one row, so pctFig will be the first figure and totFig will be the other.
09:05
Make sure you got the right names. pctFig, yep, totFig. Looks good. Now you’re ready to visualize. To do that, here you’ll show(grid).
09:16
Save. Okay. Run this script called LinkSelection.py. Let’s see, it looks like I missed something.
09:27
line 26… oh, I missed a comma. That will mess me up. Okay. Just a comma at the end of line 25. All right, try running it again after saving. And I missed a comma there at the end of line 30. Well, there’s a lesson for you. Don’t hurry through your arguments.
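A missing comma between keyword arguments is caught before any of the script runs, because Python cannot parse the call at all. The sketch below reproduces that kind of mistake on a made-up one-line snippet (not the actual line from the video) and checks it with the built-in compile().

```python
# Hypothetical call with the comma between the two arguments left out
broken_call = "dict(plot_height=400 plot_width=400)"

try:
    compile(broken_call, '<example>', 'eval')
    missing_comma_detected = False
except SyntaxError as err:
    # Python refuses to parse the call, so nothing is executed
    missing_comma_detected = True
    print(f'SyntaxError: {err.msg}')
```

The line number in the traceback often points at the line after the missing comma, which matches what happens in the video: the error mentions line 26 while the fix goes at the end of line 25.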
09:47 So now that you’ve created this, right now the lasso tool is ready to do some selecting, Lasso Select. If you select some of these higher percentages, you’ll see that those more likely turn out to be wins. Over here on the right.
10:01 And if you were to select, say, more of these values down here, these wins—again, we’ll see them higher up here—versus if you were to select more of the losses, you’ll see that percentages are much lower over here. Pretty neat!
10:15 Again, you can use individual points to select by tapping on them, holding Shift if you’d like. And you see the points appear on the other side. So now you’ve linked your selections.
10:24
And that’s the advantage of using a ColumnDataSource. Since both of these visualizations inside your gridplot share the same ColumnDataSource, then the selections will be linked by default.
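The reason this works can be sketched without opening a browser. Both glyph renderers hold a reference to the same source object, and the selection lives on that source rather than on either plot. The data and column names below are made up, scatter() is used in place of circle() and square() so the sketch is less tied to a particular Bokeh version, and the selection is set by hand to stand in for a lasso or tap.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical three-game data set, invented for illustration
shared_source = ColumnDataSource(data=dict(
    pct2=[0.48, 0.52, 0.55],
    pct3=[0.30, 0.36, 0.41],
    team_pts=[92, 104, 118],
    oppt_pts=[102, 97, 101],
))

left_fig = figure(tools='lasso_select,tap,reset')
right_fig = figure(tools='lasso_select,tap,reset')

# Both renderers are given the same ColumnDataSource
left_renderer = left_fig.scatter(x='pct2', y='pct3',
                                 source=shared_source, size=12)
right_renderer = right_fig.scatter(x='team_pts', y='oppt_pts',
                                   source=shared_source, size=10,
                                   marker='square')

# Stand-in for a lasso or tap: mark rows 1 and 2 as selected
shared_source.selected.indices = [1, 2]

# The right-hand renderer sees the same selection through the shared source
print(right_renderer.data_source.selected.indices)
```

If each plot were given its own ColumnDataSource built from the same DataFrame, the plots would look identical but selections would no longer carry across.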
10:35 There’s much more information about linking plots in the Bokeh User Guide. Fantastic. Next, you get to practice using the legend to interactively highlight data.