Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to hundreds of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Selecting Data Points

Give Feedback

In this lesson you’ll learn about selecting data points in your visualizations. Implementing selection behavior is as easy as adding a few specific keywords when declaring your glyphs. You will start by modifying read_nba_data.py and aggregating data from the player_stats DataFrame.

For even more information about what you can do upon selection, check out Selected and Unselected Glyphs.

File: read_nba_data.py

import pandas as pd 

# Read the csv files
player_stats = pd.read_csv('data/2017-18_playerBoxScore.csv',
                           parse_dates=['gmDate'])
team_stats = pd.read_csv('data/2017-18_teamBoxScore.csv',
                          parse_dates=['gmDate'])
standings = pd.read_csv('data/2017-18_standings.csv',
                         parse_dates=['stDate'])

# Create west_top_2
west_top_2 = (standings[(standings['teamAbbr'] == 'HOU') | 
              (standings['teamAbbr'] == 'GS')]
              .loc[:, ['stDate', 'teamAbbr', 'gameWon']]
              .sort_values(['teamAbbr', 'stDate']))

# Find players who took at least 1 three-point shot during the season
three_takers = player_stats[player_stats['play3PA'] > 0]

# Clean up the player names, placing them in a single column
three_takers['name'] = [f'{p["playFNm"]} {p["playLNm"]}'
                        for _, p in three_takers.iterrows()]

# Aggregate the total three-point attempts and makes for each player
three_takers = (three_takers.groupby('name')
                            .sum()
                            .loc[:,['play3PA', 'play3PM']]
                            .sort_values('play3PA', ascending=False))

# Filter out anyone who didn't take at least 100 three-point shots
three_takers = three_takers[three_takers['play3PA'] >= 100].reset_index()

# Add a column with a calculated three-point percentage (made/attempted)
three_takers['pct3PM'] = three_takers['play3PM'] / three_takers['play3PA']

File: ThreePointAttVsPct.py

# Bokeh Libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource, NumeralTickFormatter

# Import the data
from read_nba_data import three_takers

# Output to file
output_file('three_point_att_vs_pct.html',
            title='Three-Point Attempts vs. Percentage')

# Store the data in a ColumnDataSource
three_takers_cds = ColumnDataSource(three_takers)

#Specify the selection tools to be made available
select_tools = ['box_select', 'lasso_select', 'poly_select', 'tap', 'reset']

# Create the figure
fig = figure(plot_height=400,
             plot_width=600,
             x_axis_label='Three-Point Shots Attempted',
             y_axis_label='Percentage Made',
             title='3PT Shots Attempted vs. Percentage Made (min. 100 3PA), 2017-18',
             toolbar_location='below',
             tools=select_tools)

# Format the y-axis tick label as percentages
fig.yaxis[0].formatter = NumeralTickFormatter(format='00.0%')

# Add square representing each player
fig.square(x='play3PA',
           y='pct3PM',
           source=three_takers_cds,
           color='royalblue',
           selection_color='deepskyblue',
           nonselection_color='lightgray',
           nonselection_alpha=0.3)

# Visualize
show(fig)

Comments & Discussion

Pygator on Aug. 18, 2019

This set of video tutorials are great! I can already dream up some use cases. Some video series about some image manipulation packages would be great.

Become a Member to join the conversation.