Exploring plt.scatter() Further
plt.scatter() offers even more flexibility in customizing scatter plots. In this section, you’ll explore how to mask data using NumPy arrays and scatter plots through an example where you’ll generate random data points and then separate them into two distinct regions within the same scatter plot.
00:23 A commuter who’s keen on collecting data has collated the arrival times for buses at her local bus stop over a six-month period. The timetable arrival times are at 15 minutes and 45 minutes past the hour, but she noticed that the true arrival times follow a normal distribution around these times.
You’ve created two normal distributions centered on 15 and 45 minutes past the hour and summed them. You then set the most likely arrival time to a value of 1 by dividing by the maximum value. Onscreen, you can see the output from this code showing the two normal distributions centered on 15 and 45 minutes.
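The code shown onscreen isn’t reproduced in this transcript, so here is a minimal sketch of the step just described. The helper name `gaussian` and the standard deviations of 5 and 7 minutes are assumptions chosen for illustration; only the means of 15 and 45 come from the timetable.

```python
import numpy as np


def gaussian(x, mean, sd):
    """Normal probability density evaluated at x (illustrative helper)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Minutes past the hour: 0, 1, ..., 59
x = np.linspace(0, 59, 60)

# Two normal curves centered on the timetabled arrival times.
# The spreads (5 and 7 minutes) are assumed values, not taken from the course.
first_distribution = gaussian(x, 15, 5)
second_distribution = gaussian(x, 45, 7)

# Sum the curves, then rescale so the most likely arrival time has value 1
y = first_distribution + second_distribution
y = y / y.max()
```

Dividing by `y.max()` keeps the shape of the curve and only rescales it, so the later comparison against random likelihoods between 0 and 1 is meaningful.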
04:58 They’re then added together and normalized by dividing by the maximum. Next, the curve is added to the plot with the following line. This code gives the output seen onscreen. To keep the simulation realistic, you need to make sure that the random bus arrivals match the data and the distribution obtained from those data.
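As a rough sketch of what this step might look like, the example below scatters randomly generated bus arrivals and draws the normalized curve over them. The names `x_bus`, `y_bus`, and `n_buses`, the number of buses, and the standard deviations are all assumptions made for illustration rather than values confirmed by the course.

```python
import matplotlib.pyplot as plt
import numpy as np


def gaussian(x, mean, sd):
    """Normal probability density evaluated at x (illustrative helper)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Normalized arrival-time distribution (spreads of 5 and 7 are assumed)
x = np.linspace(0, 59, 60)
y = gaussian(x, 15, 5) + gaussian(x, 45, 7)
y = y / y.max()

# Simulated arrivals: a random minute and a random likelihood for each bus.
# No seed is set, so the points change on every run.
rng = np.random.default_rng()
n_buses = 40  # assumed sample size
x_bus = rng.integers(0, 60, size=n_buses)  # minutes 0..59
y_bus = rng.random(n_buses)                # likelihoods in [0, 1)

fig, ax = plt.subplots()
ax.scatter(x_bus, y_bus)
ax.plot(x, y)  # the curve added on top of the scatter plot
ax.set_xlabel("Minutes past the hour")
ax.set_ylabel("Relative likelihood")
```

Calling `plt.show()` afterward displays the figure in an interactive session.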
in_region and out_region are NumPy arrays containing Boolean values based on whether the randomly generated likelihoods fall above or below the distribution y. You then plot two separate scatter plots: one with the points that fall within the distribution and another for the points that fall outside the distribution.
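The masking step can be sketched as follows. This example uses the `out_region` name from the text together with an `in_region` counterpart; the other names, the colors, the number of buses, and the standard deviations are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np


def gaussian(x, mean, sd):
    """Normal probability density evaluated at x (illustrative helper)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Normalized arrival-time distribution (spreads of 5 and 7 are assumed)
x = np.linspace(0, 59, 60)
y = gaussian(x, 15, 5) + gaussian(x, 45, 7)
y = y / y.max()

# Random arrivals; unseeded, so each run gives different points
rng = np.random.default_rng()
n_buses = 40  # assumed sample size
x_bus = rng.integers(0, 60, size=n_buses)
y_bus = rng.random(n_buses)

# x holds the integers 0..59, so x_bus can index y directly to look up
# the curve height at each bus's arrival minute.
in_region = y_bus < y[x_bus]     # Boolean mask: points under the curve
out_region = y_bus >= y[x_bus]   # Boolean mask: points on or above the curve

fig, ax = plt.subplots()
# Boolean indexing keeps only the elements where the mask is True
ax.scatter(x_bus[in_region], y_bus[in_region], color="green")
ax.scatter(x_bus[out_region], y_bus[out_region], color="red", marker="x")
ax.plot(x, y)
```

Because the two masks are complementary, every simulated bus lands in exactly one of the two scatter plots.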
06:39 The data points that fall above the distribution are not representative of the real data. You’ve segmented the data points from the original scatter plot based on whether they fall within the distribution and used a different color and marker to identify the two sets of data. Onscreen, you can see the result of this code, with the points that fall above the distribution plotted in a different color with an X marker. Once again, note that your result will look different, as the points are randomly generated and will change each time the code is run.