Zeroing the Bias
00:00 In the previous lesson, I tried to get rid of our rounding bias but didn’t quite get there. In this lesson, I promise I still won’t quite get there. The lack of symmetry in the half up and half down approaches comes from the fact that the tiebreakers always shift in the same direction.
00:18 One way to deal with this is to not always round in the same direction and instead round ties away from zero. That means ties for positive numbers go up while ties for negative numbers go down.
00:33 This gives us four situations. A positive number greater than or equal to the midway point gets rounded up. A positive number less than the midway point gets rounded down. A negative number whose magnitude is greater than or equal to the midway point gets rounded down, and a negative number whose magnitude is less than the midway point gets rounded up.
00:55 This text is almost pseudocode describing what I just showed you on the number line. Think for a moment how you might implement this. The most obvious answer is probably a bunch of if/else blocks, but there is actually a faster way using only math.
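For comparison, the if/else version might look something like the sketch below. It is not the code from the lesson: the name half_zero_branching is made up for illustration, and it only rounds to the nearest integer.

```python
import math


def half_zero_branching(n):
    """Round n to the nearest integer, sending ties away from zero.

    Hypothetical helper written with explicit branches, one per situation.
    """
    if n >= 0:
        lower = math.floor(n)
        # Positive, at or past the midway point: round up.
        if n - lower >= 0.5:
            return lower + 1
        # Positive, short of the midway point: round down.
        return lower
    upper = math.ceil(n)  # nearest integer on the zero side of n
    # Negative, at or past the midway point (by magnitude): round down.
    if upper - n >= 0.5:
        return upper - 1
    # Negative, short of the midway point: round up, toward zero.
    return upper


print(half_zero_branching(2.5), half_zero_branching(-2.5))
```

The positive tie goes up and the negative tie goes down, so both move away from zero.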
01:13 You can use the half up approach on the absolute value of the number and then copy the sign of the original number. Essentially, you’re shifting the absolute value up, and seeing as the absolute value is always positive, that shift is away from zero. Then you apply the original sign again.
01:31 So if the number was negative, the result ends up shifted left. Let’s go see this in practice. Okay, here it is. As promised, I start by taking the absolute value of the number and applying the half_up algorithm.
01:46 Then to restore the number’s sign, I use the copysign() function from the math library. The copysign() function does what its name implies, copying one number’s positiveness or negativeness onto the other number.
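Here is a minimal sketch of the function being described. The standard library spells the call math.copysign(). The body of half_up is an assumption (multiply, add 0.5, floor, divide), since the transcript doesn't show it, and half_zero is the name used later in this lesson.

```python
import math


def half_up(n, decimals=0):
    # Assumed implementation: shift the decimal point, add 0.5,
    # floor, then shift the decimal point back.
    multiplier = 10 ** decimals
    return math.floor(n * multiplier + 0.5) / multiplier


def half_zero(n, decimals=0):
    # Round the magnitude half up, then copy the original sign back on.
    rounded_abs = half_up(abs(n), decimals)
    return math.copysign(rounded_abs, n)


print(half_up(-2.5))    # half up moves the negative tie to the right
print(half_zero(-2.5))  # half away from zero moves it to the left
print(half_zero(2.5))
```

Because copysign() always returns a float, half_zero() does too, even when rounding to zero decimal places.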
02:08 Just as a reminder, before testing our new function, let’s look at half_up once again. That’s a positive tiebreaker
02:19 and that’s a negative tiebreaker. Remember, this is always shifting to the right no matter what. Now, let’s try our half away from zero approach.
02:34 Same positive tiebreaker,
02:39 but the negative tie breaks in the other direction. Of course, this still works with other decimal places as well.
03:05 A reminder of our favorite mean,
03:13 the results from calling the half_zero() function, and now let’s get the mean of that.
03:25 Hmm, that’s good, but logically this should be perfect, right?
03:32 So why didn’t this work? Our list had six numbers in it, but only one of them required a tiebreaker, which means the tie-breaking requirement wasn’t evenly distributed.
03:43 For a large enough dataset the half zero approach should work well, but if you don’t have a Gaussian distribution evenly centered on zero, the half zero approach can still shift your data.
03:56 The data I showed you wasn’t Gaussian, it only had one tiebreaker in it and it was a positive number, and so the mean shifted slightly to the right. Well, what can we do about this?
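To see the effect for yourself, here is a small sketch. The six values are made up for illustration and are not the list from the video, but they have the same shape: only one tie, and it is positive. The half_up body is assumed, as before.

```python
import math
from statistics import mean


def half_up(n, decimals=0):
    # Assumed implementation of the half up approach.
    multiplier = 10 ** decimals
    return math.floor(n * multiplier + 0.5) / multiplier


def half_zero(n, decimals=0):
    return math.copysign(half_up(abs(n), decimals), n)


# Hypothetical data: 1.5 is the only value that needs a tiebreaker.
data = [1.5, -2.3, 3.2, -0.7, 4.1, -1.4]

rounded = [half_zero(n) for n in data]
raw_mean = mean(data)          # roughly 0.733
rounded_mean = mean(rounded)   # roughly 0.833

print(raw_mean, rounded_mean)
```

With a single positive tie there is nothing on the negative side to cancel it out, so the rounded mean lands to the right of the original mean.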
04:08 One approach is to use the same concept, but to center the math on even numbers rather than on zero. That means the shifts should balance out better, even when all your tiebreakers are positive or all are negative.
04:22 In fact, I have brought you full circle. This is the algorithm Python’s round() function uses. Way back many lessons ago, I showed you how Python’s round() rounds 1.5 up while rounding 2.5 down.
04:36 This is why it does that. It’s rounding towards the even number, two. This is definitely better. Technically though, it’s still biased. If your data is somehow skewed to even or odd numbers, you’ll still see a shift, but that’s unusual, and by going towards the even value, there’s no longer a requirement for the data to be based on a bell curve to try and avoid most of the bias.
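As a quick check, the sketch below compares the built-in round() with a compact half away from zero helper on a list made up entirely of positive ties. The list is hypothetical, and the helper only handles rounding to integers.

```python
import math
from statistics import mean


def half_zero(n):
    # Compact half away from zero, integers only (illustrative helper).
    return math.copysign(math.floor(abs(n) + 0.5), n)


# Hypothetical data: every value is a positive tie.
ties = [0.5, 1.5, 2.5, 3.5]

away = [half_zero(n) for n in ties]  # every tie moves up
even = [round(n) for n in ties]      # ties alternate down and up

print(mean(ties), mean(away), mean(even))
```

Rounding away from zero pushes every one of these values up, so the mean moves from 2.0 to 2.5. Rounding to even sends 0.5 and 2.5 down and 1.5 and 3.5 up, so the mean stays at 2.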
05:03 There are algorithms out there that handle the bias left in Python’s round() by introducing randomness in the tiebreakers, but they tend to be computationally more complex and really aren’t needed very often in the real world.
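To give a feel for the idea, here is one very simple way randomness could be used, assuming a coin flip on exact ties only. This is an illustrative sketch with a made-up function name, not a specific published algorithm, and it only rounds to integers.

```python
import math
import random


def round_random_tie(n, rng=random):
    """Round to the nearest integer, breaking exact ties with a coin flip.

    Hypothetical helper; rng can be the random module or a
    random.Random instance (handy for reproducible runs).
    """
    lower = math.floor(n)
    diff = n - lower
    if diff < 0.5:
        return lower
    if diff > 0.5:
        return lower + 1
    # Exact tie: go down or up with equal probability.
    return lower + rng.choice((0, 1))


rng = random.Random(42)  # seeded so repeated runs give the same flips
print([round_random_tie(2.5, rng) for _ in range(6)])
```

Over many ties the ups and downs should roughly cancel regardless of sign or parity, at the cost of results that are no longer repeatable unless you control the random seed.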
05:17 Rounding in Python, especially with floats, still has a few surprises left.