A Custom Parser
00:26 Let me just add that in up here just so that it’s clear to everyone what’s going on here. Okay. So that’s there. And as you can see, the usage here is exactly the same as it was for the regex implementation.
And then, based on whether there are one, two, or three operands, it constructs the actual logic of the seq program. parse() is what I’m going to end up writing here, and it’s going to be much more complex than it was in the regex version. But where complexity is gained in parse(), it’s lost in the fact that there’s no actual regex to deal with. main() also works much the same way: it gets the separator and the operands from the parsing process, and then, if there are no operands, raises SystemExit with the USAGE. Otherwise, it uses seq() to construct the string that needs to be returned.
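To make the shape of seq() and the tail end of main() concrete, here is a minimal sketch. It is not the code from the lesson: the operand meanings (LAST, FIRST LAST, FIRST INCREMENT LAST) are assumed from the Unix seq tool, the USAGE text is a placeholder, and render() is a hypothetical helper standing in for what main() does once parsing has finished.

```python
USAGE = "usage: seq.py [--help] [-s SEP] [FIRST [INCREMENT]] LAST"  # placeholder text


def seq(operands, sep="\n"):
    """Build the output string from one, two, or three integer operands.

    Assumes a positive increment, as in the examples from the lesson.
    """
    first, increment, last = 1, 1, 1
    if len(operands) == 1:
        last = operands[0]
    elif len(operands) == 2:
        first, last = operands
    elif len(operands) == 3:
        first, increment, last = operands
    else:
        raise ValueError("seq expects between one and three operands")
    # range() excludes its stop value, so add 1 to include `last`.
    return sep.join(str(i) for i in range(first, last + 1, increment))


def render(sep, operands):
    """Hypothetical stand-in for the part of main() that runs after parsing."""
    if not operands:
        raise SystemExit(USAGE)
    return seq(operands, sep)
```

Calling render("\n", [1, 2, 10]) would give the odd numbers from 1 to 9, one per line, and an empty operand list exits with the usage message.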
01:40 So, how does this actual parsing logic work? How is it possible to just parse through this without using something like a regex? Well, the idea is that you want to parse from left to right, one argument at a time, and follow certain rules based on what should happen with each argument.
01:57 So, that kind of structure suggests that it might be nice to use what’s called a double-ended queue, which is essentially a queue, or a list, that lets you pop really quickly and easily from both sides—from the left and the right.
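As a quick illustration of the double-ended queue, here is a small sketch using collections.deque from the standard library. The argument values are made up for the example.

```python
from collections import deque

# Example command line arguments (made up for illustration).
arguments = deque(["-s", ", ", "1", "5"])

first = arguments.popleft()  # removes and returns the leftmost item
last = arguments.pop()       # removes and returns the rightmost item
remaining = list(arguments)  # whatever is still waiting in the queue
```

Both popleft() and pop() run in constant time, whereas list.pop(0) has to shift every remaining element, which is why a deque fits left-to-right parsing so well.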
We want to pop from the left, generally. So, arguments is going to be a double-ended queue of the arguments, and then I’ll just say for now that sep is equal to newline, even though that would actually be taken care of by the default argument to seq(), but I just want to have this here for clarity’s sake.
And then, at the moment, there are no operands so far, so operands is just an empty list even though, eventually, it will return a list of the operands. Now, the next thing I want to say is while arguments (so, while arguments has any members at all), I want to get the current argument, which is equal to
Now, the first case is if there are no operands whatsoever so far, so if len(operands) == 0. That’s the only time when you want to be checking for the --help or the sep options. So, if current == "--help", then you want to print out the USAGE, and then I actually just want to exit from the system, but I want to exit() with the status code 0 to make sure that anyone looking at this knows that this wasn’t a failure. This was a designed exit.
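Here is a minimal sketch of that --help behavior. The USAGE text and the show_help() name are placeholders, not taken from the lesson; the point is only that sys.exit(0) raises SystemExit with a status code of 0, which signals success to the shell.

```python
import sys

USAGE = "usage: seq.py [--help] [-s SEP] [FIRST [INCREMENT]] LAST"  # placeholder text


def show_help():
    """Print the usage message and exit with status 0 (a deliberate, clean exit)."""
    print(USAGE)
    sys.exit(0)
```

By contrast, raising SystemExit with a string, as in SystemExit(USAGE), prints that string to standard error and exits with status 1, which is the behavior you would want for malformed input.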
--separator or the long form, here. Right? So, if it’s either one of those things, what I want to do is I want to say sep = current, which should work just fine because current is an argument, so it’s a string. Then, I want to continue because I don’t want to do any more logic after this, I just want to get the separator. And I said sep = current, but of course that doesn’t make sense because then it would just make sep to be
So, the next thing I want to say is, well, “What happens if I’m not looking to parse "--separator"?” Right? The next thing that I want to do is actually try to get access to an operand, right? Because anything other than sep is just an integer operand. So, the first thing I can say here is try (and I’ll tell you why I’m using a try ... except block in a second), but I’ll say operands.append() the integer form of
that they know that they’ve made some mistake here. And then the final thing I want to do is say if operands (or, I should actually be more precise: if the length of operands is more than 3, so, if this is the fourth or greater argument), then I also want to raise SystemExit, because someone has passed in a malformed number of operands there. And then with all of that done, I can simply return, in the correct order, the separator, and then the operands.
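Putting those rules together, a parse() along these lines might look like the sketch below. This is a reconstruction under stated assumptions, not the exact code from the lesson: the USAGE text is a placeholder, the short flag is assumed to be -s, the separator value is assumed to be the argument that follows the flag, and the guard for a flag with no value after it is an addition of mine.

```python
from collections import deque

USAGE = "usage: seq.py [--help] [-s SEP] [FIRST [INCREMENT]] LAST"  # placeholder text


def parse(args):
    """Parse arguments left to right and return (sep, operands)."""
    arguments = deque(args)
    sep = "\n"
    operands = []
    while arguments:
        current = arguments.popleft()
        # Options are only recognized before the first operand.
        if len(operands) == 0:
            if current == "--help":
                print(USAGE)
                raise SystemExit(0)  # status 0: a deliberate exit
            if current in ("-s", "--separator"):
                if not arguments:  # assumption: a flag with no value is an error
                    raise SystemExit(USAGE)
                sep = arguments.popleft()  # assumption: the value follows the flag
                continue
        try:
            operands.append(int(current))
        except ValueError:
            # Anything that is not an option must be an integer operand.
            raise SystemExit(USAGE)
        if len(operands) > 3:
            raise SystemExit(USAGE)  # too many operands
    return sep, operands
```

With this sketch, parse(["-s", "AAA", "1", "2", "10"]) would return ("AAA", [1, 2, 10]), and an empty argument list would return the default separator with no operands, leaving it to main() to report the usage.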
So, that’s how you can use this parsing logic along with a double-ended queue to go through and follow simple rules, and by following those simple rules, you get some great properties. So for example,
--separator have to come before any operands in this case, right? Otherwise, if you see them in any other place you’ll get this ValueError because they aren’t integers, right?
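The reason an option in the wrong place fails can be shown in isolation. The helper below is hypothetical and exists only to demonstrate how int() treats different argument strings.

```python
def is_integer_operand(text):
    """Return True if int() accepts the text, False if it raises ValueError."""
    try:
        int(text)
    except ValueError:
        return False
    return True
```

Strings such as "--separator" or "-s" are rejected, while "10" is accepted. Note that "-5" is also accepted, because int() understands a leading minus sign, so negative operands are not mistaken for options.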
So if there are any operands, then there need to be between one and three, and otherwise, there’ll be some kind of error that’s raised just simply based on the behavior of this while loop structure.
And there you get the usage, so it works just fine. And then a simple test case, here, just passing in 10: works great. And then my classic example, going up by 2, and then going up by 1, works perfectly fine.
Now, let’s use the separator real quick, and see if this works. So, I’ll say here, maybe my separator would be the three letters "AAA". I don’t know why you would use that, but there, you actually get that just as desired, so this works just fine.
Writing a custom parser can be a really flexible and really powerful approach, but the problem, of course, is that this logic here that I showed you in parse() also requires a lot of maintenance, just like a complex regular expression.
07:24 So really, both of these approaches—as awesome as they are—are something that really needs to be automated, and so that’s what I’ll talk about in the next section of this series when I’ll talk about the tools that exist in Python to automate this stuff for you. But as of now, you have a great understanding of how all of this works under the hood, and you should be really well prepared to understand any command line parsing system that you encounter, at least on a basic level. In the next lesson, I’ll talk some about type validation.