While reading the Python documentation, you may have found fragments of BNF notation (Backus–Naur form) that look something like the following:
name ::= lc_letter (lc_letter | "_")*
lc_letter ::= "a"..."z"
What’s the meaning of all this strange code? How can this help you in understanding Python concepts? How can you read and interpret this notation?
In this tutorial, you’ll get to know the basics of Python’s BNF notation and learn how to take advantage of it to get a deep understanding of the language’s syntax and grammar.
In this tutorial, you’ll:
- Learn what BNF notation is and what it’s used for
- Explore the characteristics of Python’s BNF variation
- Learn how to read the BNF notation in the Python documentation
- Explore some best practices for reading Python’s BNF notation
To get the most out of this tutorial, you should be familiar with Python syntax, including keywords, operators, and some common constructs like expressions, conditional statements, and loops.
Get Your Code: Click here to download the free sample code that shows you how to read Python’s BNF notation.
Getting to Know Backus-Naur Form Notation (BNF)
The Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars. Computer scientists often use this notation to describe the syntax of programming languages because it allows them to write a detailed description of a language’s grammar.
The BNF notation consists of three core pieces:
Component | Description | Examples |
---|---|---|
Terminals | Strings that must exactly match specific items in the input. | "def" , "return" , ":" |
Nonterminals | Symbols that will be replaced by a concrete value. They’re also called syntactic variables. | <letter> , <digit> |
Rules | Combinations of terminals and nonterminals that define how these elements relate. | <letter> ::= "a" |
By combining terminals and nonterminals, you can create BNF rules, which can get as detailed as you need. Nonterminals must have their own defining rules. In a piece of grammar, you’ll have a root rule and potentially many secondary rules that define the required nonterminals. This way, you may end up with a hierarchy of rules.
BNF rules are the core components of a BNF grammar. So, a grammar is a set of BNF rules that are also called production rules.
In practice, you can build a set of BNF rules to specify the grammar of a language. Here, language refers to a set of strings that are valid according to the rules defined in the corresponding grammar. BNF is mainly used for programming languages.
For example, the Python syntax has a grammar that’s defined as a set of BNF rules, and these rules are used to validate the syntax of any piece of Python code. If the code doesn’t fulfill the rules, then you’ll get a SyntaxError.
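As a quick sketch of that validation step, you can ask Python to parse a string of source code with the built-in compile() function and see whether the grammar accepts it. The helper name is_valid_syntax() is made up for this illustration:

```python
def is_valid_syntax(source):
    """Return True if Python's parser accepts the given source code."""
    try:
        # compile() parses the source without running it
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_syntax("total = 1 + 2"))  # True
print(is_valid_syntax("total = 1 +"))  # False
```

The second string is missing the right-hand operand, so it doesn’t fulfill the grammar rules and the parser rejects it.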
You’ll find many variations of the original BNF notation out there. Some of the most relevant include the extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF).
In the following sections, you’ll learn the basics of creating BNF rules. Note that you’ll use a variation of BNF that matches the requirements of the BNF Playground site, which you’ll use for testing your rules.
BNF Rules and Their Components
As you already learned, by combining terminals and nonterminals, you can create BNF rules. These rules typically follow the syntax below:
<symbol> ::= expression
In the BNF rule syntax, you have the following parts:
- <symbol> is a nonterminal variable, which is often enclosed in angle brackets (<>).
- ::= means that the nonterminal on the left will be replaced with the expression on the right.
- expression consists of a series of terminals, nonterminals, and other symbols that define a specific piece of grammar.
When building BNF rules, you can use a variety of symbols with specific meanings. For example, if you’re going to use the BNF Playground site to compile and test your rules, then you’ll find yourself using some of the following symbols:
Symbol | Meaning |
---|---|
"" | Encloses a terminal symbol |
<> | Indicates a nonterminal symbol |
() | Indicates a group of valid options |
+ | Specifies one or more of the previous element |
* | Specifies zero or more of the previous element |
? | Specifies zero or one occurrence of the previous element |
| | Indicates that you can select one of the options |
[x-z] | Indicates letter or digit intervals |
Once you know how to write a BNF rule and what symbols to use, you can start creating your own rules. Note that the BNF Playground has several additional symbols and syntactical constructs that you can use in your rules. For a complete reference, click the Grammar Help section at the top of the page.
Now, it’s time to start playing with a couple of custom BNF rules. To kick things off, you’ll start with a generic example.
A Generic Example: Grammar for a Full Name
Say that you need to create a context-free grammar to define how a user should input a person’s full name. In this case, the full name will have three components:
- First name
- Middle name
- Family name
Between each component, you need to place exactly one whitespace. You should also treat the middle name as optional. Here’s how you can define this rule:
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
The left-hand part of your BNF rule is a nonterminal variable that identifies the person’s full name. The ::= symbol denotes that <full_name> will be replaced with the right-hand part of the rule.
The right-hand part of the rule has several components. First, you have the first name, which you define using the <first_name> nonterminal. Next, you need a space to separate the first name from the following component. To define this space, you use a terminal, which consists of a space character between quotes.
After the first name, you can accept a middle name, and after that, you need another space. So, you open parentheses to group these two elements. Then you create <middle_name> and the " " terminal. Both are optional, so you use a question mark (?) after the closing parenthesis to denote that condition.
Finally, you need the family name. To define this component, you use another nonterminal, <family_name>. That’s it! You’ve built your first BNF rule. However, you still don’t have a working grammar. You only have a root rule.
To complete the grammar, you need to define rules for <first_name>, <middle_name>, and <family_name>. To do this, you need to meet some requirements:
- Each name component will accept only letters.
- Each name component will start with a capital letter and continue with lowercase letters.
In this case, you can start by defining two rules, one for uppercase letters and one for lowercase letters:
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
In the last two lines of this grammar snippet, you create two pretty similar rules. The first rule accepts all the ASCII letters from uppercase A to Z. The second rule accepts all the lowercase ASCII letters from a to z. In this example, you don’t support accents or other non-ASCII letters.
With these rules in place, you can build the rest of your rules. To kick things off, go ahead and add the <first_name> rule:
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name> ::= <uppercase_letter> <lowercase_letter>*
To define the <first_name> rule, you start with the <uppercase_letter> nonterminal to express that the first letter of the name must be an uppercase letter. Then, you continue with the <lowercase_letter> nonterminal followed by an asterisk (*). This asterisk means that the first name will accept zero or more lowercase letters after the initial uppercase letter.
You can follow this same pattern to build the <middle_name> and <family_name> rules. Would you like to give it a try? Once you’re done, compare your result with the complete grammar below:
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name> ::= <uppercase_letter> <lowercase_letter>*
<middle_name> ::= <uppercase_letter> <lowercase_letter>*
<family_name> ::= <uppercase_letter> <lowercase_letter>*
You can check if your full name grammar works using the BNF Playground site.
Once you navigate to the BNF Playground site, you can paste your grammar rules in the text input area. Then press the COMPILE BNF button. If everything is okay with your BNF rules, then you can enter a full name in the Test a string here! input field. Once you’ve entered a person’s full name, the field will turn green if the input string fulfills the rules.
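If you’d rather check a few strings from Python, then here’s a minimal sketch that translates the grammar above into a regular expression. The FULL_NAME pattern and the is_full_name() helper are hypothetical names for this illustration, and the translation assumes that a regex is an acceptable stand-in for this particular grammar:

```python
import re

# <first_name>, <middle_name>, and <family_name> share the same shape:
# <uppercase_letter> <lowercase_letter>*
NAME = r"[A-Z][a-z]*"

# <full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>
FULL_NAME = re.compile(rf"{NAME} ({NAME} )?{NAME}")


def is_full_name(text):
    """Check whether the whole string matches the <full_name> rule."""
    return FULL_NAME.fullmatch(text) is not None


print(is_full_name("John Smith"))  # True
print(is_full_name("John Fitzgerald Kennedy"))  # True
print(is_full_name("john smith"))  # False
```

Each piece of the pattern maps to one piece of the root rule: the literal spaces play the role of the " " terminals, and the optional group mirrors (<middle_name> " ")?.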
A Programming-Related Example: Identifiers
In the previous section, you learned how to create a BNF grammar that defines how your users must provide a person’s name. This is a generic example that may or may not relate to programming. In this section, you’ll get more technical by writing a short set of BNF rules to validate an identifier in a hypothetical programming language.
An identifier can be a variable, function, class, or an object’s name. In your example, you’ll write a set of rules to check whether a given string meets the following requirements:
- The first character is an uppercase or lowercase letter or an underscore.
- The rest of the characters can be uppercase or lowercase letters, digits, or underscores.
Here’s the root rule for your identifier:
<identifier> ::= <char> (<char> | <digit>)*
In this rule, you have the <identifier> nonterminal variable, which defines the root. On the right-hand side, you first have the <char> nonterminal. The rest of the identifier is grouped inside parentheses. The asterisk after the group says that elements from the group can appear zero or more times. Each such element is either a character or a digit.
Now, you need to define the <char> and <digit> nonterminals with their own dedicated rules. They’ll look like the following:
<identifier> ::= <char> (<char> | <digit>)*
<char> ::= [A-Z] | [a-z] | "_"
<digit> ::= [0-9]
The <char> rule accepts one ASCII letter in either lowercase or uppercase. Alternatively, it can accept an underscore. Finally, the <digit> rule accepts a digit from 0 to 9. Now, your set of rules is complete. Go ahead and give it a try on the BNF Playground site.
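To try the same rules from Python, you can sketch them as a regular expression. The IDENTIFIER pattern and the is_identifier() helper below are made-up names that only cover this hypothetical language. Real Python identifiers also allow many non-ASCII characters:

```python
import re

# <char> ::= [A-Z] | [a-z] | "_"
CHAR = "[A-Za-z_]"
# <digit> ::= [0-9]
DIGIT = "[0-9]"
# <identifier> ::= <char> (<char> | <digit>)*
IDENTIFIER = re.compile(rf"{CHAR}({CHAR}|{DIGIT})*")


def is_identifier(text):
    """Check whether the whole string matches the <identifier> rule."""
    return IDENTIFIER.fullmatch(text) is not None


for candidate in ["total", "_private", "value_2", "2nd_value", "my-var"]:
    print(candidate, is_identifier(candidate))
```

The first three candidates fulfill the rules. The fourth fails because it starts with a digit, and the last one fails because the hyphen isn’t part of <char> or <digit>.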
For you as a programmer, reading BNF rules can be a pretty useful skill. For example, you’ll often find that the official documentation of many programming languages includes the BNF grammar of the languages, in whole or in part. So, being able to read BNF allows you to better understand the language syntax and intricacies.
From this point on, you’ll learn how to read Python’s BNF variation, which you’ll find in several parts of the language documentation.
Understanding Python’s BNF Variation
Python uses a custom variation of the BNF notation to define the language’s grammar. In many parts of the Python documentation, you’ll find portions of BNF grammar. These snippets can help you better understand any syntactic construct that you’re studying.
Python’s BNF variation uses the following style:
Symbol | Meaning |
---|---|
name | Holds the name of a rule or nonterminal |
::= | Means expand into |
| | Separates alternatives |
* | Accepts zero or more repetitions of the preceding item |
+ | Accepts one or more repetitions of the preceding item |
[] | Accepts zero or one occurrence, which means that the enclosed item is optional |
() | Groups options |
"" | Defines literal strings |
space | Is only meaningful to separate tokens |
These symbols define Python’s BNF variation. One notable difference from what regular BNF rules look like is that Python doesn’t use angle brackets (<>) to enclose nonterminal symbols. It only uses the nonterminal identifier or name. Arguably, this makes rules cleaner and more readable.
Also note that the square brackets ([]) have a different meaning for Python. Up to this point, you’ve used them to enclose sets of characters like [a-z]. In Python, these brackets mean that the enclosed element is optional. To define something like [a-z] in Python’s BNF variation, you’ll use "a"..."z" instead.
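With this style in mind, you can go back to the two rules from the introduction, name ::= lc_letter (lc_letter | "_")* and lc_letter ::= "a"..."z". The sketch below reads them as a regular expression. The constant names and the is_name() helper are assumptions made for this illustration:

```python
import re

# lc_letter ::= "a"..."z"
LC_LETTER = "[a-z]"
# name ::= lc_letter (lc_letter | "_")*
NAME = re.compile(rf"{LC_LETTER}({LC_LETTER}|_)*")


def is_name(text):
    """Check whether the whole string matches the name rule."""
    return NAME.fullmatch(text) is not None


print(is_name("pass_stmt"))  # True
print(is_name("_stmt"))  # False: must start with a lowercase letter
print(is_name("Name"))  # False: uppercase letters aren't allowed
```

In other words, a name starts with a lowercase letter and continues with zero or more lowercase letters or underscores.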
You’ll find many BNF snippets in the Python documentation. Learning how to navigate and read them is quite a useful skill for you as a Python developer. So, in the following sections, you’ll explore a few examples of BNF rules from the Python documentation, and you’ll learn how to read them.
Reading BNF Rules From Python’s Documentation: Examples
Now that you know the basics of reading the BNF notation and you’ve learned the characteristics of Python’s BNF variation, it’s time for you to start reading some BNF grammar from the Python documentation. This way, you’ll build the required skills to take advantage of this notation to learn more about Python and its syntax.
The pass and return Statements
To kick things off, you’ll start with the pass statement, which is a simple statement that allows you to do nothing in Python. The BNF notation for this statement is like the following:
pass_stmt ::= "pass"
Here, you have the name of the rule, pass_stmt. Then you have the ::= symbol to indicate that the rule expands to "pass", which is a terminal symbol. This means that this statement consists of the pass keyword on its own. There are no additional syntactical components. So, you end up knowing the syntax for the pass statement:
pass
The BNF rule for the pass statement is one of the simplest rules that you’ll find in the documentation. It only contains a terminal that defines the syntax straightforwardly.
Another common statement that you’ll often use in your day-to-day coding is return. This statement is a bit more complex than pass. Here’s the BNF rule for return from the documentation:
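If you want to see the parser apply this rule, then the standard-library ast module can help. The following sketch parses a lone pass keyword and inspects the resulting node. The variable names are chosen only for this illustration:

```python
import ast

# Parse a module whose only statement is the pass keyword
tree = ast.parse("pass")
statement = tree.body[0]
print(type(statement).__name__)  # Pass

# Anything after the keyword falls outside the rule
try:
    ast.parse("pass 42")
    extra_component_ok = True
except SyntaxError:
    extra_component_ok = False
print(extra_component_ok)  # False
```

The parser accepts the bare keyword and rejects any additional component, which is what the single "pass" terminal in the rule tells you.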
return_stmt ::= "return" [expression_list]
In this case, you have the rule’s name, return_stmt, and the ::= as usual. Then, you have a terminal symbol consisting of the word return. The second component of this rule is an optional list of expressions, expression_list. You know that this second component is optional because it’s enclosed in square brackets.
Having an optional list of expressions after the word return is consistent with the fact that Python allows return statements without an explicit return value. In this case, the language automatically returns None, which is Python’s null value:
>>> def func():
...     return
...
>>> print(func())
None
This toy function uses a bare return without providing an explicit return value. In this case, Python automatically returns None for you.
Now, if you click the expression_list variable on the documentation, then you’ll land on the rule below:
expression_list ::= expression ("," expression)* [","]
Again, you have the rule’s name and the ::= symbol. Then, you have a required nonterminal variable, expression. This nonterminal symbol has its own definition rule, which you can access by clicking on the symbol itself.
Up to this point, you have the syntax of a return statement with a single return value:
>>> def func():
...     return "Hello!"
...
>>> func()
'Hello!'
In this example, you use the "Hello!" string as the return value of your function. Note that the return value can be any Python object or expression.
The rule continues by opening parentheses. Remember that BNF uses parentheses to group objects. In this case, you have a terminal consisting of a comma (","), and then you have the expression symbol again. The asterisk after the closing parentheses indicates that this construct can appear zero or more times.
This part of the rule describes those return statements with multiple return values:
>>> def func():
...     return "Hello!", "Pythonista!"
...
>>> func()
('Hello!', 'Pythonista!')
Now, your function returns two values. To do this, you provide a comma-separated series of values. When you call the function, you get a tuple of values.
The final part of the rule is [","]. This tells you that the list of expressions can include an optional trailing comma. This comma may cause surprising results:
>>> def func():
...     return "Hello!",
...
>>> func()
('Hello!',)
In this example, you use the trailing comma after a single return value. As a result, your function returns a tuple with a single item. However, note that the comma doesn’t cause any effect if you already have multiple comma-separated values:
>>> def func():
...     return "Hello!", "Pythonista!",
...
>>> func()
('Hello!', 'Pythonista!')
In this example, you add a trailing comma to a return statement with multiple return values. Again, you get a tuple of values when you call the function.
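To see how the parser handles each variation of the rule, you can inspect the return value’s node with the ast module. This is only a sketch: the return_value_type() helper is a made-up name, and the node names assume Python 3.8 or later, where literals are parsed as Constant nodes:

```python
import ast


def return_value_type(return_line):
    """Return the AST node name of a return statement's value, or None."""
    source = f"def func():\n    {return_line}\n"
    function = ast.parse(source).body[0]
    return_node = function.body[0]
    if return_node.value is None:
        # Bare return: the optional expression_list is absent
        return None
    return type(return_node.value).__name__


for line in [
    "return",
    'return "Hello!"',
    'return "Hello!",',
    'return "Hello!", "Pythonista!"',
]:
    print(f"{line!r} -> {return_value_type(line)}")
```

A bare return has no value node at all, a single expression gives a Constant node, and any form that contains a comma, including a lone trailing comma, gives a Tuple node.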
Assignment Expressions
Another interesting BNF snippet that you can find in the Python documentation is the one that defines the syntax of assignment expressions, which you build with the walrus operator. Here’s the root BNF rule for this type of expression:
assignment_expression ::= [identifier ":="] expression
The right-hand part of this rule starts with an optional component that includes a nonterminal called identifier and a terminal consisting of the ":=" symbol. This symbol is the walrus operator itself. Then, you have a required expression.
Note: At first glance, it may be weird that the assignment part is optional, as the whole point of an assignment expression is the assignment itself. However, making this part optional greatly simplifies many of the grammar rules because an assignment expression is allowed almost everywhere a plain expression is. You’ll see an example of this simplification in the following section.
This matches the syntax of an assignment expression with the walrus operator:
identifier := expression
Note that in the assignment_expression rule, the assignment part is optional. You’ll get the same value out of evaluating the expression whether you perform the assignment or not.
Here’s a working example of an assignment expression:
>>> (length := len([1, 2, 3]))
3
>>> length
3
In this example, you create an assignment expression that assigns the number of items in a list to the length variable.
Note that you’ve enclosed the expression in parentheses. Otherwise, it’ll fail with a SyntaxError exception. Check out the Walrus Operator Syntax section from The Walrus Operator: Python’s Assignment Expressions to figure out why you need the parentheses.
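You can confirm both observations with the ast module. The sketch below assumes Python 3.8 or later, where the walrus operator exists, and uses variable names picked for this illustration:

```python
import ast

# Parenthesized form: the parser builds a NamedExpr node
expression = ast.parse("(length := len([1, 2, 3]))", mode="eval").body
print(type(expression).__name__)  # NamedExpr
print(expression.target.id)  # length

# Unparenthesized form as a stand-alone statement: rejected by the parser
try:
    ast.parse("length := len([1, 2, 3])")
    unparenthesized_ok = True
except SyntaxError:
    unparenthesized_ok = False
print(unparenthesized_ok)  # False
```

The NamedExpr node holds the identifier as its target and the expression as its value, which lines up with the two parts of the BNF rule.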
Conditional Statements
Now that you’ve learned how to read the BNF rules for simple expressions, you can jump into compound statements. Conditional statements are pretty common in any piece of Python code. The Python documentation provides the BNF rule for this type of statement:
if_stmt ::= "if" assignment_expression ":" suite
            ("elif" assignment_expression ":" suite)*
            ["else" ":" suite]
When you start reading this rule, you immediately find the "if" terminal symbol, which you must use to start any conditional statement. Then, you find the assignment_expression nonterminal, which you already studied in the previous section.
Note: The if_stmt rule uses the assignment_expression nonterminal to define the condition. This allows you to use either an assignment expression or a plain expression in the condition. Remember that the assignment part is optional in assignment_expression.
Next, you have the ":" terminal. This is the colon that you need to use at the end of a compound statement’s header. This colon denotes that the statement’s header is complete. Finally, you have a required nonterminal called suite, which is a set of indented statements.
Following this first part of the rule, you end up with the following Python syntax:
if assignment_expression:
    suite
This is a bare-bones if statement. It starts with the if keyword. Then, you have an expression that Python evaluates for truth value. Finally, you have a colon that opens the possibility to have an indented block that works as the suite.
The second line of the BNF rule defines the syntax of elif clauses. In this line, you have the elif keyword as a terminal symbol. Then, you have an expression, a colon, and again, a suite of indented code:
if assignment_expression:
    suite
elif assignment_expression:
    suite
You can have zero or more elif clauses in a conditional statement, which you know because of the asterisk after the closing parentheses. All of them will follow the same syntax.
The final part of the conditional BNF rule is the else clause, which consists of the else keyword followed by a colon and an indented suite of code. Here’s how this translates to Python syntax:
if assignment_expression:
    suite
elif assignment_expression:
    suite
else:
    suite
The else clause is also optional in Python. In the BNF rule, you know that because of the square brackets surrounding the final line of the rule.
Here’s a toy example of a working conditional statement:
>>> def read_temperature():
...     return 25
...
>>> if (temperature := read_temperature()) < 10:
...     print("The weather is cold!")
... elif 10 <= temperature <= 25:
...     print("The weather is nice!")
... else:
...     print("The weather is hot!")
...
The weather is nice!
In the if clause, you use an assignment expression to grab the current temperature value. Then, you compare the current value with 10. Next, you reuse the temperature value to create the expression in the elif clause. Finally, you have the else clause for those cases where the temperature is hot.
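The if_stmt rule also fixes the order of the clauses: the elif group comes before the optional else. As a rough sketch, you can compile a few candidate statements and see which ones the parser accepts. The check() helper and the sample sources are hypothetical, and nothing gets executed, so the undefined names x and y don’t matter:

```python
def check(source):
    """Report whether the parser accepts the given source code."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return "SyntaxError"
    return "valid"


candidates = {
    "if only": "if x:\n    pass\n",
    "if and else": "if x:\n    pass\nelse:\n    pass\n",
    "if, two elifs, else": (
        "if x:\n    pass\n"
        "elif y:\n    pass\n"
        "elif x and y:\n    pass\n"
        "else:\n    pass\n"
    ),
    "else before elif": (
        "if x:\n    pass\n"
        "else:\n    pass\n"
        "elif y:\n    pass\n"
    ),
}

results = {label: check(source) for label, source in candidates.items()}
for label, result in results.items():
    print(f"{label}: {result}")
```

Only the last candidate fails, because the rule has no place for an elif clause after the else clause.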
Loop Constructs
Loops are another commonly used compound statement in Python. You have two different loop statements in Python:
- for loops
- while loops
The BNF grammar for Python’s for loop is the following:
for_stmt ::= "for" target_list "in" starred_list ":" suite
             ["else" ":" suite]
The first line defines the loop header, which starts with the "for" terminal. Then you have the target_list nonterminal. In short, this nonterminal represents the loop variable or variables.
Next, you have the "in" terminal, which represents the in keyword. The starred_list nonterminal symbol represents an iterable object. Finally, you have a colon that introduces an indented block of code, suite.
Note: Python’s grammar is in constant evolution. For example, in Python 3.10, the for loop rule was written as:
for_stmt ::= "for" target_list "in" expression_list ":" suite
             ["else" ":" suite]
Here, instead of starred_list, you have expression_list. In Python 3.11, starred lists became valid in for loops. So, the grammar changed.
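If you’re curious about how your own interpreter treats a starred list in a loop header, then you can compile a small snippet and look at the result. This sketch only reports what the running interpreter does. On versions older than 3.11, the outcome may differ from what the documented grammar suggests:

```python
import sys

# A for loop whose iterable is an unparenthesized starred list
source = "for item in *first, *second:\n    pass\n"

try:
    # compile() only parses the code, so first and second needn't exist
    compile(source, "<example>", "exec")
    accepted = True
except SyntaxError:
    accepted = False

print(sys.version_info[:2], accepted)
```

On Python 3.11 and later, accepted should be True, which matches the starred_list nonterminal in the current rule.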
Again, you can click any nonterminal symbol to navigate to its defining BNF rule and dive deeper into its definition and syntax. For example, if you click the target_list symbol, then you’ll be presented with the following BNF rules:
target_list ::= target ("," target)* [","]
target ::= identifier
           | "(" [target_list] ")"
           | "[" [target_list] "]"
           | attributeref
           | subscription
           | slicing
           | "*" target
In the first line, you can see that target_list consists of one or more target objects separated by commas. This list can include an optional trailing comma, which doesn’t alter the result when you already have several targets. In practice, target objects can be an identifier (variable), a tuple, a list, or any other of the provided options. The pipe characters (|) let you know that all these values are separate alternatives.
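To make a few of these alternatives concrete, here’s a short sketch with loops whose targets go beyond a single identifier. The data and variable names are invented for this illustration:

```python
pairs = [(1, (2, 3, 4)), (5, (6, 7, 8))]

# target_list with two targets: an identifier and a parenthesized
# target_list that contains a starred target
collected = []
for first, (second, *rest) in pairs:
    collected.append((first, second, rest))
print(collected)  # [(1, 2, [3, 4]), (5, 6, [7, 8])]

# A subscription as the target: every item is assigned to slots[0]
slots = [None]
for slots[0] in "abc":
    pass
print(slots)  # ['c']
```

The second loop is unusual style, but it’s valid syntax because subscription is one of the alternatives in the target rule.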
The second line of the BNF rule for a for loop defines the syntax of the loop’s else clause. This clause is optional, which you learned from the enclosing square brackets. The line consists of the "else" terminal, followed by a colon and a suite of indented code.
You can translate the above BNF rule to the following Python syntax:
for target_list in starred_list:
    suite
else:
    suite
The loop has a series of comma-separated loop variables in target_list and an iterable of data represented by starred_list.
Here’s a quick example of a for loop:
>>> high = 5
>>> for number in range(high):
...     if number > 5:
...         break
...     print(number)
... else:
...     print("range covered")
...
0
1
2
3
4
range covered
This loop iterates over a range of numbers that goes from 0 up to, but not including, high. In this example, high is 5, so the break statement doesn’t run, and the else clause runs at the end of the loop. If you change the value of high to 10, then the break statement will run, and the else clause won’t.
Note: It doesn’t make sense to have a loop with an else clause if the loop’s main suite doesn’t have a break statement. If you find yourself in this situation, then remove the else: header and unindent its suite.
When it comes to while loops, their BNF rule is the following:
while_stmt ::= "while" assignment_expression ":" suite
               ["else" ":" suite]
Python’s while loops start with the while keyword, which is the first component in the right-hand part of the rule. Then, you need an assignment_expression, a colon, and a suite of indented code:
while assignment_expression:
    suite
else:
    suite
Note that the while loop also has an optional else clause that works the same as in for loops. Can you come up with a working example of a while loop?
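If you’d like something to compare your answer with, then here’s one possible sketch. It searches a list for the first number greater than 10, and the data and variable names are made up for this illustration:

```python
numbers = [3, 8, 15, 4]
index = 0
first_large = None

# The condition is a plain expression, which assignment_expression allows
while index < len(numbers):
    if (number := numbers[index]) > 10:
        first_large = number
        break
    index += 1
else:
    # Runs only if the loop finishes without hitting break
    print("No number greater than 10")

print(first_large)  # 15
```

Because the loop finds 15 and breaks, the else clause doesn’t run. If you remove 15 from the list, then the condition eventually becomes false and the else clause prints its message.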
Exploring Best Practices for Reading Python’s BNF
When you’re reading Python’s BNF rules in the documentation, you can follow a few best practices to improve your understanding. Here are a few recommendations for you:
- Familiarize yourself with the BNF notation: Learn its basic concepts and syntax. Understand terms such as nonterminal symbols, terminal symbols, production rules, and so on.
- Experiment and practice: Write small custom BNF rules and experiment with them using the BNF Playground site.
- Familiarize yourself with Python’s BNF variation: Learn about the symbols that Python uses to define its BNF variant. Knowing the symbols for grouping, expressing repetition, and optionality is a must-have skill.
- Break down the BNF rules: Break down BNF rules into smaller parts and analyze each component individually.
- Identify nonterminal symbols: Look for nonterminal symbols in the BNF rule. These symbols contain links that you can click to navigate to their definitions.
- Identify terminal symbols: Look for terminal symbols that represent specific elements in the language, such as keywords, operators, and delimiters. These symbols are enclosed in quotes.
- Study examples: Study practical examples that correspond to the BNF rule that you’re trying to understand. Analyze how the BNF rule applies to those examples. Contrast the rule with the actual Python syntax.
- Review additional notes or explanations: Read additional notes provided in the documentation for the BNF rules that you’re studying.
If you apply these recommendations to your BNF reading adventure, then you’ll feel way more comfortable with the rules. You’ll be able to understand them better and improve your Python skills in the process.
Conclusion
Now you know what BNF notation is and how Python uses it in the official documentation. You’ve learned the basics of Python’s version of the BNF notation and how to read it. This is a fairly advanced skill that will help you better understand the language’s syntax and grammar.
In this tutorial, you’ve:
- Learned what BNF notation is and what it’s used for
- Understood Python’s BNF variation
- Read some practical examples of BNF grammar in the Python docs
- Identified some best practices for reading Python’s BNF variation
Knowing how to read the BNF notation in the Python documentation will give you a better and deeper understanding of Python’s syntax and grammar. Go for it!