BNF Notation: Dive Deeper Into Python's Grammar

by Leodanis Pozo Ramos Feb 14, 2024 advanced python

While reading the Python documentation, you may have found fragments of BNF notation (Backus–Naur form) that look something like the following:

BNF Grammar
name      ::= lc_letter (lc_letter | "_")*
lc_letter ::= "a"..."z"

What’s the meaning of all this strange code? How can this help you in understanding Python concepts? How can you read and interpret this notation?

In this tutorial, you’ll get to know the basics of Python’s BNF notation and learn how to take advantage of it to get a deep understanding of the language’s syntax and grammar.

In this tutorial, you’ll:

  • Learn what BNF notation is and what it’s used for
  • Explore the characteristics of Python’s BNF variation
  • Learn how to read the BNF notation in the Python documentation
  • Explore some best practices for reading Python’s BNF notation

To get the most out of this tutorial, you should be familiar with Python syntax, including keywords, operators, and some common constructs like expressions, conditional statements, and loops.

Getting to Know Backus-Naur Form Notation (BNF)

The Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars. Computer scientists often use this notation to describe the syntax of programming languages because it allows them to write a detailed description of a language’s grammar.

The BNF notation consists of three core pieces:

Component     Description                                                                          Examples
Terminals     Strings that must exactly match specific items in the input.                         "def", "return", ":"
Nonterminals  Symbols that will be replaced by a concrete value. Also called syntactic variables.  <letter>, <digit>
Rules         Combinations of terminals and nonterminals that define how these elements relate.    <letter> ::= "a"

By combining terminals and nonterminals, you can create BNF rules, which can get as detailed as you need. Nonterminals must have their own defining rules. In a piece of grammar, you’ll have a root rule and potentially many secondary rules that define the required nonterminals. This way, you may end up with a hierarchy of rules.

BNF rules are the core components of a BNF grammar. So, a grammar is a set of BNF rules that are also called production rules.

In practice, you can build a set of BNF rules to specify the grammar of a language. Here, language refers to a set of strings that are valid according to the rules defined in the corresponding grammar. BNF is mainly used for programming languages.

For example, the Python syntax has a grammar that’s defined as a set of BNF rules, and these rules are used to validate the syntax of any piece of Python code. If the code doesn’t fulfill the rules, then you’ll get a SyntaxError.
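
As a minimal sketch of that validation step, the snippet below uses the built-in compile() function to check whether a string follows Python’s grammar. The is_valid_syntax() helper and the sample strings are illustrative assumptions, not something the Python documentation defines:

```python
def is_valid_syntax(source):
    """Return True if source parses as Python code, False otherwise."""
    try:
        # compile() parses the source first, so grammar violations
        # surface here as SyntaxError before anything runs.
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_syntax("total = 1 + 2"))  # Follows the grammar
print(is_valid_syntax("total = = 1"))    # Breaks the grammar
```

The first call should report valid syntax, while the second one shouldn’t, because no rule in the grammar allows two consecutive equal signs separated by a space.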

You’ll find many variations of the original BNF notation out there. Some of the most relevant include the extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF).

In the following sections, you’ll learn the basics of creating BNF rules. Note that you’ll use a variation of BNF that matches the requirements of the BNF Playground site, which you’ll use for testing your rules.

BNF Rules and Their Components

As you already learned, by combining terminals and nonterminals, you can create BNF rules. These rules typically follow the syntax below:

BNF Grammar
<symbol> ::= expression

In the BNF rule syntax, you have the following parts:

  • <symbol> is a nonterminal variable, which is often enclosed in angle brackets (<>).
  • ::= means that the nonterminal on the left will be replaced with the expression on the right.
  • expression consists of a series of terminals, nonterminals, and other symbols that define a specific piece of grammar.
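
To make the idea of replacing a nonterminal more concrete, here’s a rough Python sketch that stores a tiny grammar in a dictionary and expands its root symbol into a string. The GRAMMAR contents and the expand() function are made up for illustration, and the sketch only supports plain sequences and alternatives:

```python
import random

# A toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of terminals and nonterminals.
GRAMMAR = {
    "<greeting>": [["<salutation>", " ", "<name>"]],
    "<salutation>": [["Hello"], ["Hi"]],
    "<name>": [["Ada"], ["Guido"]],
}


def expand(symbol, rng):
    """Replace a symbol with a string that the grammar derives from it."""
    if symbol not in GRAMMAR:
        return symbol  # Terminals stand for themselves
    alternative = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in alternative)


print(expand("<greeting>", random.Random(0)))
```

Every string that expand() can produce from <greeting> belongs to the language that this toy grammar defines.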

When building BNF rules, you can use a variety of symbols with specific meanings. For example, if you’re going to use the BNF Playground site to compile and test your rules, then you’ll find yourself using some of the following symbols:

Symbol  Meaning
""      Encloses a terminal symbol
<>      Indicates a nonterminal symbol
()      Indicates a group of valid options
+       Specifies one or more of the previous element
*       Specifies zero or more of the previous element
?       Specifies zero or one occurrence of the previous element
|       Indicates that you can select one of the options
[x-z]   Indicates letter or digit intervals

Once you know how to write a BNF rule and what symbols to use, you can start creating your own rules. Note that the BNF Playground has several additional symbols and syntactical constructs that you can use in your rules. For a complete reference, click the Grammar Help section at the top of the page.

Now, it’s time to start playing with a couple of custom BNF rules. To kick things off, you’ll start with a generic example.

A Generic Example: Grammar for a Full Name

Say that you need to create a context-free grammar to define how a user should input a person’s full name. In this case, the full name will have three components:

  1. First name
  2. Middle name
  3. Family name

Between each component, you need to place exactly one whitespace. You should also treat the middle name as optional. Here’s how you can define this rule:

BNF Grammar
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>

The left-hand part of your BNF rule is a nonterminal variable that identifies the person’s full name. The ::= symbol denotes that <full_name> will be replaced with the right-hand part of the rule.

The right-hand part of the rule has several components. First, you have the first name, which you define using the <first_name> nonterminal. Next, you need a space to separate the first name from the following component. To define this space, you use a terminal, which consists of a space character between quotes.

After the first name, you can accept a middle name, and after that, you need another space. So, you open parentheses to group these two elements. Then you create <middle_name> and the " " terminal. Both are optional, so you use a question mark (?) after to denote that condition.

Finally, you need the family name. To define this component, you use another nonterminal, <family_name>. That’s it! You’ve built your first BNF rule. However, you still don’t have a working grammar. You only have a root rule.

To complete the grammar, you need to define rules for <first_name>, <middle_name>, and <family_name>. To do this, you need to meet some requirements:

  • Each name component will accept only letters.
  • Each name component will start with a capital letter and continue with lowercase letters.

In this case, you can start by defining two rules, one for uppercase letters and one for lowercase letters:

BNF Grammar
<full_name>        ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]

In the last two lines of this grammar snippet, you create two pretty similar rules. The first rule accepts all the ASCII letters from uppercase A to Z. The second rule accepts all the lowercase letters. In this example, you don’t support accents or other non-ASCII letters.

With these rules in place, you can build the rest of your rules. To kick things off, go ahead and add the <first_name> rule:

BNF Grammar
<full_name>        ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name>       ::= <uppercase_letter> <lowercase_letter>*

To define the <first_name> rule, you start with the <uppercase_letter> nonterminal to express that the first letter of the name must be an uppercase letter. Then, you continue with the <lowercase_letter> nonterminal followed by an asterisk (*). This asterisk means that the first name will accept zero or more lowercase letters after the initial uppercase letter.

You can follow this same pattern to build the <middle_name> and <family_name> rules. Would you like to give it a try? Once you’re done, compare your rules with the complete grammar below:

BNF Grammar
<full_name>        ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name>       ::= <uppercase_letter> <lowercase_letter>*
<middle_name>      ::= <uppercase_letter> <lowercase_letter>*
<family_name>      ::= <uppercase_letter> <lowercase_letter>*

You can check if your full name grammar works using the BNF Playground site.

Once you navigate to the BNF Playground site, you can paste your grammar rules in the text input area. Then press the COMPILE BNF button. If everything is okay with your BNF rules, then you can enter a full name in the Test a string here! input field. Once you’ve entered a person’s full name, the field will turn green if the input string fulfills the rules.
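
If you’d like to test the same rules without leaving Python, then one option is to translate them into a regular expression. The following is only a sketch under that assumption. The NAME and FULL_NAME constants and the is_full_name() helper are illustrative names, and a regex works here only because this grammar has no recursive rules:

```python
import re

# <uppercase_letter> <lowercase_letter>* written as a regex fragment
NAME = "[A-Z][a-z]*"

# <first_name> " " (<middle_name> " ")? <family_name>
FULL_NAME = re.compile(f"{NAME} (?:{NAME} )?{NAME}")


def is_full_name(text):
    """Check text against the full name grammar."""
    return FULL_NAME.fullmatch(text) is not None


for candidate in ["Ada Lovelace", "Ada King Lovelace", "ada Lovelace", "Ada"]:
    print(f"{candidate!r}: {is_full_name(candidate)}")
```

The first two candidates should pass. The third one starts with a lowercase letter, and the last one lacks a family name, so both should fail.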

A Programming Example: Grammar for an Identifier

In the previous section, you learned how to create a BNF grammar that defines how your users must provide a person’s name. This is a generic example that may or may not relate to programming. In this section, you’ll get more technical by writing a short set of BNF rules to validate an identifier in a hypothetical programming language.

An identifier can be a variable, function, class, or an object’s name. In your example, you’ll write a set of rules to check whether a given string meets the following requirements:

  • The first character is an uppercase or lowercase letter or an underscore.
  • The rest of the characters can be uppercase or lowercase letters, digits, or underscores.

Here’s the root rule for your identifier:

BNF Grammar
<identifier> ::= <char> (<char> | <digit>)*

In this rule, you have the <identifier> nonterminal variable, which defines the root. On the right-hand side, you first have the <char> nonterminal. The rest of the identifier is grouped inside parentheses. The asterisk after the group says that elements from the group can appear zero or more times. Each such element is either a character or a digit.

Now, you need to define the <char> and <digit> nonterminals with their own dedicated rules. They look like this:

BNF Grammar
<identifier> ::= <char> (<char> | <digit>)*
<char>       ::= [A-Z] | [a-z] | "_"
<digit>      ::= [0-9]

The <char> rule accepts one ASCII letter in either lowercase or uppercase. Alternatively, it can accept an underscore. Finally, the <digit> rule accepts a digit from 0 to 9. Now, your set of rules is complete. Go ahead and give it a try on the BNF Playground site.
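
As a complement to the playground, here’s a small hand-written checker that mirrors the three rules, with one function per rule. The function names are illustrative assumptions, and the code is a sketch rather than a general parser:

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_char(character):
    # <char> ::= [A-Z] | [a-z] | "_"
    return character in LETTERS or character == "_"


def is_digit(character):
    # <digit> ::= [0-9]
    return character in DIGITS


def is_identifier(text):
    # <identifier> ::= <char> (<char> | <digit>)*
    if not text or not is_char(text[0]):
        return False
    return all(is_char(c) or is_digit(c) for c in text[1:])


for candidate in ["_total1", "snake_case", "1total", "kebab-case"]:
    print(f"{candidate!r}: {is_identifier(candidate)}")
```

Note that this hypothetical language is stricter than Python itself, whose str.isidentifier() method also accepts many non-ASCII letters.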

For you as a programmer, reading BNF rules can be a pretty useful skill. For example, you’ll often find that the official documentation of many programming languages includes the BNF grammar of the languages, in whole or in part. So, being able to read BNF allows you to better understand the language syntax and intricacies.

From this point on, you’ll learn how to read Python’s BNF variation, which you’ll find in several parts of the language documentation.

Understanding Python’s BNF Variation

Python uses a custom variation of the BNF notation to define the language’s grammar. In many parts of the Python documentation, you’ll find portions of BNF grammar. These snippets can help you better understand any syntactic construct that you’re studying.

Python’s BNF variation uses the following style:

Symbol  Meaning
name    Holds the name of a rule or nonterminal
::=     Means expand into
|       Separates alternatives
*       Accepts zero or more repetitions of the preceding item
+       Accepts one or more repetitions of the preceding item
[]      Accepts zero or one occurrence, which means that the enclosed item is optional
()      Groups options
""      Defines literal strings
space   Is only meaningful to separate tokens

These symbols define Python’s BNF variation. One notable difference from what regular BNF rules look like is that Python doesn’t use angle brackets (<>) to enclose nonterminal symbols. It only uses the nonterminal identifier or name. Arguably, this makes rules cleaner and more readable.

Also note that the square brackets ([]) have a different meaning for Python. Up to this point, you’ve used them to enclose sets of characters like [a-z]. In Python, these brackets mean that the enclosed element is optional. To define something like [a-z] in Python’s BNF variation, you’ll use "a"..."z" instead.
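
To see this notation in action, you can go back to the name rule from the beginning of this tutorial and translate it by hand into a regular expression. This is a sketch of one possible translation, and the NAME_RULE and matches_name() names are assumptions made for this example:

```python
import re

# name      ::= lc_letter (lc_letter | "_")*
# lc_letter ::= "a"..."z"
NAME_RULE = re.compile(r"[a-z](?:[a-z]|_)*")


def matches_name(text):
    """Check whether text can be derived from the name rule."""
    return NAME_RULE.fullmatch(text) is not None


for candidate in ["snake_case", "a", "_private", "Name"]:
    print(f"{candidate!r}: {matches_name(candidate)}")
```

The rule requires a lowercase letter first, so "_private" and "Name" shouldn’t match, while "snake_case" and "a" should.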

You’ll find many BNF snippets in the Python documentation. Learning how to navigate and read them is quite a useful skill for you as a Python developer. So, in the following sections, you’ll explore a few examples of BNF rules from the Python documentation, and you’ll learn how to read them.

Reading BNF Rules From Python’s Documentation: Examples

Now that you know the basics of reading the BNF notation and you’ve learned the characteristics of Python’s BNF variation, it’s time for you to start reading some BNF grammar from the Python documentation. This way, you’ll build the required skills to take advantage of this notation to learn more about Python and its syntax.

The pass and return Statements

To kick things off, you’ll start with the pass statement, which is a simple statement that allows you to do nothing in Python. The BNF notation for this statement is like the following:

BNF Grammar
pass_stmt ::=  "pass"

Here, you have the name of the rule, pass_stmt. Then you have the ::= symbol to indicate that the rule expands to "pass", which is a terminal symbol. This means that this statement consists of the pass keyword on its own. There are no additional syntactical components. So, you end up knowing the syntax for the pass statement:

Python
pass

The BNF rule for the pass statement is one of the simplest rules that you’ll find in the documentation. It only contains a terminal that defines the syntax straightforwardly.

Another common statement that you’ll often use in your day-to-day coding is return. This statement is a bit more complex than pass. Here’s the BNF rule for return from the documentation:

BNF Grammar
return_stmt ::= "return" [expression_list]

In this case, you have the rule’s name, return_stmt, and the ::= as usual. Then, you have a terminal symbol consisting of the word return. The second component of this rule is an optional list of expressions, expression_list. You know that this second component is optional because it’s enclosed in square brackets.

Having an optional list of expressions after the word return is consistent with the fact that Python allows return statements without an explicit return value. In this case, the language automatically returns None, which is Python’s null value:

Python
>>> def func():
...     return
...

>>> print(func())
None

This toy function uses a bare return without providing an explicit return value. In this case, Python automatically returns None for you.

Now, if you click the expression_list variable on the documentation, then you’ll land on the rule below:

BNF Grammar
expression_list ::= expression ("," expression)* [","]

Again, you have the rule’s name and the ::= symbol. Then, you have a required nonterminal variable, expression. This nonterminal symbol has its own definition rule, which you can access by clicking on the symbol itself.

Up to this point, you have the syntax of a return statement with a single return value:

Python
>>> def func():
...     return "Hello!"
...

>>> func()
'Hello!'

In this example, you use the "Hello!" string as the return value of your function. Note that the return value can be any Python object or expression.

The rule continues by opening parentheses. Remember that BNF uses parentheses to group objects. In this case, you have a terminal consisting of a comma (","), and then you have the expression symbol again. The asterisk after the closing parentheses indicates that this construct can appear zero or more times.

This part of the rule describes those return statements with multiple return values:

Python
>>> def func():
...     return "Hello!", "Pythonista!"
...

>>> func()
('Hello!', 'Pythonista!')

Now, your function returns two values. To do this, you provide a comma-separated series of values. When you call the function, you get a tuple of values.

The final part of the rule is [","]. This tells you that the list of expressions can include an optional trailing comma. This comma can produce surprising results:

Python
>>> def func():
...     return "Hello!",
...

>>> func()
('Hello!',)

In this example, you use the trailing comma after a single return value. As a result, your function returns a tuple with a single item. However, note that the comma has no effect if you already have multiple comma-separated values:

Python
>>> def func():
...     return "Hello!", "Pythonista!",
...

>>> func()
('Hello!', 'Pythonista!')

In this example, you add a trailing comma to a return statement with multiple return values. Again, you get a tuple of values when you call the function.
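
If you want to check how Python’s parser reads each form of the return statement, then you can inspect the syntax tree with the standard-library ast module. The sketch below assumes Python 3.8 or later, where string literals show up as Constant nodes. The return_value_node() helper is an illustrative name:

```python
import ast


def return_value_node(return_source):
    """Parse a return statement and name the node type of its value."""
    # Wrap the statement in a function so it appears in a valid context.
    tree = ast.parse(f"def func():\n    {return_source}")
    return_stmt = tree.body[0].body[0]
    if return_stmt.value is None:
        return None  # Bare return: the optional expression_list is absent
    return type(return_stmt.value).__name__


for source in ['return', 'return "Hello!"', 'return "Hello!",',
               'return "Hello!", "Pythonista!",']:
    print(f"{source!r:35} -> {return_value_node(source)}")
```

A bare return has no value node at all, a single expression stays a single node, and any comma, trailing or not, turns the value into a Tuple node.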

Assignment Expressions

Another interesting BNF snippet that you can find in the Python documentation is the one that defines the syntax of assignment expressions, which you build with the walrus operator. Here’s the root BNF rule for this type of expression:

BNF Grammar
assignment_expression ::=  [identifier ":="] expression

The right-hand part of this rule starts with an optional component that includes a nonterminal called identifier and a terminal consisting of the ":=" symbol. This symbol is the walrus operator itself. Then, you have a required expression.

This matches the syntax of an assignment expression with the walrus operator:

Python
identifier := expression

Note that in an assignment expression, the assignment part is optional. You’ll get the same value out of evaluating the expression whether you perform the assignment or not.

Here’s a working example of an assignment expression:

Python
>>> (length := len([1, 2, 3]))
3
>>> length
3

In this example, you create an assignment expression that assigns the number of items in a list to the length variable.

Note that you’ve enclosed the expression in parentheses. Otherwise, it’ll fail with a SyntaxError exception. Check out the Walrus Operator Syntax section from The Walrus Operator: Python 3.8 Assignment Expressions to figure out why you need the parentheses.
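
You can confirm both observations with the ast module. The following sketch, which assumes Python 3.8 or later, parses the parenthesized form and then tries the bare form as a statement. The variable names are illustrative:

```python
import ast

# Parenthesized, the assignment expression parses as a NamedExpr node
node = ast.parse("(length := len([1, 2, 3]))", mode="eval").body
print(type(node).__name__, node.target.id)

# Without parentheses, the same text isn't a valid statement
try:
    ast.parse("length := len([1, 2, 3])")
except SyntaxError:
    unparenthesized_parses = False
else:
    unparenthesized_parses = True
print(unparenthesized_parses)
```

The target attribute of the node corresponds to the identifier part of the rule, and the value attribute holds the expression.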

Conditional Statements

Now that you’ve learned how to read the BNF rules for simple expressions, you can jump into compound statements. Conditional statements are pretty common in any piece of Python code. The Python documentation provides the BNF rule for this type of statement:

BNF Grammar
if_stmt ::=  "if" assignment_expression ":" suite
             ("elif" assignment_expression ":" suite)*
             ["else" ":" suite]

When you start reading this rule, you immediately find the "if" terminal symbol, which you must use to start any conditional statement. Then, you find the assignment_expression nonterminal, which you already studied in the previous section.

Next, you have the ":" terminal. This is the colon that you need to use at the end of a compound statement’s header. This colon denotes that the statement’s header is complete. Finally, you have a required nonterminal called suite, which is a set of indented statements.

Following this first part of the rule, you end up with the following Python syntax:

Python
if assignment_expression:
    suite

This is a bare-bones if statement. It starts with the if keyword. Then, you have an expression that Python evaluates for truth value. Finally, you have a colon that introduces the indented block that works as the suite.

The second line of the BNF rule defines the syntax of elif clauses. In this line, you have the elif keyword as a terminal symbol. Then, you have an expression, a colon, and again, a suite of indented code:

Python
if assignment_expression:
    suite
elif assignment_expression:
    suite

You can have zero or more elif clauses in a conditional statement, which you know because of the asterisk after the closing parentheses. All of them will follow the same syntax.

The final part of the conditional BNF rule is the else clause, which consists of the else keyword followed by a colon and an indented suite of code. Here’s how this translates to Python syntax:

Python
if assignment_expression:
    suite
elif assignment_expression:
    suite
else:
    suite

The else clause is also optional in Python. In the BNF rule, you know that because of the square brackets surrounding the final line of the rule.

Here’s a toy example of a working conditional statement:

Python
>>> def read_temperature():
...     return 25
...

>>> if (temperature := read_temperature()) < 10:
...     print("The weather is cold!")
... elif 10 <= temperature <= 25:
...     print("The weather is nice!")
... else:
...     print("The weather is hot!")
...
The weather is nice!

In the if clause, you use an assignment expression to grab the current temperature value. Then, you compare the current value with 10. Next, you reuse the temperature value to create the expression in the elif clause. Finally, you have the else clause for those cases where the temperature is hot.
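
The if_stmt rule also fixes the order of the clauses: any elif clauses come before the single optional else clause. As a quick sketch, you can ask the parser which arrangements it accepts. The follows_if_grammar() helper and the sample strings are assumptions made for this example:

```python
import ast


def follows_if_grammar(source):
    """Return True if the parser accepts source, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


only_if = "if x:\n    pass\n"
with_elifs = "if x:\n    pass\nelif y:\n    pass\nelif z:\n    pass\n"
else_before_elif = "if x:\n    pass\nelse:\n    pass\nelif y:\n    pass\n"
two_elses = "if x:\n    pass\nelse:\n    pass\nelse:\n    pass\n"

for sample in [only_if, with_elifs, else_before_elif, two_elses]:
    print(follows_if_grammar(sample))
```

Only the first two samples match the rule. The asterisk allows repeated elif clauses, but the square brackets allow at most one else clause, and only at the end.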

Loop Constructs

Loops are another commonly used compound statement in Python. You have two different loop statements in Python: for loops and while loops.

The BNF grammar for Python’s for loop is the following:

BNF Grammar
for_stmt ::=  "for" target_list "in" starred_list ":" suite
              ["else" ":" suite]

The first line defines the loop header, which starts with the "for" terminal. Then you have the target_list nonterminal. In short, this nonterminal represents the loop variable or variables.

Next, you have the "in" terminal, which represents the in keyword. The starred_list nonterminal symbol represents an iterable object. Finally, you have a colon that introduces an indented block of code, suite.

Again, you can click any nonterminal symbol to navigate to its defining BNF rule and dive deeper into its definition and syntax. For example, if you click the target_list symbol, then you’ll be presented with the following BNF rules:

BNF Grammar
target_list ::=  target ("," target)* [","]
target      ::=  identifier
                 | "(" [target_list] ")"
                 | "[" [target_list] "]"
                 | attributeref
                 | subscription
                 | slicing
                 | "*" target

In the first line, you can see that target_list consists of one or more target objects separated by commas. This list can include an optional trailing comma, which doesn’t alter the result. In practice, target objects can be an identifier (variable), a tuple, a list, or any other of the provided options. The pipe characters (|) let you know that all these values are separate alternatives.
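
Here’s a short sketch that exercises a few of these alternatives as loop targets. The data and variable names are made up for illustration:

```python
# Comma-separated identifiers: target ("," target)*
pairs = []
for number, name in [(1, "one"), (2, "two")]:
    pairs.append(f"{number}={name}")

# "*" target collects the remaining items into a list
tails = []
for first, *rest in [(1, 2, 3), (4, 5, 6)]:
    tails.append(rest)

# "[" [target_list] "]" is a list-style target
sums = []
for [left, right] in [(1, 2), (3, 4)]:
    sums.append(left + right)

# subscription: the loop assigns straight into a dictionary key
latest = {}
for latest["value"] in range(3):
    pass

print(pairs, tails, sums, latest)
```

The subscription form is rarely seen in real code, but the grammar allows it because a loop target follows the same target rule as a regular assignment.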

The second line of the BNF rule for a for loop defines the syntax of the loop’s else clause. This clause is optional, which you can tell from the enclosing square brackets. The line consists of the "else" terminal, followed by a colon and a suite of indented code.

You can translate the above BNF rule to the following Python syntax:

Python
for target_list in starred_list:
    suite
else:
    suite

The loop has a series of comma-separated loop variables in target_list and an iterable of data represented by starred_list.

Here’s a quick example of a for loop:

Python
>>> high = 5

>>> for number in range(high):
...     if number > 5:
...         break
...     print(number)
... else:
...     print("range covered")
...
0
1
2
3
4
range covered

This loop iterates over a range of numbers that goes from 0 up to, but not including, high. In this example, high is 5, so number never exceeds 5, the break statement doesn’t run, and the else clause runs at the end of the loop. If you change the value of high to 10, then the break statement will run, and the else clause won’t.

When it comes to while loops, their BNF rule is the following:

BNF Grammar
while_stmt ::=  "while" assignment_expression ":" suite
                ["else" ":" suite]

Python’s while loops start with the while keyword, which is the first component in the right-hand part of the rule. Then, you need an assignment_expression, a colon, and a suite of indented code:

Python
while assignment_expression:
    suite
else:
    suite

Note that the while loop also has an optional else clause that works the same as in for loops. Can you come up with a working example of a while loop?
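
One possible answer, offered as a sketch, uses an assignment expression in the loop header, just as the rule allows. It assumes Python 3.8 or later, and the readings data is made up for illustration:

```python
readings = [3, 2, 1, 0, 5]
processed = []

# The header is an assignment expression: it stores each popped
# value in reading and stops when that value is falsy.
while reading := readings.pop(0):
    processed.append(reading)
else:
    # Runs because the loop ended without a break statement
    finished = True

print(processed, readings, finished)
```

The loop stops when it pops the 0, so the 5 stays in readings, and the else clause runs because the loop never hit a break statement.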

Exploring Best Practices for Reading Python’s BNF

When you’re reading Python’s BNF rules in the documentation, you can follow a few best practices to improve your understanding. Here are a few recommendations for you:

  1. Familiarize yourself with the BNF notation: Learn its basic concepts and syntax. Understand terms such as nonterminal symbols, terminal symbols, production rules, and so on.
  2. Experiment and practice: Write small custom BNF rules and experiment with them using the BNF Playground site.
  3. Familiarize yourself with Python’s BNF variation: Learn about the symbols that Python uses to define its BNF variant. Knowing the symbols for grouping, expressing repetition, and optionality is a must-have skill.
  4. Break down the BNF rules: Break down BNF rules into smaller parts and analyze each component individually.
  5. Identify nonterminal symbols: Look for nonterminal symbols in the BNF rule. These symbols contain links that you can click to navigate to their definitions.
  6. Identify terminal symbols: Look for terminal symbols that represent specific elements in the language, such as keywords, operators, and delimiters. These symbols are enclosed in quotes.
  7. Study examples: Study practical examples that correspond to the BNF rule that you’re trying to understand. Analyze how the BNF rule applies to those examples. Contrast the rule with the actual Python syntax.
  8. Review additional notes or explanations: Read additional notes provided in the documentation for the BNF rules that you’re studying.

If you apply these recommendations to your BNF reading adventure, then you’ll feel way more comfortable with the notation. You’ll be able to better understand the rules and improve your Python skills in the process.

Conclusion

Now you know what BNF notation is and how Python uses it in the official documentation. You’ve learned the basics of Python’s version of the BNF notation and how to read it. This is a fairly advanced skill that will help you better understand the language’s syntax and grammar.

In this tutorial, you’ve:

  • Learned what BNF notation is and what it’s used for
  • Understood Python’s BNF variation
  • Read some practical examples of BNF grammar in the Python docs
  • Identified some best practices for reading Python’s BNF variation

Knowing how to read the BNF notation in the Python documentation will give you a better and deeper understanding of Python’s syntax and grammar. Go for it!
