EBNF As a Mental Model to Learn Programming Languages
Learning how syntax can be expressed gives you an extra mental model to learn new programming languages.
What is and why learn EBNF?
At some point, everybody gets syntax errors: a misspelled keyword or a missing semicolon. Yet, have you ever wondered how did you learn the syntax rules? Most of the time, when you start to code, you don't learn all the syntax rules directly. The same way that when you begin to talk or write, you don't understand about subject, verb, direct object, etc. You know the syntax by doing.
But when you want to become proficient at writing, you need to learn and understand the syntax rules: what is a direct object, where is the subject, to identify the verb, etc. Therefore, if you want to become a proficient software engineer, you also need to learn the syntax rules of the programming languages.
EBNF is a language to describe the syntax rules of programming languages.
An example of Python grammar:
if_stmt:
| 'if' named_expression ':' block elif_stmt
| 'if' named_expression ':' block [else_block]
Syntax versus Semantics
The best example of syntax versus semantics is this sentence in English: "Colorless green ideas sleep furiously." This is a syntactically correct English sentence, but it has no meaning. The syntax is right, but not the semantics.
EBNF only helps to know the syntax, not the semantics. Yet, by understanding syntax, we learn to differentiate between syntax and semantics.
Another example in javascript is:
const a = 10;
a = 20;
The previous code is syntactically valid, yet the semantics of JS doesn't allow reassigning a const
variable.
Learning the difference between the syntax and semantics rules gives you more tools and models to learn new languages.
EBNF
Let's learn more about this language to describe grammar.
EBNF stands for Extended Backus-Narus Form. The Backus-Narus Form first appeared in 1959 in the definition of the ALGOL programming language. The author was John Backus.
EBNF definition of integer
Everything is better with examples. So let's look at the notation of integers in the EBNF.
digit := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
integer := [+|-] digit { digit }
EBNF is a list of rules:
Left-hand side: The name of the rule.
Right-hand side: The description of the rule.
Note: EBNF is more of a family of notations, they do not always conform to the same characters to denote the same concepts, but the concepts are the same.
:=
denotes the separation of left and right sides|
means OR
The digit
rule is considered a terminal because it's composed only of tokens, not of other rules, like integer
. On the other hand, the rule integer
depends on the digit
terminal; therefore, it's considered a non-terminal rule.
[
and]
mean optional{
and}
mean zero or more,
In English, the rule for the integer
is read as: "An integer is defined as an optional symbol of +
or -
followed by a digit
and possibly by zero, one or more digit
s.
A few examples:
0 -> is an integer
-+10 -> is NOT an integer. Only
+
or-
are valid, not both.-50 -> is an integer
1.54 -> is not an integer. The dot is not correct in the EBNF rules.
EBNF for Python
The Python grammar is listed here in the official docs, but the notation does not follow precisely the EBNF rules. Like they mention in the docs: "The notation is a mixture of EBNF and PEG."
Let's review the example from the beginning:
if_stmt:
| 'if' named_expression ':' block elif_stmt
| 'if' named_expression ':' block [else_block]
We can see that it has two options:
'if' named_expression ':' block elif_stmt
'if' named_expression ':' block [else_block]
Let's take a look at the second one: 'if' named_expression ':' block [else_block]
.
It starts with the "if"
keyword, then a rule called named_expression
. Then a colon ":"
then a rule called block
and an optional else_block
.
The rule accepts the following:
if 10 < 20:
print(‘in da if”)
But not the following:
if: 10 < 20
print(‘in da if”)
EBNF for Javascript
We can find a notation of the JS grammar here. For example, the if
statement says the following:
IfStatement ::= "if" "(" Expression ")" Statement ( "else" Statement )?
The notation is also not precisely EBNF, but close enough to understand: "if" "(" Expression ")" Statement ( "else" Statement )?
It starts with the
"if"
keyword.Followed by an opening parenthesis
"("
.Then another rule:
Expression
.A closing parenthesis
")"
.Then another rule:
Statement
.Finally, an optional
"else"
keyword with another Statement rule.
A question you might be wondering is, isn't else if
allowed? The answer is yes, and it's actually accepted in this notation as well: The Statement rule can be another IfStatement
. This is why in JS, the notation is else if
. The else
is an optional keyword in the ifStatement
rule, and whatever comes with the new "if"
is considered a new statement syntactically.
This is a different approach than the python one, where the elif_stmt
was another statement.
EBNF As a Mental Model
You don't need to learn or even know all the rules of EBNF or any other notation. The important aspect is that you understand and have the tools to learn a new programming language faster.
Understanding how grammar is defined gives you a model to apply when learning a new language. You can better understand why something is allowed or not allowed and use it in other scenarios.
If you like this post, consider sharing it with your friends on twitter or forwarding this email to them 🙈
Don't hesitate to reach out to me if you have any questions or see an error. I highly appreciate it.
And thanks to Sebastià for reviewing this article 🙏
Further Readings
Interesting article: Language of languages.
ANTLR, a tool that builds language parsers from a grammar definition.
List of open-source grammar for the ANTLR.