Extended Backus–Naur form
Basics
Examples
Syntax diagram

EBNF
Even EBNF can be described using EBNF. Consider below grammar (using conventions such as "-" to indicate set disjunction, "+" to indicate one or more matches, and "?" for optionality):
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
| "c" | "d" | "e" | "f" | "g" | "h" | "i"
| "j" | "k" | "l" | "m" | "n" | "o" | "p"
| "q" | "r" | "s" | "t" | "u" | "v" | "w"
| "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
| "'" | '"' | "=" | "|" | "." | "," | ";" | "-"
| "+" | "*" | "?" | "\n" | "\t" | "\r" | "\f" | "\b" ;
character = letter | digit | symbol | "_" | " " ;
identifier = letter , { letter | digit | "_" } ;
S = { " " | "\n" | "\t" | "\r" | "\f" | "\b" } ;
terminal = "'" , character - "'" , { character - "'" } , "'"
| '"' , character - '"' , { character - '"' } , '"' ;
terminator = ";" | "." ;
term = "(" , S , rhs , S , ")"
| "[" , S , rhs , S , "]"
| "{" , S , rhs , S , "}"
| terminal
| identifier ;
factor = term , S , "?"
| term , S , "*"
| term , S , "+"
| term , S , "-" , S , term
| term , S ;
concatenation = ( S , factor , S , "," ? ) + ;
alternation = ( S , concatenation , S , "|" ? ) + ;
rhs = alternation ;
lhs = identifier ;
rule = lhs , S , "=" , S , rhs , S , terminator ;
grammar = ( S , rule , S ) * ;
Pascal
A Pascal-like programming language that allows only assignments can be defined in EBNF as follows:
(* a simple program syntax in EBNF - Wikipedia *)
program = 'PROGRAM', white space, identifier, white space,
'BEGIN', white space,
{ assignment, ";", white space },
'END.' ;
identifier = alphabetic character, { alphabetic character | digit } ;
number = [ "-" ], digit, { digit } ;
string = '"' , { all characters - '"' }, '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
For example, a syntactically correct program then could be:
PROGRAM DEMO1
BEGIN
A:=3;
B:=45;
H:=-100023;
C:=A;
D123:=B34A;
BABOON:=GIRAFFE;
TEXT:="Hello world!";
END.
The language can easily be extended with control flows, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.
Advantages over BNF
Any grammar defined in EBNF can also be represented in BNF, though representations in the latter are generally lengthier. E.g., options and repetitions cannot be directly expressed in BNF and require the use of an intermediate rule or alternative production defined to be either nothing or the optional production for option, or either the repeated production of itself, recursively, for repetition. The same constructs can still be used in EBNF.
The BNF uses the symbols (<
, >
, |
, ::=
) for itself, but does not include quotes around terminal strings. This prevents these characters from being used in the languages, and requires a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks ("
..."
or '
...'
). The angle brackets ("<
...>
") for nonterminals can be omitted.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon character “;
” marks the end of a rule.
Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
Conventions
- According to the section 4 of the ISO/IEC 14977 standard, the following conventions are used:
- Each meta-identifier of Extended BNF is written as one or more words joined together by hyphens. However, joining the words seems to apply only to referencing meta-identifiers outside of the metalanguage itself, as seen in the examples of the standard.
- A meta-identifier ending with -symbol is the name of a terminal symbol of Extended BNF.
- The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top):
* repetition-symbol - except-symbol , concatenate-symbol | definition-separator-symbol = defining-symbol ; terminator-symbol . terminator-symbol
- The normal precedence is overridden by the following bracket pairs:
The first-quote-symbol is the apostrophe as defined by ISO/IEC 646:1991, that is to say Unicode U+0027 (
(* start-comment-symbol end-comment-symbol *) ' first-quote-symbol first-quote-symbol ' ( start-group-symbol end-group-symbol ) [ start-option-symbol end-option-symbol ] { start-repeat-symbol end-repeat-symbol } ? special-sequence-symbol special-sequence-symbol ? " second-quote-symbol second-quote-symbol "
'
); the font used in ISO/IEC 14977:1996(E) renders it very much like the acute, Unicode U+00B4 (´
), so confusion sometimes arises. However, the ISO Extended BNF standard invokes ISO/IEC 646:1991, "ISO 7-bit coded character set for information interchange", as a normative reference and makes no mention of any other character sets, so formally, there is no confusion with Unicode characters outside the 7-bit ASCII range.
As examples, the following syntax rules illustrate the facilities for expressing repetition:
aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";
dd = {aa}, "D";
ee = aa, {aa}, "E";
ff = 3 * aa, 3 * [aa], "F";
gg = {3 * aa}, "G";
hh = (aa | bb | cc), "H";
Terminal strings defined by these rules are as follows:
aa: A bb: AAAB cc: C AC AAC AAAC dd: D AD AAD AAAD AAAAD etc. ee: AE AAE AAAE AAAAE AAAAAE etc. ff: AAAF AAAAF AAAAAF AAAAAAF gg: G AAAG AAAAAAG etc. hh: AH AAABH CH ACH AACH AAACH
Extensibility
According to the ISO 14977 standard EBNF is meant to be extensible, and two facilities are mentioned. The first is part of EBNF grammar, the special sequence, which is arbitrary text enclosed with question marks. The interpretation of the text inside a special sequence is beyond the scope of the EBNF standard. For example, the space character could be defined by the following rule:
space = ? ASCII character 32 ?;
The second facility for extension is using the fact that parentheses in EBNF cannot be placed next to identifiers (they must be concatenated with them). The following is valid EBNF:
something = foo, ( bar );
The following is not valid EBNF:
something = foo ( bar );
Therefore, an extension of EBNF could use that notation. For example, in a Lisp grammar, function application could be defined by the following rule:
function application = list( symbol, { expression } );
Related work
- The W3C publishes an EBNF notation.
- The W3C used a different EBNF to specify the XML syntax.
- The British Standards Institution published a standard for an EBNF: BS 6154 in 1981.
- The IETF uses augmented BNF (ABNF), specified in RFC 5234.
See also
- Meta-II – An early compiler writing tool and notation
- Phrase structure rules – The direct equivalent of EBNF in natural languages
- Regular expression
- Spirit Parser Framework
References
External links
- ISO/IEC 14977 : 1996(E)
- BNF/EBNF variants – A table by Pete Jinks comparing several syntaxes