User:Code-Analysis/sandbox
The grammar of a programming language can be considered either in broad terms, as the exact specification of everything that is and is not allowed in the language, or in narrow terms, as only the formal grammar that is suitable for the automatic generation of LR parsers. This article focuses on the formal grammar. Details of the C++ grammar are described in the main article on C++.
The formal grammar describes the context-free grammar of the language. It omits various restrictions, such as the requirement that all variables be defined, and it cannot distinguish between the name of a variable and the name of a type. To an LR parser, all identifiers are simply identifiers. Information about identifiers is stored in name tables, which are not part of the formal grammar. Nevertheless, an LR parser sometimes has to decide on the nature of an identifier. This decision shows up as the resolution of a grammar conflict.
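The role of the name table can be illustrated with a statement of the form `a * b;`, which is a pointer declaration when `a` names a type and a multiplication expression otherwise. The following is a minimal sketch, not taken from any particular compiler: `NameTable`, `TokenKind`, `StatementKind` and `classify_star_statement` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// How the parser may treat an identifier once the name table has been consulted.
enum class TokenKind { Identifier, TypeName };

// Hypothetical name table: it only remembers which names were declared as types.
class NameTable {
public:
    void declare_type(const std::string& name) {
        assert(!name.empty());  // an empty string is not a valid identifier
        types_.insert(name);
    }

    TokenKind classify(const std::string& name) const {
        return types_.count(name) != 0 ? TokenKind::TypeName
                                       : TokenKind::Identifier;
    }

private:
    std::set<std::string> types_;
};

// The two possible readings of a statement of the form "first * second;".
enum class StatementKind { Declaration, Expression };

// Resolve the conflict by looking up the first identifier in the name table.
StatementKind classify_star_statement(const NameTable& names,
                                      const std::string& first) {
    return names.classify(first) == TokenKind::TypeName
               ? StatementKind::Declaration
               : StatementKind::Expression;
}
```

With `T` declared as a type, `classify_star_statement(names, "T")` selects the declaration reading, while an undeclared name such as `x` selects the expression reading. The context-free grammar alone cannot make this choice, because both inputs are the same token sequence.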
C++ 2003 Grammar
The formal grammar is presented in Annex A of the standard. It consists of three major parts.
Lexical conventions
This part of the grammar describes what an identifier, a number, a string, etc. is. Some of the rules are vague and contain human language like `each non-white-space character that cannot be ...` or `any member of the source character set except ...`. Other rules contain lengthy enumerations that list all letters of the English alphabet or the names of all possible operations. This is why the table below contains two separate lines for the number of rules.
Non terminals | 142 |
Grammar rules (significantly different) | 110 |
Grammar rules (all) | 576 |
Preprocessing Directives
This part of the grammar describes the C++ preprocessor. Note that conditional compilation expressions are not defined in this section; instead, it references 'constant-expression' from the core grammar. The requirement that the 'constant-expression' be numeric is not expressed in the formal grammar in any form. One grammar rule contains human language: "the left-parenthesis character without preceding white space".
Non terminals | 14 |
Grammar rules | 28 |
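The rule about the left parenthesis without preceding white space is what separates function-like macros from object-like macros. The sketch below shows the effect; the macro names `FUNCTION_LIKE` and `OBJECT_LIKE` and the two helper functions are hypothetical and exist only for this illustration.

```cpp
// '(' directly follows the macro name: a function-like macro with parameter n.
#define FUNCTION_LIKE(n) ((n) + (n))

// White space precedes '(': an object-like macro whose replacement list is "(n)".
#define OBJECT_LIKE (n)

// Expands to ((value) + (value)).
constexpr int twice_of(int value) { return FUNCTION_LIKE(value); }

// Expands to (n), which simply refers to the function parameter named n.
constexpr int same_as(int n) { return OBJECT_LIKE; }
```

Here `twice_of(21)` evaluates to 42 and `same_as(5)` evaluates to 5. A single space in the `#define` line changes which grammar rule applies, which is why the rule cannot be stated without referring to white space.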
Core Grammar
This part of the grammar describes the language itself: features like classes, functions, statements, expressions, etc. In particular, it contains:
Non terminals | 142 |
Grammar rules | 576 |
Parsing states (LR) | 14235 |
Grammar conflicts | 9337 |
External links
Links to C++ grammars that can be used in compiler generation tools.