Grammar Structure
The format for a context free grammar is as follows:
- Any reference to "punctuation" when describing legal identifiers is restricted to certain pieces of punctuation. In no particular order, here are the punctuation characters
that are legal: ~ ! @ # * ( ) _ + ' ; : / ?
- Terminals are strings which either start with a lowercase letter, a number, or a piece of punctuation, followed by any amount of letters, numbers and punctuation. i.e. a
terminal matches the regular expression [{punctuation}a-z0-9][{punctuation}a-zA-Z0-9]*. Examples are id + begin end plus 9 8' 7+? '' ....
- Nonterminals start with uppercase letters, followed by any number of legal characters. i.e. a nonterminal matches the regular expression
[A-Z][{punctuation}a-zA-Z0-9]*. Examples are S A EXPR TERM~3 Stop ....
- It is assumed that the start symbol is the first nonterminal whose productions are given.
- The productions associated with a nonterminal are indicated as
head -> RHS1
| RHS2
....
| RHSn.
where the alternative righthand sides of the productions are separated by a slash, "|", and terminated by a period, ".".
- The RHS of a production is a sequence of terminals or nonterminals separated by spaces. For example:
Expr add Expr.
- C-style comments are allowed, e.g.
/* this is a comment
possibly spanning many lines */
Empty grammars
The grammar X IS EMPTY. for any nonterminal X is a valid grammar. This represents a grammar with no productions. The nonterminal X is needed as the start symbol, as all context-free
grammars must have start symbols.
Empty grammars form a subset of the grammars for the empty language, and there is a subtle difference between the two. For example, the non-empty grammar S -> T. T -> S. does not produce any strings — i.e.,
the grammar forms the empty language. This owes to the start symbol S being unrealizable. The grammar S IS EMPTY. is a grammar for the same language with the start symbol, and additionally the grammar
itself is empty. Indeed transforming the former grammar via removing unrealizable productions will yield the latter grammar.