PARSE_ABORT
on error and
have the caller ignore the rest of the input line when that happens (and then
call parse() again). But this is inadequate for a compiler, because it
forgets all the syntactic context leading up to the error. A syntactical error
deep within a function in the compiler input should not cause the compiler to
treat the following line like the beginning of a source file.
It is possible to specify how to recover from a syntactical error by
writing rules recognizing the special token error
. This is a terminal
symbol that is always defined (it must not be declared) and is reserved
for error handling. The Bisonc++ parser generates an error
token whenever a
syntactical error is detected; if a rule was provided recognizing this token
in the current context, the parse can continue. For example:
statements: // empty | statements '\n' | statements expression '\n' | statements error '\n'The fourth rule in this example says that an error followed by a newline makes a valid addition to any
statements
.
What happens if a syntactical error occurs in the middle of an
expression
? The error recovery rule, interpreted strictly, applies to the
precise sequence of a statements
, an error and a newline. If an error
occurs in the middle of an expression
, there will probably be some
additional tokens and subexpressions on the parser's stack after the last
statements
, and there will be tokens waiting to be read before the next
newline. So the rule is not applicable in the ordinary way.
Bisonc++, however, can force the situation to fit the rule, by discarding
part of the semantic context and part of the input. When a (syntactical) error
occurs the parsing algorithm will try to recover from the error in the
following way: First it discards states from the stack until it encounters a
state in which the error
token is acceptable (meaning that the
subexpressions already parsed are discarded, back to the last complete
statements
). At this point the error token is shifted. Then, if the
available look-ahead token is not acceptable to be shifted next, the parser
continues to read tokens and to discard them until it finds a token which
is acceptable. I.e., a token which can follow an error
token in
the current state. In this example, Bisonc++ reads and discards input until the
next newline was read so that the fourth rule can apply.
The choice of error rules in the grammar is a choice of strategies for error recovery. A simple and useful strategy is simply to skip the rest of the current input line or current statement if an error is detected:
statement: error ';' // on error, skip until ';' is readAnother useful recovery strategy is to recover to the matching close-delimiter of an opening-delimiter that has already been parsed. Otherwise the close-delimiter will probably appear to be unmatched, generating another, spurious error message:
primary: '(' expression ')' | '(' error ')' | ... ;Error recovery strategies are necessarily guesses. When they guess wrong, one syntactical error often leads to another. In the above example, the error recovery rule guesses that an error is caused by bad input within one statement. Suppose that instead a spurious semicolon is inserted in the middle of a valid statement. After the error recovery rule recovers from the first error, another syntactical error will be found straightaway, since the text following the spurious semicolon is also an invalid statement.
To prevent an outpouring of error messages, the parser may be configured in such a way that no error message will be generated for another syntactical error that happens shortly after the first. E.g., only after three consecutive input tokens have been successfully shifted error messages will be generated again. This configuration is currently not available in Bisonc++'s parsers.
Note that rules using the error
token may have actions, just as any
other rules can.
The token causing an error is re-analyzed immediately when an error occurs. If this is unacceptable, then the member function clearin may be called to skip this token. The function can be called by any member function of the Parser class. For example, suppose that on a parse error, an error handling routine is called that advances the input stream to some point where parsing should once again commence. The next symbol returned by the lexical scanner is probably correct. The previous token ought to be discarded using clearin.
expr '=' exprThe left-hand side must be a so-called lvalue. An lvalue is simply an addressable location, like a variable's identifier, a dereferenced pointer expression or some other address-expression. The right-hand side is a so-called rvalue: this may be any value: any expression will do.
A rule like the above leaves room for many different semantical errors:
expr
at its left-hand side, any
expression will be accepted by the parser. E.g.,
3 = 12So, the action associated with this rule should check whether the left-hand side is actually an lvalue. If not, a semantical error should be reported;
size_t/*unsigned*/
d_nSemanticErrors
. It may be possible to test this counter's value once the
input has been parsed, calling ABORT() (see section 6.3) if the
counter isn't zero anymore. When the grammar's start symbol itself has
multiple alternatives, it is probably easiest to augment the grammar with an
additional rule, becoming the augmented grammar's start symbol which simply
calls the former start symbol. For example, if input
was the name of the
original start-symbol, augment the grammar as follows to ensure a
PARSE_ABORT return value of the parse() member when either syntactical
or semantical errors were detected:
semantic_input: // new start-symbol input { if (d_nSemanticErrors) // return PARSE_ABORT ABORT(); // on semantic errors too. }Returning from the parser's parse() member the number of syntactical and semantical errors could then be printed, whereupon the program might terminate.