1 Series Description
GitHub Address Source Code for Each Stage Collection of Explanations for Each Stage
2 Syntax Analysis Explanation
Syntax: The method of combining words to form phrases, clauses, or sentences.
After lexical analysis, we can already recognize the input text as individual words. The goal of this stage is to recognize these words as sentences and determine if the combination of words conforms to the syntax we have defined.
2.1 Defining Syntax with Grammar
Syntax analysis requires additional expressive power obtained through recursion. Obviously, regular expressions cannot meet our needs.
In fact, grammar can also be used to describe the structure of lexical words, but regular expressions are already sufficient for this purpose. In such cases, using regular expressions is more concise.
2.2 LR(1) Analysis Method
@%……¥&%#¥ Too complicated, don't want to explain.
In short, LR(1) is a very powerful analysis algorithm that can solve many shift-reduce conflicts. Most programming languages that describe their syntax using context-free grammars have an LR(1) grammar.
2.3 Generating Syntax Analyzer using Yacc
The algorithm for constructing LR(1) parsing table is simple enough to be automated by computers, and manual construction is very tedious and boring. Therefore, using Yacc is a wise decision.
Similar to Lex, the Yacc specification is divided into three parts:
The first part is the same as Lex, including includes and declarations.
The second part defines the terminals received from lexical analysis, the start symbol, precedence, etc.
The third part defines the grammar and semantic actions. Only the grammar is defined in the syntax analysis stage, and the semantic actions are completed in the semantic analysis stage.
2.3.1 Conflicts
Yacc chooses shift over reduce to resolve shift-reduce conflicts, and chooses the rule that appears first in the grammar to resolve reduce-reduce conflicts.
2.3.2 Precedence Guidance
Defining precedence is to resolve ambiguity, which makes writing grammar much easier.
Yacc can include precedence guidance commands in the second part:
The precedence decreases from top to bottom, and "left" and "right" indicate whether the word is left-associative or right-associative.
3 Specific Implementation
The source code for the syntax analyzer and the improved lexical analyzer can be found at the beginning of the article.
At this stage, the compiler has developed into four modules:
- Error handling module (errormsg.c errormsg.h): Used to generate error messages with file names and line numbers.
- Common utility module (util.c util.h): Defines some commonly used functions.
- Lexical analysis module (simplec.lex): Performs lexical analysis using Lex.
- Syntax analysis module (simplec.yacc): Performs syntax analysis using Yacc.
The previous stage's token.h is no longer needed. Instead, Yacc will automatically generate a header file, y.tab.h, related to the words based on the word specifications in the grammar we write. parsetest.c is a driver program that will output "Parsing successful!" under normal circumstances.
Attachment: ANSI C grammar (Lex) ANSI C grammar (Yacc)
Syntax analysis Done.