1 Series Description
GitHub address Source code of each stage Collection of descriptions of each stage
2 Syntax analysis instructions
Grammar: The way words are combined to form phrases, clauses, or sentences.
After lexical analysis, we can already identify the input text into words one by one. The goal of this stage is to identify these words into sentences and determine whether this combination of words conforms to the grammar we defined.
2.1 Use grammar to define grammar
Grammatical analysis requires additional representation capabilities obtained by recursion, and it is obvious that regular expressions cannot meet our needs.
In fact, grammar can also be used to describe the structure of lexical words, but regular expressions can already meet the needs, and it is more concise to use regular expressions at this time.
2.2 LR(1) analysis method
@%……¥&%#¥It’s too complicated and I don’t want to explain it.
In summary, LR(1) is a very, very powerful parsing algorithm capable of resolving many reduce-reduce conflicts, and most programming languages whose grammars are described by context-free grammars have an LR(1) grammar.
2.3 Use Yacc to generate a parser
The algorithm for constructing LR(1) analysis tables is simple enough to be completed automatically by a computer, and manual construction is very troublesome and boring, so using Yacc is a wise decision.
Similar to Lex, the Yacc specification is divided into three parts
%{
...
%}
...
%%
...
The first part is the same as Lex, including include and declaration.
The second part defines the terminal symbols, start symbols, precedence, etc. received from the lexical analysis
The third part defines grammar and semantic actions. Only grammar is defined in the syntax analysis stage, and semantic actions are completed during semantic analysis.
2.3.1 Conflict
Yacc resolves shift-reduce conflicts by choosing move closer, and resolves reduce-reduce conflicts by using the rule that appears first in the grammar.
2.3.2 Priority guidance
The priority is defined to resolve ambiguity, which makes it much more convenient when writing grammar.
Yacc can use priority guidance commands in the second part
%leftCOMMA %right PLUSASSIGN MINUSASSIGN TIMESASSIGN DIVIDEASSIGN ASSIGN %left OR %left AND %left EQ NEQ %left LE GE LT GT %left PLUS MINUS %left TIMES DIVIDE MOD %right INC DEC NOT %left LPAREN RPAREN LBRACK RBRACK
The priority decreases from top to bottom, left right indicates whether the word is left combined or right combined.
3 Specific implementation
The source code of the syntax analyzer and improved lexical analyzer can be found at the beginning of the article.
At this stage, the compiler has developed into four modules:
- Error handling module (errormsg.c errormsg.h): used to generate error messages containing file names and line numbers 2. Common tool module (util.c util.h): defines some commonly used functions 3. Lexical analysis module (simplec.lex): performs lexical analysis through Lex 4. Syntax analysis module (simplec.yacc): performs syntax analysis through Yacc
The token.h in the previous stage is no longer needed. Instead, Yacc will automatically generate a word-related header file y.tab.h based on the word specifications of the grammar we wrote; parsetest.c is a driver, which will normally output Parsing successful!
Attachment: ANSI C grammar (Lex) ANSI C grammar (Yacc)
Syntax Analysis Done.