Implementation of a complete compiler (2) Syntax analysis

February 16, 2015 · Workshop

1 Series Description

GitHub address Source code of each stage Collection of descriptions of each stage

2 Syntax analysis instructions

Grammar: The way words are combined to form phrases, clauses, or sentences.

After lexical analysis, we can already identify the input text into words one by one. The goal of this stage is to identify these words into sentences and determine whether this combination of words conforms to the grammar we defined.

2.1 Use grammar to define grammar

Grammatical analysis requires additional representation capabilities obtained by recursion, and it is obvious that regular expressions cannot meet our needs.

In fact, grammar can also be used to describe the structure of lexical words, but regular expressions can already meet the needs, and it is more concise to use regular expressions at this time.

2.2 LR(1) analysis method

@%……¥&%#¥It’s too complicated and I don’t want to explain it.

In summary, LR(1) is a very, very powerful parsing algorithm capable of resolving many reduce-reduce conflicts, and most programming languages whose grammars are described by context-free grammars have an LR(1) grammar.

2.3 Use Yacc to generate a parser

The algorithm for constructing LR(1) analysis tables is simple enough to be completed automatically by a computer, and manual construction is very troublesome and boring, so using Yacc is a wise decision.

Similar to Lex, the Yacc specification is divided into three parts

%{
...
%}
...
%%
...

The first part is the same as Lex, including include and declaration.

The second part defines the terminal symbols, start symbols, precedence, etc. received from the lexical analysis

The third part defines grammar and semantic actions. Only grammar is defined in the syntax analysis stage, and semantic actions are completed during semantic analysis.

2.3.1 Conflict

Yacc resolves shift-reduce conflicts by choosing move closer, and resolves reduce-reduce conflicts by using the rule that appears first in the grammar.

2.3.2 Priority guidance

The priority is defined to resolve ambiguity, which makes it much more convenient when writing grammar.

Yacc can use priority guidance commands in the second part

%leftCOMMA
%right PLUSASSIGN MINUSASSIGN TIMESASSIGN DIVIDEASSIGN ASSIGN
%left OR
%left AND
%left EQ NEQ
%left LE GE LT GT
%left PLUS MINUS
%left TIMES DIVIDE MOD
%right INC DEC NOT
%left LPAREN RPAREN LBRACK RBRACK

The priority decreases from top to bottom, left right indicates whether the word is left combined or right combined.

3 Specific implementation

The source code of the syntax analyzer and improved lexical analyzer can be found at the beginning of the article.

At this stage, the compiler has developed into four modules:

  1. Error handling module (errormsg.c errormsg.h): used to generate error messages containing file names and line numbers 2. Common tool module (util.c util.h): defines some commonly used functions 3. Lexical analysis module (simplec.lex): performs lexical analysis through Lex 4. Syntax analysis module (simplec.yacc): performs syntax analysis through Yacc

The token.h in the previous stage is no longer needed. Instead, Yacc will automatically generate a word-related header file y.tab.h based on the word specifications of the grammar we wrote; parsetest.c is a driver, which will normally output Parsing successful!

Attachment: ANSI C grammar (Lex) ANSI C grammar (Yacc)

 

Syntax Analysis Done.

DIYgod Hi, DIYgod

Running for 4344 days

© 2026 DIYgod. All rights reserved. Please credit when sharing.