For my final-year project I'm learning about compiler techniques, and I'm currently experimenting with the GCC intermediate representation (raw GIMPLE) and extracting control flow graphs from different source files (C, C++ and Java) using GCC-5.4.
So far I can generate the raw `*.004t.gimple` and `*.011t.cfg` files using `-fdump-tree-all-graph-raw`, but I would now like to understand the GIMPLE language better, so I searched for its grammar and found these:
- GIMPLE WIKI
- SIMPLE
- GENERIC and GIMPLE
- latest GIMPLE Doc (has no grammar!!!)
- GCC FE
- grammar for gcc-4.3.6
- grammar for gcc-4.2.1
- GIMPLE Doc for gcc-5.4.0 (has no grammar too!!!)
So the language seems to be constantly changing and to have multiple formats (High level GIMPLE, Low_level_GIMPLE, SSA GIMPLE, tree), and the grammar also seems to change between versions. I can't find the GIMPLE grammar for recent versions, specifically the one used in GCC-5.4, and I can't understand the different formats.
Questions about the grammar :
- Where can I find the GIMPLE grammar used in GCC-5.4 and more recent versions?
- How is it written? (in BNF or EBNF or ...)
- How does GCC implement this grammar to generate, parse and understand the GIMPLE files it generates, and later transform them to RTL?
- Is it possible for me to write a small subset of the GIMPLE grammar in Xtext from examples of the `*.004t.gimple` files that I generate?
Questions about the formats:
- What's the difference between the 3 GIMPLE formats? (I can't seem to find detailed documentation about each one in the wiki.)
- Which format is used in the raw files `*.c.004t.gimple` and `*.c.011t.cfg`? (High or Low, ...)
- Which one better represents the control flow of the original source code without optimizations?
Thank You,
It looks like you are just starting to learn GIMPLE and did not even read the documents you posted above. I have been digging into the depths of GCC for some time, and I will try to answer your questions.
In any case, you need to read the `gccint` document (https://gcc.gnu.org/onlinedocs/gccint.pdf). It helps answer some questions and gives some information about GIMPLE, and it is the only document where GIMPLE is described in any detail. The best description is in the sources; that is sad, but it is how things are. Look here as well: http://www.netgull.com/gcc/summit/2003/GENERIC%20and%20GIMPLE.pdf (this document is based on `gccint` and consists of some extracts from it).
There is no "GIMPLE grammar" described in a clear way, as there is for the C language; just look in the sources, and maybe at some poor examples on the internet.
I think it is derived from a tree-adjoining grammar (TAG), and it is based on the SIMPLE IL used by the McCAT compiler project at McGill University [SIMPLE].
How does GCC implement and understand it? Again, you need to look into the depths of GCC: `gimple.h`, `basic-block.h` and `tree-pass.h`, for example; all of these are in `$src/gcc/`. Some of the functions are described in `gccint`, in the GIMPLE section. The `gccint` reference is not entirely accurate: it contains some outdated functions and references, and you must keep that in mind (`FOR_EACH_BB`, for example, was deprecated in 2013).
About Xtext: I have never used it, and I do not understand the need to write GIMPLE yourself, since it is an intermediate language (IL). You can create a plugin to optimize your code flow, but I cannot see the need to use GIMPLE separately.
About the formats:
There is one GIMPLE format, but AFAIK it can take two forms. High GIMPLE is just GIMPLE that is not fully lowered; it is the IL before the pass `pass_lower_cf`. High GIMPLE contains some container statements such as lexical scopes (represented by `GIMPLE_BIND`) and nested expressions (e.g., `GIMPLE_TRY`). Low GIMPLE exposes all of the implicit jumps for control and exception expressions directly in the IL and EH region trees (EH means exception handling). There is also the RAW representation, which as I understand it is a kind of Polish notation; IMO it is more useful than the usual representation. You can get it with `-fdump-tree-all-all-raw`, for example.
`*.c.004t.gimple` is the first stage at which GIMPLE appears, and `*.c.011t.cfg` is the first attempt at a control flow graph (cfg). The internal name of the GIMPLE lowering pass is "lower"; you can see it in `gimple-low.c`. You can use search and find that this pass is `*.c.007t.lower`. Going by the dump numbering, the `.004t.gimple` dump comes before lowering (High GIMPLE) and the `.011t.cfg` dump comes after it (Low GIMPLE).
The answer is above, I think. I use the RAW representation myself; it is more informative IMO.
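To make the High/Low distinction a bit more concrete, here is a rough analogy written in plain C. It is my own hand-written sketch, not GCC output: the function names and label names are made up for illustration, and real dumps use compiler-generated temporaries and labels. The first function keeps a nested lexical scope and structured control flow; the second computes the same result with locals hoisted to function scope, every jump spelled out, and a single exit. It does not try to show which GCC pass performs which step (as far as I can tell, loops are already turned into gotos during gimplification, before the "lower" pass runs).

```c
/* Structured form: a nested lexical scope and structured control flow,
   loosely comparable to what the less-lowered IL still keeps in containers. */
int sum_positive_structured(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        {   /* nested scope with its own local variable */
            int v = values[i];
            if (v > 0) {
                total += v;
            }
        }
    }
    return total;
}

/* Hand-lowered form (illustration only): all locals live at function scope,
   every branch is an explicit goto, and there is one return at the end. */
int sum_positive_lowered(const int *values, int count)
{
    int total;
    int i;
    int v;
    int result;

    total = 0;
    i = 0;
    goto loop_test;
loop_body:
    v = values[i];
    if (v > 0) goto add_value; else goto next_iter;
add_value:
    total = total + v;
next_iter:
    i = i + 1;
loop_test:
    if (i < count) goto loop_body; else goto loop_exit;
loop_exit:
    result = total;
    return result;
}
```

Compiling a file like this with the dump flags mentioned above and comparing the `.gimple`, `.lower` and `.cfg` dumps for the first function should give you a feel for how the real forms differ.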
It is not much, but I hope it helps you with your GCC exploration, and sorry for my bad "Engrish".