|
List Info
Thread: intermediary representation and bison?
|
|
| intermediary representation and bison? |

|
2007-12-23 07:26:19 |
Hi,
After writing a simple interpreter for my simplified C
language using
bison, I'm currently planning to write a small and a really
simple
compiler for it using flex and bison as a front-end. My
problem right
now is that I'm not pretty sure if I have to implement an
intermediary
representation for a given source script, or not. Actually,
I'd like to
do it but then I remember the mission/goal of bison, i.e
does pattern
matching and then executes an action if a rule matches, and
by
implementing an IR tree to walk through it later, it feels
like I'm
redoing bison's job... I could just place the code emitters
right in the
actions associated with the grammar rules and that's it. I
still don't
know what to do...
Here is ideally what I want to construct given this little
piece of code:
int a, b, c;
a = 5;
b = 10;
c = a + b;
then bison would construct these trees for me:
ASSIGN --- ASSIGN --- ASSIGN
/ / /
ID CONST ID CONST ID BINARY (+)
/
ID ID
And then, I would just traverse these trees (top to bottom,
left to
right) to emit bind the identifiers to registers and to emit
the native
opcodes. But I can also do these actions directly in the
semantic rules
without trees construction..
Guys, any suggestions? I'm confused.
Best regards,
Ilyes Gouta.
_______________________________________________
help-bison gnu.org http
://lists.gnu.org/mailman/listinfo/help-bison
|
|
| Re: intermediary representation and
bison? |
  Germany |
2007-12-23 08:17:13 |
> After writing a simple interpreter for my simplified C
language using
> bison, I'm currently planning to write a small and a
really simple
> compiler for it using flex and bison as a front-end.
The difference between a compiler and an interpreter is not
clearly
defined. _Very_ roughly speaking, people usually talk about
interpreters
when it's a program that accepts input interactively (like
METAFONT) and
about compilers when it takes an input and produces an
output without
allowing any kind of interaction while it's running (like
GCC). On the
other hand, TeX is sometimes called a compiler and it can be
used
interactively.
> My problem right
> now is that I'm not pretty sure if I have to implement
an intermediary
> representation for a given source script, or not.
Actually, I'd like to
> do it but then I remember the mission/goal of bison,
i.e does pattern
> matching and then executes an action if a rule matches,
and by
> implementing an IR tree to walk through it later, it
feels like I'm
> redoing bison's job...
That's what I think, whereby I believe that what Bison is
meant to
implement is called a "finite-state machine".
> I could just place the code emitters right in the
> actions associated with the grammar rules and that's
it. I still don't
> know what to do...
>
> Here is ideally what I want to construct given this
little piece of code:
>
> int a, b, c;
>
> a = 5;
> b = 10;
> c = a + b;
>
> then bison would construct these trees for me:
>
> ASSIGN --- ASSIGN --- ASSIGN
> / / /
> ID CONST ID CONST ID BINARY (+)
> /
> ID ID
>
> And then, I would just traverse these trees (top to
bottom, left to
> right) to emit bind the identifiers to registers and to
emit the native
> opcodes. But I can also do these actions directly in
the semantic rules
> without trees construction..
>
> Guys, any suggestions? I'm confused.
To the best of my knowledge, GCC uses Bison and it also
creates an
intermediate representation of the code using a LISP-like
syntax. You
might want to look at what the GCC developers have done.
If this would be useful to you, I don't see any reason why
you shouldn't
do it. On the other hand, I think you could just use the
actions in your
current grammar to write assembler code, or even plain
machine-language
code, for a given processor type. So it's really up to
you.
What I think is that one shouldn't re-invent the wheel.
However, if
you're looking for suggestions, `comp.compilers' would be a
good place to
ask.
Laurence
_______________________________________________
help-bison gnu.org http
://lists.gnu.org/mailman/listinfo/help-bison
|
|
| Re: intermediary representation and
bison? |
  United Kingdom |
2007-12-24 04:13:20 |
I had the same problem on my first language; here's what I
did.
I started off thinking that I needed a simple interpreter,
so I wrote
the Bison code, and basically did all the required work in
the Bison
actions. This worked well; It looks like you're at this
stage.
A little later, I realised that this wasn't quite good
enough. The
specific problem, I think, was that I needed some forward
declarations -
ie. sometimes code earlier in the source file needed to
know about
something later in the source file. Tricky. After some
thought, I
decided that I could do this with minimal changes, just by
rescanning
the input again, with Bison. I scanned it once, remembered
the bits I
needed to know about, reset yyin to the beginning of the
file, and
scanned it again properly. I now had a 2-pass interpreter,
which worked
well.
As time went on, it became obvious that this was very
limiting. I needed
to do all sorts of extra things; some of these were:
1 - more forward declarations
2 - the language allowed constant expressions, so the
sematic checking
code had to confirm that an expression was actually
constant. how do you
do this in one pass? The easy answer is to have an extra
pass - you scan
the code finding everthing that should be a constant,
evaluate any
expression you find there, replace the expression with a
constant (if
you can), and let the semantic checker confirm that there's
a constant
there.
3 - As semantic checking became more complex, it became
obvious that I
had to analyse *all* the code before I had enough
information to analyse
*some* of the code. This will depend on your language.
4 - some things are next to impossible to code generate just
be scanning
and rescanning the input. since you're looking at C, what
about sequence
points, and expressions with side-effects? How do you handle
the
side-effects? Imagine code-generating "y = x++, c(),
x" - where do you
put in the 'x' increment?
5 - optimisations? A code generator for a new target? What
if you have
to scan something right-to-left? And so on.
Anyway, it quickly became obvious that handling anything
that wasn't
trivial is next to impossible just be scanning and
rescanning the input.
Fortunately, there's a really simple answer. The lexer and
scanner do
very little, apart from creating an AST. The scanner is no
longer the
'compiler'; it's just a very simple front-end that creates a
data
structure, the AST. The compilation process is now a
standard
data-processing problem - it's nothing more than analysing,
manipulating, and transforming the AST. Each analysis or
transformation
of the AST is, loosely speaking, a compiler
"pass"; you can do this as
few, or as many, times as you wish. It's trivial to add a
new pass when
you have some new feature or requirement. The AST *is* your
program; the
scanning is basically irrelevant.
And you can still have an "interpreter" at the end
of it, if you want
one; as soon as you finish "code generation", you
just run your "code",
or whatever it is that actually does anything. Technically,
however, you
should probably call this JIT compilation, rather than
interpreting.
The bad news is that you need to know a *lot* more to do
this, over and
above writing a Bison scanner. The (user) scanner code (ie.
your
"actions") is probably a lot less than 5% of a
practical and useful
compiler. So, stick with scanning your input until you find
out what the
problems are and, if it's really necessary to fix them,
rewrite your
code around an AST. If you need to find out more about ASTs,
look at
Antlr; you can use the Antlr library to create and maintain
your AST.
Evan
_______________________________________________
help-bison gnu.org http
://lists.gnu.org/mailman/listinfo/help-bison
|
|
| Re: intermediary representation and
bison? |

|
2007-12-24 10:29:15 |
Thank you, Laurence and Evan, for your extensive
explanations! I had a
look on ANTLR and it seems to be a great tool and probably
it will be my
second option. I think I'm going to try to get the scanner
to construct
an AST (instead of emitting code) so I'll stick with bison
for a while.
As you said, the rest of the compilation phases such as
resource binding
and code emission would become just direct products of the
AST traversal.
Thank you again for your suggestions!
Best regards,
Ilyes Gouta.
Evan Lavelle wrote:
> I had the same problem on my first language; here's
what I did.
>
> I started off thinking that I needed a simple
interpreter, so I wrote
> the Bison code, and basically did all the required work
in the Bison
> actions. This worked well; It looks like you're at this
stage.
>
> A little later, I realised that this wasn't quite good
enough. The
> specific problem, I think, was that I needed some
forward declarations -
> ie. sometimes code earlier in the source file needed
to know about
> something later in the source file. Tricky. After some
thought, I
> decided that I could do this with minimal changes, just
by rescanning
> the input again, with Bison. I scanned it once,
remembered the bits I
> needed to know about, reset yyin to the beginning of
the file, and
> scanned it again properly. I now had a 2-pass
interpreter, which worked
> well.
>
> As time went on, it became obvious that this was very
limiting. I needed
> to do all sorts of extra things; some of these were:
>
> 1 - more forward declarations
>
> 2 - the language allowed constant expressions, so the
sematic checking
> code had to confirm that an expression was actually
constant. how do you
> do this in one pass? The easy answer is to have an
extra pass - you scan
> the code finding everthing that should be a constant,
evaluate any
> expression you find there, replace the expression with
a constant (if
> you can), and let the semantic checker confirm that
there's a constant
> there.
>
> 3 - As semantic checking became more complex, it became
obvious that I
> had to analyse *all* the code before I had enough
information to analyse
> *some* of the code. This will depend on your language.
>
> 4 - some things are next to impossible to code generate
just be scanning
> and rescanning the input. since you're looking at C,
what about sequence
> points, and expressions with side-effects? How do you
handle the
> side-effects? Imagine code-generating "y = x++,
c(), x" - where do you
> put in the 'x' increment?
>
> 5 - optimisations? A code generator for a new target?
What if you have
> to scan something right-to-left? And so on.
>
> Anyway, it quickly became obvious that handling
anything that wasn't
> trivial is next to impossible just be scanning and
rescanning the input.
> Fortunately, there's a really simple answer. The lexer
and scanner do
> very little, apart from creating an AST. The scanner is
no longer the
> 'compiler'; it's just a very simple front-end that
creates a data
> structure, the AST. The compilation process is now a
standard
> data-processing problem - it's nothing more than
analysing,
> manipulating, and transforming the AST. Each analysis
or transformation
> of the AST is, loosely speaking, a compiler
"pass"; you can do this as
> few, or as many, times as you wish. It's trivial to add
a new pass when
> you have some new feature or requirement. The AST *is*
your program; the
> scanning is basically irrelevant.
>
> And you can still have an "interpreter" at
the end of it, if you want
> one; as soon as you finish "code generation",
you just run your "code",
> or whatever it is that actually does anything.
Technically, however, you
> should probably call this JIT compilation, rather than
interpreting.
>
> The bad news is that you need to know a *lot* more to
do this, over and
> above writing a Bison scanner. The (user) scanner code
(ie. your
> "actions") is probably a lot less than 5% of
a practical and useful
> compiler. So, stick with scanning your input until you
find out what the
> problems are and, if it's really necessary to fix them,
rewrite your
> code around an AST. If you need to find out more about
ASTs, look at
> Antlr; you can use the Antlr library to create and
maintain your AST.
>
> Evan
_______________________________________________
help-bison gnu.org http
://lists.gnu.org/mailman/listinfo/help-bison
|
|
| Re: intermediary representation and
bison? |
  Germany |
2007-12-27 03:37:21 |
On Mon, 24 Dec 2007, Ilyes Gouta wrote:
> Thank you, Laurence and Evan, for your extensive
explanations!
You're welcome.
It occurred to me that another reason why
GCC creates a syntax tree or whatever it's called (I know
virtually
nothing about the theory of compilers) is that GCC has
multiple front-ends
for different languages, i.e., FORTRAN and Pascal. I'm not
sure how C++
is handled, since it requires support for features such as
code generation
for templates.
Evan brought up a problem which seems to me to be related to
something I
call "the parse before the parse" problem:
Sometimes one needs
information to affect the course of parsing which can only
be found by
more parsing, having taken place beforehand. Multiple
passes is one
solution. Another possibility _might_ be to spawn a thread
and call the
same or a different parser to handle some section of the
input and/or code
generated by your main parser, and then make its results
available
somewhere. The technique of "faking" tokens and
the use of an object
passed as a parameter to `yyparse' to contain information
and/or the state
of the program would probably be useful here.
I find that Bison and threads work together well and the
extra effort
of making `yyparse' thread-safe is minimal.
Laurence
>
> Thank you, Laurence and Evan, for your extensive
explanations! I had a look on
> ANTLR and it seems to be a great tool and probably it
will be my second
> option. I think I'm going to try to get the scanner to
construct an AST
> (instead of emitting code) so I'll stick with bison for
a while. As you said,
> the rest of the compilation phases such as resource
binding and code emission
> would become just direct products of the AST
traversal.
>
> Thank you again for your suggestions!
>
> Best regards,
> Ilyes Gouta.
>
> Evan Lavelle wrote:
> > I had the same problem on my first language;
here's what I did.
> >
> > I started off thinking that I needed a simple
interpreter, so I wrote the
> > Bison code, and basically did all the required
work in the Bison actions.
> > This worked well; It looks like you're at this
stage.
> >
> > A little later, I realised that this wasn't quite
good enough. The specific
> > problem, I think, was that I needed some forward
declarations - ie.
> > sometimes code earlier in the source file needed
to know about something
> > later in the source file. Tricky. After some
thought, I decided that I could
> > do this with minimal changes, just by rescanning
the input again, with
> > Bison. I scanned it once, remembered the bits I
needed to know about, reset
> > yyin to the beginning of the file, and scanned it
again properly. I now had
> > a 2-pass interpreter, which worked well.
> >
> > As time went on, it became obvious that this was
very limiting. I needed to
> > do all sorts of extra things; some of these were:
> >
> > 1 - more forward declarations
> >
> > 2 - the language allowed constant expressions, so
the sematic checking code
> > had to confirm that an expression was actually
constant. how do you do this
> > in one pass? The easy answer is to have an extra
pass - you scan the code
> > finding everthing that should be a constant,
evaluate any expression you
> > find there, replace the expression with a constant
(if you can), and let the
> > semantic checker confirm that there's a constant
there.
> >
> > 3 - As semantic checking became more complex, it
became obvious that I had
> > to analyse *all* the code before I had enough
information to analyse *some*
> > of the code. This will depend on your language.
> >
> > 4 - some things are next to impossible to code
generate just be scanning and
> > rescanning the input. since you're looking at C,
what about sequence points,
> > and expressions with side-effects? How do you
handle the side-effects?
> > Imagine code-generating "y = x++, c(),
x" - where do you put in the 'x'
> > increment?
> >
> > 5 - optimisations? A code generator for a new
target? What if you have to
> > scan something right-to-left? And so on.
> >
> > Anyway, it quickly became obvious that handling
anything that wasn't trivial
> > is next to impossible just be scanning and
rescanning the input.
> > Fortunately, there's a really simple answer. The
lexer and scanner do very
> > little, apart from creating an AST. The scanner is
no longer the 'compiler';
> > it's just a very simple front-end that creates a
data structure, the AST.
> > The compilation process is now a standard
data-processing problem - it's
> > nothing more than analysing, manipulating, and
transforming the AST. Each
> > analysis or transformation of the AST is, loosely
speaking, a compiler
> > "pass"; you can do this as few, or as
many, times as you wish. It's trivial
> > to add a new pass when you have some new feature
or requirement. The AST
> > *is* your program; the scanning is basically
irrelevant.
> >
> > And you can still have an "interpreter"
at the end of it, if you want one;
> > as soon as you finish "code generation",
you just run your "code", or
> > whatever it is that actually does anything.
Technically, however, you should
> > probably call this JIT compilation, rather than
interpreting.
> >
> > The bad news is that you need to know a *lot* more
to do this, over and
> > above writing a Bison scanner. The (user) scanner
code (ie. your "actions")
> > is probably a lot less than 5% of a practical and
useful compiler. So, stick
> > with scanning your input until you find out what
the problems are and, if
> > it's really necessary to fix them, rewrite your
code around an AST. If you
> > need to find out more about ASTs, look at Antlr;
you can use the Antlr
> > library to create and maintain your AST.
> >
> > Evan
>
_______________________________________________
help-bison gnu.org http
://lists.gnu.org/mailman/listinfo/help-bison
|
|
[1-5]
|
|