
Thread: Preference Keyword token against Literals




Preference Keyword token against Literals
user name
2007-09-17 10:53:20
Hi all,

many thanks so far!
Two comments however:

> 
> Date: Fri, 14 Sep 2007 06:38:02 +0800
> From: J.Chris Findlay <j.chris.findlay@gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> Subject: [JavaCC] Preference Keyword token against Literals
> 
> 
> One option is to replace the usage of <ID_LITERAL> with a production
> that allows that along with any keyword tokens that are also valid
> ids.

You mean like this?

void Bla() #BlaNode : { }
{
  <BLA> ( <ID_LITERAL> | <UNKNOWN> | ... ) <BLA>
  {
    ...
  }
}

Hmm, that's OK for specific keywords, but then I'd need to list *all* the
possible keyword tokens there ...
There are ~20 keywords - not really nice, but if there is no better way ...

> That way, they are still separate tokens, defined in the correct order
> such that <ID_LITERAL> doesn't soak up the other definitions (i.e. the
> same as it appears to be currently),

I think it is the other way around. It expects the general <ID_LITERAL>
token but reads "Unknown" as the specific <UNKNOWN> token (the UNKNOWN
token is defined before <ID_LITERAL>).

What you mean, I think, is rather the case where it matches all keywords
as <ID_LITERAL> tokens, i.e. it wants an <UNKNOWN> but interprets
"Unknown" as <ID_LITERAL>. This happens if I swap the two and put the more
general token *before* the specific one: also wrong, but differently wrong.
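To make the ordering rule concrete, here is a small Java sketch (plain Java, not JavaCC itself) of how a JavaCC-generated lexer picks between overlapping tokens: the longest match wins, and when two definitions match the same length, the one declared first in the grammar file wins. The lexer never asks the parser which token it currently expects. The class LexOrderDemo, its simplified token patterns and the whitespace skipping are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexOrderDemo {
    /** One token definition: a kind name plus the pattern it matches. */
    record Rule(String kind, Pattern pattern) {}

    // Simplified stand-ins for the thread's token definitions (assumed, not the real grammar).
    static final Rule BLA = new Rule("BLA", Pattern.compile("BLA"));
    static final Rule UNKNOWN = new Rule("UNKNOWN", Pattern.compile("Unknown"));
    static final Rule ID_LITERAL = new Rule("ID_LITERAL", Pattern.compile("[^ \\t\\n\\r\"]+"));

    // The list order plays the role of declaration order in the grammar file.
    static final List<Rule> KEYWORDS_FIRST = List.of(BLA, UNKNOWN, ID_LITERAL);
    static final List<Rule> ID_FIRST = List.of(ID_LITERAL, BLA, UNKNOWN);

    /** Returns the token kinds for input: longest match first, then declaration order. */
    static List<String> kinds(String input, List<Rule> rules) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { // treated like a SKIP rule
                pos++;
                continue;
            }
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : rules) {
                Matcher m = r.pattern().matcher(input).region(pos, input.length());
                // Strict '>' means a later rule must be strictly longer to win,
                // so on a tie the rule declared first is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No token matches at offset " + pos);
            }
            out.add(best.kind());
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(kinds("BLA Unknown BLA", KEYWORDS_FIRST));  // [BLA, UNKNOWN, BLA]
        System.out.println(kinds("BLA Unknown BLA", ID_FIRST));        // [ID_LITERAL, ID_LITERAL, ID_LITERAL]
        System.out.println(kinds("BLA Unknown2 BLA", KEYWORDS_FIRST)); // [BLA, ID_LITERAL, BLA]
    }
}
```

With the keywords declared first, "Unknown" always comes out as UNKNOWN, which matches the ParseException in the log; with ID_LITERAL first, even "BLA" becomes an ID_LITERAL. A longer word such as "Unknown2" is an ID_LITERAL in either order, because the longest match is decided before declaration order.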

> but where necessary the grammar
> accepts either.  Defining it in a separate production gives you just
> the one place where this list of alternatives needs to be kept.
> > 
> > Hello *,
> > 
> > I have a JavaCC parser which works nicely, except in one case: a data
> > file I have to parse in which somebody used a keyword as a variable.
> > Example:
> > 
> > TOKEN :
> > {
> >   < UNKNOWN:  "Unknown" >
> >   ...
> > }
> > TOKEN :
> > {
> >   ...
> > | < ID_LITERAL   : ( ~[ " ", "\t", "\n", "\r", "\"" ] )+ >
> > }
> > 
> > This means the word Unknown is a keyword and, at the same time (at
> > least when purely looking at the pattern), an ID_LITERAL.
> > Now, the syntax is
> >   <BLA> <ID_LITERAL> <BLA> ...
> > and somebody has configured
> >   BLA Unknown BLA ...
> > which is in theory OK.
> > 
> > But, what I get is:
> > ==================================
> > ...
> > <DEFAULT>Current character : n (110) at line 176 column 13
> >    No more string literal token matches are possible.
> >    Currently matched the first 7 characters as a "Unknown" token.
> > <DEFAULT>Current character :   (32) at line 176 column 14
> >    Starting NFA to match one of : { <token of kind 6>, <INT_LITERAL>, 
> > <STRING_LITERAL>, <OID_LITERAL>, 
> >      <PATH_LITERAL>, <ID_LITERAL> }
> > <DEFAULT>Current character :   (32) at line 176 column 14
> >    Currently matched the first 7 characters as a "Unknown" token.
> >    Putting back 1 characters into the input stream.
> > ****** FOUND A "Unknown" MATCH (Unknown) ******
> > ...
> > ParseException: Encountered "Unknown" at line 176, column 7.
> > Was expecting:
> >     <ID_LITERAL> ...
> > ==================================
> > 
> > I think this behavior is even correct. Unfortunately, the original
> > application that reads the config file accepts this, so my parser has
> > to accept it too.
> > 
> > Does anybody have a hint on how I can make my parser deal with this?
> > Of course, it is not just the word "Unknown"; potentially there are
> > others like that ...
> > 
> > Many thx + best regards,
> >   tge
> > 
> >
> > --
> > ........................................................
> >  Thomas Gentsch
> >  blue elephant systems GmbH
> >  Wollgrasweg 49
> >  D-70599 Stuttgart
> >
> >  e-mail:    tge AT blue MINUS elephant MINUS systems DOT com
> > ........................................................
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@javacc.dev.java.net
> > For additional commands, e-mail: users-help@javacc.dev.java.net
> >
> >
> 
> 
> -- 
>  - J.Chris Findlay
>    (c:
> 

........................................................
 Thomas Gentsch
 ........................................................



Re: Preference Keyword token against Literals
user name
2007-09-17 09:00:23
The code as you had it was (as I said) reading the specific token and
therefore not matching the wider <ID_LITERAL> that was expected in the
grammar.

The problem (as you said) is that reversing the order so that
<ID_LITERAL> does match there means it always matches, so the cases
where you do want the specific tokens stop working.

The solution I proposed is to set up a production like your example,
e.g. called AnyIDs() or something, that lists all ~20 specific tokens
along with <ID_LITERAL>. There isn't really a way around listing them
for the parts of the grammar that must also accept these keywords, but
it makes most sense to list them in only one place if you can (or in a
few small lists, plus a few productions that pull them together into
larger groups if there are places where only some of them can be used).
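A rough sketch of that single list, again in plain Java rather than JavaCC: AnyIdDemo, ANY_ID, strictBla and relaxedBla are hypothetical names, and the token kinds are assumed to come from a lexer that declares the keywords before ID_LITERAL. The comment at the top shows roughly what a corresponding JavaCC production might look like; it is an illustration, not taken from the original grammar.

```java
import java.util.List;
import java.util.Set;

/*
 * Hypothetical JavaCC analogue of the check below:
 *
 *   Token AnyId() : { Token t; }
 *   {
 *     ( t = <ID_LITERAL> | t = <UNKNOWN> ) { return t; }
 *   }
 *
 * with the remaining keyword tokens added as further alternatives.
 */
public class AnyIdDemo {
    /** The one place that lists which token kinds may serve as an identifier (assumed list). */
    static final Set<String> ANY_ID = Set.of("ID_LITERAL", "UNKNOWN");

    static boolean anyId(String kind) {
        return ANY_ID.contains(kind);
    }

    /** Original rule: BLA ID_LITERAL BLA, which rejects a keyword in the middle. */
    static boolean strictBla(List<String> kinds) {
        return kinds.equals(List.of("BLA", "ID_LITERAL", "BLA"));
    }

    /** Relaxed rule: BLA AnyId BLA, where AnyId accepts ID_LITERAL or a listed keyword. */
    static boolean relaxedBla(List<String> kinds) {
        return kinds.size() == 3
                && kinds.get(0).equals("BLA")
                && anyId(kinds.get(1))
                && kinds.get(2).equals("BLA");
    }

    public static void main(String[] args) {
        // Token kinds for "BLA Unknown BLA" when UNKNOWN is declared before ID_LITERAL.
        List<String> input = List.of("BLA", "UNKNOWN", "BLA");
        System.out.println(strictBla(input));  // false
        System.out.println(relaxedBla(input)); // true
    }
}
```

Because every identifier position goes through the same check, adding another of the ~20 keywords only means touching ANY_ID, while the lexer keeps producing the specific keyword tokens wherever they are really needed.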

E.g. I have a grammar that accepts a variable made of the standard
identifier characters, but I also need to accept specific type names.
I have a production that combines these with a separate list of other
keywords, so that it covers both the places where I expect a type name
and the places where any reserved keyword may match; one list refers to
the other so that nothing is doubled up.
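One possible reading of that layering, sketched with Java EnumSets: each small group of token kinds is written down once, and the larger groups are built from it instead of repeating entries. The kind names (INT_TYPE, IF and so on) and the class KeywordGroups are invented for the example; the comment shows a hypothetical JavaCC analogue in which productions refer to each other.

```java
import java.util.EnumSet;
import java.util.Set;

/*
 * Hypothetical JavaCC analogue:
 *
 *   void TypeName()   : {} { <INT_TYPE> | <STRING_TYPE> }
 *   void AnyKeyword() : {} { TypeName() | <IF> | <WHILE> | <UNKNOWN> }
 *   void AnyId()      : {} { <ID_LITERAL> | AnyKeyword() }
 */
public class KeywordGroups {
    /** Assumed token kinds; a real grammar would have its own ~20 keywords. */
    enum Kind { ID_LITERAL, INT_TYPE, STRING_TYPE, IF, WHILE, UNKNOWN }

    /** Builds a new set containing every kind from the given groups. */
    @SafeVarargs
    static Set<Kind> union(Set<Kind>... groups) {
        EnumSet<Kind> all = EnumSet.noneOf(Kind.class);
        for (Set<Kind> g : groups) {
            all.addAll(g);
        }
        return all;
    }

    // Smallest groups, each listed exactly once.
    static final Set<Kind> TYPE_NAMES = EnumSet.of(Kind.INT_TYPE, Kind.STRING_TYPE);
    static final Set<Kind> OTHER_KEYWORDS = EnumSet.of(Kind.IF, Kind.WHILE, Kind.UNKNOWN);

    // Larger groups reuse the smaller ones instead of repeating their entries.
    static final Set<Kind> ANY_KEYWORD = union(TYPE_NAMES, OTHER_KEYWORDS);
    static final Set<Kind> ANY_ID = union(EnumSet.of(Kind.ID_LITERAL), ANY_KEYWORD);

    public static void main(String[] args) {
        System.out.println(TYPE_NAMES.contains(Kind.IF));     // false
        System.out.println(ANY_KEYWORD.contains(Kind.IF));    // true
        System.out.println(ANY_ID.contains(Kind.ID_LITERAL)); // true
    }
}
```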

On 17/09/2007, Thomas Gentsch <tge@blue-elephant-systems.com> wrote:


-- 
 - J.Chris Findlay
   (c:


