Hi Pawel
Pawel Mazur wrote:
> Hi all,
>
> 1.
> I found that in situations like:
>
> WWW-WWW-WWWW-WWW NN TT TT
> LL
>
> where
> W denotes a character of a word token
> N is a character of number token
> T is just a character of some token
> and L is a character of a recognized Lookup element.
>
> I have a rule for matching "LL NN"
> But I get as the result whole expression:
> "WWW-WWW-WWWW-WWW NN"
> which is wrong of course...
>
> Isn't it checked that if a Lookup'ed string is a substring of a token,
> (or some number of tokens + part of a token) that this situation should
> not be matched? Is there any setting for handling that?
> Or what can I do about it?
If I remember correctly (someone please correct me if I am wrong), the
gazetteer checks that a Lookup-ed string is not part of a larger string by
checking for spaces or punctuation marks before and after the string in
question, rather than by looking at the actual Tokens. I *think* you can use
the Flexible Gazetteer (see the Plugins chapter in the User Guide) to get
round this problem. Alternatively, use some Java on the RHS of your JAPE rule
to check that the Lookup-ed string is not part of a longer Token.
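
To illustrate the second option, here is a minimal sketch of the offset check
itself, in plain Java. It does not use the GATE API: the Span class and the
alignsWithTokens method are hypothetical names, and the spans stand in for
the start and end offsets you would read from the Lookup and Token
annotations on the RHS. The assumption is that a Lookup is acceptable only if
it starts where some Token starts and ends where some Token ends.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of GATE: spans stand in for the start/end
// offsets of Token and Lookup annotations.
final class TokenBoundaryCheck {

    /** A half-open character span [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns true only if the lookup starts at the start of some token and
     * ends at the end of some token, i.e. it does not cut a token in half.
     */
    static boolean alignsWithTokens(Span lookup, List<Span> tokens) {
        boolean startsOnToken = false;
        boolean endsOnToken = false;
        for (Span t : tokens) {
            if (t.start == lookup.start) {
                startsOnToken = true;
            }
            if (t.end == lookup.end) {
                endsOnToken = true;
            }
        }
        return startsOnToken && endsOnToken;
    }

    public static void main(String[] args) {
        // Assumed tokenisation of "WWW-WWW-WWWW-WWW NN":
        // one word token [0,16) and one number token [17,19).
        List<Span> tokens = Arrays.asList(new Span(0, 16), new Span(17, 19));

        // A Lookup over the last two characters of the word token is rejected.
        System.out.println(alignsWithTokens(new Span(14, 16), tokens));

        // A Lookup covering the whole word token is accepted.
        System.out.println(alignsWithTokens(new Span(0, 16), tokens));
    }
}
```

In a real rule you would skip the match (create no annotation) whenever such
a check fails for the bound Lookup.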
> 2.
>
> It also happens that some of the matched and annotated strings are not
> shown in the Jape Debugger as annotated.. I am using GATE 3/1846, is
> there any improved version in more up-to-date gate build?
>
> Second thing I noticed in the Jape Debugger is that sometimes I have to
> mark a few characters earlier (closer to the beginning of the file) than
> the real interesting text begin (there is some shift between what I am
> selecting to debug, and what debugger is actually debugging).
AFAIK there have been no changes to the JAPE debugger since - note that you
can check what's changed in different versions by looking at the ChangeLog
section of the User Guide.
Afraid I can't help on the spanning problem...
>
> 3.
> Another question is that normally my pipeline process a document in some
> short time (let's say 0.5 sec, not important exactly). But sometimes
> it takes 15-30 seconds without any changes in the code/jape
> rules/doc/etc. Is this connected with garbage collector or some other
> magic happens?
Not sure, other than that I know this does happen - perhaps someone else can
give an explanation.
Regards
Diana