Thread: 1. Lookup in Token 2. debugger 3. and sth else




2006-02-09 05:35:27
Hi all,

1.
I found that in situations like:

WWW-WWW-WWWW-WWW  NN TT TT
LL

where
W denotes a character of a word token,
N is a character of a number token,
T is just a character of some token,
and L is a character of a recognized Lookup element.

I have a rule for matching "LL NN",
but the result I get is the whole expression
"WWW-WWW-WWWW-WWW  NN",
which is wrong, of course.

Shouldn't it be checked whether a Lookup'ed string is a substring of
a token (or of some number of tokens plus part of a token), so that
such a situation is not matched? Is there any setting for handling
that? Or what else can I do about it?


2.

It also happens that some of the matched and annotated strings are
not shown as annotated in the JAPE Debugger. I am using GATE 3/1846;
is there an improved version in a more up-to-date GATE build?

The second thing I noticed in the JAPE Debugger is that sometimes I
have to start my selection a few characters earlier (closer to the
beginning of the file) than where the text of interest really begins.
There is some shift between what I select to debug and what the
debugger actually debugs.


3.
Another question: normally my pipeline processes a document in a
short time (let's say 0.5 sec; the exact figure is not important).
But sometimes it takes 15-30 seconds without any changes to the code,
JAPE rules, document, etc. Is this connected with the garbage
collector, or is some other magic happening?

rgds,
Pawel

2006-02-13 20:48:17
Hi Pawel


Pawel Mazur wrote:
> Hi all,
>
> 1.
> I found that in situations like:
>
> WWW-WWW-WWWW-WWW  NN TT TT
> LL
>
> where
> W denotes a character of a word token,
> N is a character of a number token,
> T is just a character of some token,
> and L is a character of a recognized Lookup element.
>
> I have a rule for matching "LL NN",
> but the result I get is the whole expression
> "WWW-WWW-WWWW-WWW  NN",
> which is wrong, of course.
>
> Shouldn't it be checked whether a Lookup'ed string is a substring of
> a token (or of some number of tokens plus part of a token), so that
> such a situation is not matched? Is there any setting for handling
> that? Or what else can I do about it?

If I remember correctly (someone please correct me if I am wrong), the
gazetteer checks that a Lookup-ed string is not part of a larger
string by checking for spaces or punctuation marks before and after
the string in question, rather than looking at actual Tokens. I
*think* you can use the Flexible Gazetteer (see the Plugins chapter in
the User Guide) to get round this problem. Alternatively, use some
Java on the RHS of your JAPE rule to perform a check that the
Lookup-ed string is not part of a longer Token.
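
A minimal sketch of the kind of check suggested above, written in plain Java rather than against the GATE API so that it runs on its own. The `Span` class and the `isAlignedWithTokens` method are hypothetical names introduced for illustration, and the token offsets are an assumed tokenization of the string "WWW-WWW NN". On the RHS of a real JAPE rule, the offsets would presumably come from the bound Lookup and Token annotations instead of hand-written values.

```java
import java.util.Arrays;
import java.util.List;

public class LookupAlignment {

    /** Half-open character span [start, end); a stand-in for an annotation's offsets. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns true only if the lookup starts where some token starts and
     * ends where some token ends, i.e. it does not cut through a token.
     */
    static boolean isAlignedWithTokens(Span lookup, List<Span> tokens) {
        boolean startsOnBoundary = false;
        boolean endsOnBoundary = false;
        for (Span token : tokens) {
            if (token.start == lookup.start) startsOnBoundary = true;
            if (token.end == lookup.end) endsOnBoundary = true;
        }
        return startsOnBoundary && endsOnBoundary;
    }

    public static void main(String[] args) {
        // Assumed tokenization of "WWW-WWW NN": WWW, -, WWW, NN
        List<Span> tokens = Arrays.asList(
            new Span(0, 3), new Span(3, 4), new Span(4, 7), new Span(8, 10));

        // Lookup over the first two characters ends inside the first token.
        System.out.println(isAlignedWithTokens(new Span(0, 2), tokens)); // false
        // Lookup covering exactly the first token.
        System.out.println(isAlignedWithTokens(new Span(0, 3), tokens)); // true
        // Lookup covering three whole tokens.
        System.out.println(isAlignedWithTokens(new Span(0, 7), tokens)); // true
    }
}
```

A rule could skip creating its annotation whenever such a check returns false. Comparing offsets against Token boundaries, instead of looking at the surrounding characters, is the design choice here, since the problem described is a Lookup that ends partway through a Token.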

> 2.
>
> It also happens that some of the matched and annotated strings are
> not shown as annotated in the JAPE Debugger. I am using GATE 3/1846;
> is there an improved version in a more up-to-date GATE build?
>
> The second thing I noticed in the JAPE Debugger is that sometimes I
> have to start my selection a few characters earlier (closer to the
> beginning of the file) than where the text of interest really begins.
> There is some shift between what I select to debug and what the
> debugger actually debugs.

AFAIK there have been no changes to the JAPE debugger since then.
Note that you can check what's changed in different versions by
looking at the ChangeLog section of the User Guide.
Afraid I can't help on the spanning problem...

>
> 3.
> Another question: normally my pipeline processes a document in a
> short time (let's say 0.5 sec; the exact figure is not important).
> But sometimes it takes 15-30 seconds without any changes to the code,
> JAPE rules, document, etc. Is this connected with the garbage
> collector, or is some other magic happening?

Not sure, other than that I know this does happen. Perhaps someone
else can give an explanation.
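
One way to test the garbage-collector guess is to measure it directly. The sketch below uses the standard `java.lang.management` API to compare the wall-clock time of a task with the cumulative collection time the JVM reports over the same interval. The class `GcTiming`, its methods, and the allocation-heavy placeholder task are hypothetical; the `Runnable` stands in for whatever call actually runs the pipeline on a document.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTiming {

    /** Sums the cumulative collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the task and returns {elapsed wall-clock ms, GC ms reported during the run}. */
    static long[] timeWithGc(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        return new long[] { elapsedMs, gcMs };
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real pipeline run.
        long[] result = timeWithGc(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms: " + result[0] + ", GC ms: " + result[1]);
    }
}
```

If a slow run shows GC time close to the elapsed time, the collector is a likely cause; if GC time stays near zero, the delay comes from somewhere else. Starting the JVM with `-verbose:gc` gives similar information without code changes.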
Regards
Diana