
Thread: Unknown taggs for Spanish.




Unknown taggs for Spanish.
user name
2006-02-14 11:37:46
I *think* we now have this, but it might just have been a nice dream!
Does anyone else know about this?
Diana

José Ramón Pérez Agüera wrote:

>If you have the Java code to train the tagger for English, I could
>work on a port to Spanish. Is that possible?
>
>Regards
>
>jose
>
>José Ramón Pérez Agüera
>Office 411, tel. 913947599
>Dept. de Sistemas Informáticos y Programación
>Facultad de Informática
>Universidad Complutense de Madrid
>
>----- Original Message -----
>From: Diana Maynard <d.maynarddcs.shef.ac.uk>
>Date: Tuesday, February 14, 2006 12:16 pm
>Subject: Re: Unknown taggs for Spanish.
>
>>Yes, Jose is correct in that these tags are caused by the default
>>rules in the tagger, which fire when the words in question are not in
>>the lexicon. Possible solutions are to map these into tags from your
>>Spanish tagset, to modify the Spanish lexicon manually to include the
>>missing words that are not being recognised, or to modify the tagger
>>code to change the way the default rules are applied.
>>I'm afraid we don't have a better version of the Spanish tagger; if
>>we did, it would have been included.
>>There are quite a few people on this list using GATE for Spanish, so
>>someone might have a solution they have already tried.
>>Regards
>>Diana
>>
>>
>>José Ramón Pérez Agüera wrote:
>>
>>>Hi Sergi,
>>>
>>>I work with GATE's POS tagger in my thesis, and I think these tags
>>>(NN, NNS, NNP) are generic and the tagger uses them by default when it
>>>doesn't know the suitable tag. I need to re-train the POS tagger for
>>>Spanish, but this is not possible with GATE's API. I don't have any
>>>solution, sorry, but I think this is the problem.
>>>
>>>Regards, and sorry for my English
>>>
>>>jose
>>>
>>>José Ramón Pérez Agüera
>>>Office 411, tel. 913947599
>>>Dept. de Sistemas Informáticos y Programación
>>>Facultad de Informática
>>>Universidad Complutense de Madrid
>>>
>>>----- Original Message -----
>>>From: Sergi Fernandez <devilsfhotmail.com>
>>>Date: Tuesday, February 14, 2006 0:40 am
>>>Subject: Unknown taggs for Spanish.
>>>
>>>>Hi there!
>>>>
>>>>Thank you for your quick answer!!
>>>>
>>>>I've just solved the problem of using independent grammars.
>>>>
>>>>I'm working with the Spanish plugin for GATE 3.0. As I read in the
>>>>documentation "D1.4.1a Language Issues", GATE uses a tagger based on
>>>>the Brill tagger, but trained on Spanish text. The tags for Spanish
>>>>are different from the ones for English, and they are defined in
>>>>"Guía para la anotación morfosintáctica del corpus CLiC-TALP" by
>>>>M. Civit. Until now everything was OK. But right now I'm working with
>>>>GATE for my final degree project, and there are some tags that don't
>>>>work quite well. What I mean is that some tags, such as NN, NNP or
>>>>NNS, are not described in my guide by M. Civit.
>>>>
>>>>I believe those tags come from the English tagger.
>>>>
>>>>I'm trying to adapt Text2Onto, from the AIFB Institute of the
>>>>University of Karlsruhe (Germany), to Spanish, and maybe 30% or 40%
>>>>of nouns are tagged with those "English tags". That causes a large
>>>>loss of accuracy and recall if I only want to use the correct tags.
>>>>If I also consider the NNS and NNP tags as nouns, I gain recall but
>>>>lose accuracy. So my questions are: What can I do with those tags?
>>>>Is there a newer, more accurate version of the Spanish plugin
>>>>available? Could you please explain it to me, or tell me where to
>>>>find how the Spanish plugin is built, so that I can figure out which
>>>>of those tags to accept or reject?
>>>>
>>>>Best regards.
>>>>
>>>>Sergi
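Diana's first suggestion, mapping the fallback tags into the Spanish tagset, can be sketched as a small post-processing filter. This is a minimal sketch, not part of GATE or the Spanish plugin. It assumes the tagged text has been exported as one `token<TAB>tag` pair per line, and the target tags `NC` (common noun) and `NP` (proper noun) are hypothetical coarse placeholders; replace them with the full tags from the CLiC-TALP guide. Because NN and NNS both map to the same coarse tag here, the singular/plural distinction carried by the English tags is dropped unless you extend the mapping.

```sh
#!/bin/sh
# Sketch: rewrite the English fallback tags (NN, NNS, NNP) that the
# Brill-based Spanish tagger assigns to unknown words.
# ASSUMPTIONS: input is one "token<TAB>tag" pair per line; NC and NP are
# hypothetical coarse Spanish-tagset tags, not the exact CLiC-TALP tags.
map_fallback_tags() {
  awk 'BEGIN { FS = OFS = "\t" }
       $2 == "NN" || $2 == "NNS" { $2 = "NC" }  # assumed common-noun tag
       $2 == "NNP"               { $2 = "NP" }  # assumed proper-noun tag
       { print }' "$@"                           # other lines pass through
}
```

For example, `map_fallback_tags tagged.tsv > remapped.tsv` would rewrite a (hypothetical) exported file, or the function can read from a pipe. Note that this only renames the tags: the tagger guessed "noun" because the word was missing from the lexicon, so adding the missing words to the lexicon remains the more precise fix.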
Unknown taggs for Spanish.
user name
2006-02-14 11:48:55
Hi,

> I work with GATE's POS tagger in my thesis, and I think these tags (NN,
> NNS, NNP) are generic and the tagger uses them by default when it doesn't
> know the suitable tag. I need to re-train the POS tagger for Spanish, but
> this is not possible with GATE's API. I don't have any solution, sorry,
> but I think this is the problem.

Why don't you try using the Spanish version of the TreeTagger (using the
existing GATE wrapper)? You'd have to create a new command script (the
plugin only comes with GATE scripts for German and French), but that
shouldn't be too difficult. The Spanish tagset and parameter files are
available under
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html

Cheers, René

Unknown taggs for Spanish.
user name
2006-02-14 12:37:32
On Tuesday 14 February 2006 12:48, you wrote:
> Why don't you try using the Spanish version of the TreeTagger (using the
> existing GATE wrapper)? You'd have to create a new command script (the
> plugin only comes with GATE scripts for German and French), but that
> shouldn't be too difficult.

Actually, I've just had a look at the TreeTagger command file for Spanish,
and it's quite simple. I've adapted it to the GATE plugin; if you save the
script at the bottom as
	gate/plugins/TreeTagger/resources/tree-tagger-spanish-gate
(after setting your own paths in it), you can run the GATE TreeTagger
plugin on a Spanish text.

Whether the result is better than with the Brill version, I can't really
tell. Looks all Spanish to me.

Cheers, René

PS: GATE folks, could you add the script to the CVS?
______________________________________________

#!/bin/sh

# Set these paths appropriately

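BIN=/usr/local/clactools/TreeTagger/bin
CMD=/usr/local/clactools/TreeTagger/cmd
LIB=/usr/local/clactools/TreeTagger/lib

MWL=${CMD}/mwl-lookup.perl
TAGGER=${BIN}/tree-tagger
# Not used below: GATE supplies the tokens, so the tokenizer step (and its
# abbreviation list) from the standard TreeTagger script is skipped.
ABBR_LIST=${LIB}/spanish-abbreviations
PARFILE=${LIB}/spanish.par

cat "$@" |
# remove empty lines
grep -v '^$' |
# recognition of multi-word lexemes (MWLs)
$MWL -f ${LIB}/spanish-mwls |
# tagging
$TAGGER $PARFILE -token -lemma -sgml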
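Before pointing the GATE TreeTagger plugin at the new script, it can help to confirm that the files the script references actually exist. The following is a hypothetical helper, not part of TreeTagger or GATE. It takes the same BIN, CMD and LIB directories as the script above and assumes `spanish-mwls` lives in LIB next to `spanish.par`, as in the standard TreeTagger layout.

```sh
#!/bin/sh
# Hypothetical pre-flight check for tree-tagger-spanish-gate.
# ASSUMPTION: same BIN/CMD/LIB layout as the script, with spanish-mwls
# stored in LIB alongside spanish.par.
check_spanish_treetagger() {
  bin=$1 cmd=$2 lib=$3
  status=0
  # Programs the pipeline runs directly must be executable.
  for f in "$bin/tree-tagger" "$cmd/mwl-lookup.perl"; do
    if [ ! -x "$f" ]; then
      echo "missing or not executable: $f" >&2
      status=1
    fi
  done
  # Data files only need to be readable.
  for f in "$lib/spanish.par" "$lib/spanish-mwls"; do
    if [ ! -r "$f" ]; then
      echo "missing or not readable: $f" >&2
      status=1
    fi
  done
  return $status
}
```

For example, `check_spanish_treetagger /usr/local/clactools/TreeTagger/bin /usr/local/clactools/TreeTagger/cmd /usr/local/clactools/TreeTagger/lib && echo "TreeTagger files found"`. The GATE script itself also needs execute permission (`chmod +x`), since the plugin runs it as a command.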

