Thread: Affixes leftover from expanded wordlist dumps




Affixes leftover from expanded wordlist dumps
United States
2008-06-06 16:48:54
Hello,

I am building a dictionary-based language detection program using the
dumps of aspell dictionaries.

I need to expand the wordlists completely. However, for some languages,
such as Russian, the expansion leaves behind what I think are affixes
after a '?'.  For example:

aspell dump master ru | aspell -l ru expand

will produce lines like:
умаслит? умаслит?ла умаслит?ли умаслит?ло

'умаслит' appears to be the stem, but what about the characters after
the '?'?  Are they affixes?  If so, how do I fully expand them?  Any
insight on how to correctly expand wordlists for every language would be
greatly appreciated.

Thanks,
Isaac Colley


_______________________________________________
Aspell-user mailing list
Aspell-user@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-user

Re: Affixes leftover from expanded wordlist dumps
United States
2008-06-06 17:21:15
On Fri, 6 Jun 2008, Isaac Colley wrote:

> Hello,
>
> I am building a dictionary based language detection program using the
> dumps of aspell dictionaries.
>
> I need to expand wordlists completely, however some languages, such as
> Russian, after expansion will leave behind affixes (I think) after a
> '?'.  For example:
>
> aspell dump master ru | aspell -l ru expand

I think it might be an encoding problem.  Try setting your locale to C by
setting the LANG environment variable, then check that the locale
actually changed.  For example, using bash:

   $ export LANG=C
   $ locale
   LANG=C
   LC_CTYPE="C"
   LC_COLLATE="C"
   LC_TIME="C"
   LC_NUMERIC="C"
   LC_MONETARY="C"
   LC_MESSAGES="C"
   LC_ALL=

You might also need to set LC_ALL, since it overrides LANG and the
individual LC_* variables when it is set.
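
If you would rather not change the locale for the whole session, the
override can be applied to a single command.  The following is only a
sketch: with_c_locale is a hypothetical helper name, not part of aspell,
and whether the C locale actually removes the '?' characters is still
untested.

```shell
# Run one command with the C locale forced for that command only.
# LC_ALL takes precedence over LANG and the individual LC_* variables.
# (with_c_locale is a hypothetical helper name used for illustration.)
with_c_locale() {
    LC_ALL=C "$@"
}

# Show what the character-type locale resolves to under the override.
with_c_locale locale | grep '^LC_CTYPE='

# Possible use with the pipeline from the original question
# (needs aspell and the Russian dictionary installed, so it is left
# commented out here):
# with_c_locale aspell dump master ru | with_c_locale aspell -l ru expand
```

Both sides of the pipe get the override here, because a variable
assignment placed before a command applies to that command only and not
to the rest of the pipeline.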



