
Thread: slovenian stemmer for snowball for pylucene




slovenian stemmer for snowball for pylucene
user name
2006-12-27 03:51:17
On Wed, 27 Dec 2006, Andraž Tori wrote:

> On Tue, 2006-12-26 at 16:34 -0800, Andi Vajda wrote:
>>> I managed to compile it with standard tools in Debian unstable, but
>>> Highlighter only partly works - until I try to pass it my own or an
>>> existing fragmenter as a parameter... then I start getting null pointer
>>> exceptions.
>>
>> As I said in my previous reply, run 'make test'; if you get failures, use a
>> different compiler: gcj 3.4.6 or a recent gcj 4.2.0 snapshot are good.
>> Nothing in between usually works on Linux.
>
>
> I just tried to build with gcj from experimental (4.2-20061003-1), but
> without luck...

That may not be a recent enough snapshot.
I think I used late-November ones with success.

For more details, see:
http://lists.osafoundation.org/pipermail/pylucene-dev/2006-November/001404.html
(and adjust appropriately for your platform).

Please note that if you wish to use a gcj 4.x compiler, you need to start
building from the Lucene Java sources (as outlined at the URL above) and not
from the source tarballs I produce. For a strange, so far unsolved, reason,
these seem to work with gcj 3.4.x only.

GCJ 4.x is rather bleeding edge; if you want to take an easier route, use gcj
3.4.6. If you need to build gcj 3.4.x from sources, see PyLucene's INSTALL
file.

Andi..
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
debian unstable package of pylucene
user name
2006-12-27 13:18:09
On Tue, 2006-12-26 at 19:51 -0800, Andi Vajda wrote:
> On Wed, 27 Dec 2006, Andraž Tori wrote:
> 
> > Well, it was submitted there more than a year ago (I got it from the list
> > and fixed it up to handle UTF-8 properly), but Porter did not include it
> > in the distribution because he wanted to check whether all the rules are
> > OK.
> 
> Oh well. That happens.
> I saw no copyright notice on the source file you sent.
> It might be good to add that first. If you agree to the License used by
> PyLucene, just copy the copyright notice from one of the other PyLucene
> source files.

I can't do that - I am not the author of the file in question; Porter is,
and he posted it on the mailing list. I just fixed it up.
It is reasonable to expect that the file bears the same licence as the other
stemmers he publishes in Snowball...

> > Well... by my own evaluation the stemmer is far from perfect, but still
> > very useful for many purposes (I already use it directly in my project),
> > and way better than no stemmer at all... So it would be great if I could
> > use it with PyLucene, which I use...
> 
> That makes it less than ideal for distribution by PyLucene and PyLucene only.
> 
> Can you send a Python sample of how this stemmer is/would be used with
> PyLucene (I know next to nothing about the Porter stemmers package)? From
> that I can see what is needed to include it in PyLucene, and I can at least
> send you instructions on how to do it yourself if it comes to that...

Well, I first have to figure out how to use any stemmer at all in PyLucene...
I am currently using this stemmer through PyStemmer. I'll get back to you
when I have.

> > I just tried to build with gcj from experimental (4.2-20061003-1), but
> > without luck...
> 
> GCJ 4.x is rather bleeding edge; if you want to take an easier route, use
> gcj 3.4.6. If you need to build gcj 3.4.x from sources, see PyLucene's
> INSTALL file.

Thank you _very_ much for your help. I finally managed to build the
packages; I used gcj 3.4.4 as explained in INSTALL, with the Debian
skeleton from Brett Parker.

They pass the "make test" test.

The result is here for anyone who needs it:
http://www.kiberpipa.org/~minmax/pylucene/


bye
Andraz Tori


debian unstable package of pylucene
user name
2006-12-27 14:20:08
On Wed, 2006-12-27 at 14:18 +0100, Andraž Tori wrote:

> Thank you _very_ much for your help. I finally managed to build the
> packages; I used gcj 3.4.4 as explained in INSTALL, with the Debian
> skeleton from Brett Parker.
> 
> They pass the "make test" test.
> 
> The result is here for anyone who needs it:
> http://www.kiberpipa.org/~minmax/pylucene/
> 


I proclaimed victory too soon...

All tests pass, but when I try to use it inside Django, it silently
crashes. The crash happens already at the import statement, for example at:
from PyLucene import FSDirectory
... no error message, just a crash.

Has anyone had this problem before?

bye
andraz

debian unstable package of pylucene
user name
2006-12-27 15:25:53
On Wed, Dec 27, 2006 at 03:20:08PM +0100, Andraž Tori wrote:
> On Wed, 2006-12-27 at 14:18 +0100, Andraž Tori wrote:
> 
> > Thank you _very_ much for your help. I finally managed to build the
> > packages; I used gcj 3.4.4 as explained in INSTALL, with the Debian
> > skeleton from Brett Parker.
> > 
> > They pass the "make test" test.
> > 
> > The result is here for anyone who needs it:
> > http://www.kiberpipa.org/~minmax/pylucene/
> > 
> 
> 
> I proclaimed victory too soon...
> 
> All tests pass, but when I try to use it inside Django, it silently
> crashes. The crash happens already at the import statement, for example at:
> from PyLucene import FSDirectory
> ... no error message, just a crash.
> 
> Has anyone had this problem before?

brettperwin:~$ python
Python 2.4.4 (#2, Oct 20 2006, 00:23:25) 
[GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from PyLucene import FSDirectory
WARNING: could not properly read security provider files:
         file:/usr/lib/python2.4/site-packages/security/libgcj.security
         file:/usr/lib/python2.4/site-packages/security/classpath.security
         Falling back to standard GNU security provider
>>> 

Using your package, it looks fine here - can you give any debug information?
The output of strace -f for the above would be good... something like:

  strace -f -o pylucene-crash.strace python -c "from PyLucene import FSDirectory"

Should do the trick.
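A crash that prints nothing can be told apart from an ordinary import failure by looking at how the process ended: on POSIX systems, Python's `subprocess` module reports a child killed by signal N as return code -N (so -11 for SIGSEGV on Linux). A minimal sketch in modern Python 3 (not the Python 2.4 shown above); `probe_import` is a hypothetical helper, not part of PyLucene:

```python
import signal
import subprocess
import sys


def probe_import(statement, python=sys.executable):
    """Run `statement` in a fresh interpreter and describe how it ended."""
    proc = subprocess.run([python, "-c", statement],
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return "killed by " + signal.Signals(-proc.returncode).name
    # Ordinary failure: the last stderr line usually names the exception.
    lines = proc.stderr.strip().splitlines()
    return "exit %d: %s" % (proc.returncode, lines[-1] if lines else "")


if __name__ == "__main__":
    print(probe_import("from PyLucene import FSDirectory"))
```

A result of "killed by SIGSEGV" points at a native crash worth tracing with strace; an "exit 1: ..." result points at a plain Python error instead.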

Cheers,
-- 
Brett Parker
debian unstable package of pylucene
user name
2006-12-27 15:45:22
>> I proclaimed victory too soon...
>> 
>> All tests pass, but when I try to use it inside Django, it silently
>> crashes. The crash happens already at the import statement, for example at:
>> from PyLucene import FSDirectory
>> ... no error message, just a crash.
>> 
>> Has anyone had this problem before?
> 
> brettperwin:~$ python
> Python 2.4.4 (#2, Oct 20 2006, 00:23:25) 
> [GCC 4.1.2 20061015 (prerelease) (Debian 4.1.1-16.1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from PyLucene import FSDirectory
> WARNING: could not properly read security provider files:
>          file:/usr/lib/python2.4/site-packages/security/libgcj.security
>          file:/usr/lib/python2.4/site-packages/security/classpath.security
>          Falling back to standard GNU security provider
> >>> 
> 
> Using your package, it looks fine here - can you give any debug information?
> The output of strace -f for the above would be good... something like:
> 
>   strace -f -o pylucene-crash.strace python -c "from PyLucene import FSDirectory"

The problem comes up _only_ inside Django... If I load PyLucene cleanly in Python, it works.

Here's the end of the strace -f output:

[pid  4490] open("/usr/lib/python2.4/site-packages/_PyLucene.so", O_RDONLY) = 8
[pid  4490] read(8, "177ELF111331360250"..., 512) = 512
[pid  4490] fstat64(8, {st_mode=S_IFREG|0755, st_size=17034414, ...}) = 0
[pid  4490] mmap2(NULL, 6510496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0xb650c000
[pid  4490] mmap2(0xb6a69000, 782336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0x55d) = 0xb6a69000
[pid  4490] mmap2(0xb6b28000, 104352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6b28000
[pid  4490] mprotect(0xbfe7b000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN) = 0
[pid  4490] mprotect(0xb701f000, 8384512, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
[pid  4490] close(8)                    = 0
[pid  4490] open("/etc/ld.so.cache", O_RDONLY) = 8
[pid  4490] fstat64(8, {st_mode=S_IFREG|0644, st_size=59980, ...}) = 0
[pid  4490] mmap2(NULL, 59980, PROT_READ, MAP_PRIVATE, 8, 0) = 0xb6c92000
[pid  4490] close(8)                    = 0
[pid  4490] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid  4490] open("/usr/lib/libstdc++.so.6", O_RDONLY) = 8
[pid  4490] read(8, "177ELF111331`3103"..., 512) = 512
[pid  4490] fstat64(8, {st_mode=S_IFREG|0644, st_size=909044, ...}) = 0
[pid  4490] mmap2(NULL, 935588, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0xb6427000
[pid  4490] mmap2(0xb6501000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd9) = 0xb6501000
[pid  4490] mmap2(0xb6506000, 22180, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6506000
[pid  4490] close(8)                    = 0
[pid  4490] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid  4490] open("/lib/libgcc_s.so.1", O_RDONLY) = 8
[pid  4490] read(8, "177ELF11133124030"..., 512) = 512
[pid  4490] fstat64(8, {st_mode=S_IFREG|0644, st_size=41096, ...}) = 0
[pid  4490] mmap2(NULL, 44292, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0xb6c7c000
[pid  4490] mmap2(0xb6c86000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0x9) = 0xb6c86000
[pid  4490] close(8)                    = 0
[pid  4490] mprotect(0xb6501000, 12288, PROT_READ) = 0
[pid  4490] mprotect(0xb650c000, 5623808, PROT_READ|PROT_WRITE) = 0
[pid  4488] <... select resumed> )      = 0 (Timeout)
[pid  4488] futex(0x8237f68, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid  4490] mprotect(0xb650c000, 5623808, PROT_READ|PROT_EXEC) = 0
[pid  4490] futex(0xb65074fc, FUTEX_WAKE, 2147483647) = 0
[pid  4490] munmap(0xb6c92000, 59980)   = 0
[pid  4490] rt_sigaction(SIGHUP, {0xb68f7340, [], 0}, NULL, 8) = 0
[pid  4490] rt_sigaction(SIGPWR, {0xb694bb10, ~[HUP INT RTMIN RT_1], SA_RESTART}, NULL, 8) = 0
[pid  4490] rt_sigaction(SIGXCPU, {0xb694bd60, ~[HUP INT RTMIN RT_1], SA_RESTART}, NULL, 8) = 0
[pid  4490] open("/proc/stat", O_RDONLY) = 8
[pid  4490] read(8, "cpu  424289 2962 71023 1889509 1"..., 4096) = 707
[pid  4490] close(8)                    = 0
[pid  4490] open("/dev/zero", O_RDONLY) = 8
[pid  4490] fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
[pid  4490] mmap2(0x1000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 8, 0) = 0x1000
[pid  4490] mmap2(0x11000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 8, 0) = 0x11000
[pid  4490] mmap2(0x21000, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 8, 0) = 0x21000
[pid  4490] --- SIGSEGV (Segmentation fault) 0 (0) ---


debian unstable package of pylucene
user name
2006-12-27 18:24:09
> All tests pass, but when I try to use it inside Django, it silently
> crashes. The crash happens already at the import statement, for example at:
> from PyLucene import FSDirectory
> ... no error message, just a crash.
>
> Has anyone had this problem before?

Not sure about that specific problem, but as far as I know, PyLucene will not
work from inside Django. The issue is that Django uses Python threads and
PyLucene uses PyLucene threads (a small wrapper on top of normal Python
threads). Even though they are just a small wrapper, they are incompatible
with each other.

The only solution would be to grep for all the 'import Thread' statements in
the Django source code and substitute them with 'import PythonThread as
Thread' (I don't recall offhand if that's the exact name).

If you google, you'll see this issue has been raised before - not specific to
Django, but usually related to some web framework that was making extensive
use of threads (e.g. CherryPy in
http://lists.osafoundation.org/pipermail/pylucene-dev/2006-June/001107.html).
I'm a bit short on time to find more relevant threads, but searching
pylucene-dev should explain the issue more clearly.

The only solution I came up with temporarily is to run PyLucene as a separate
server and have Django communicate with it remotely.
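The separate-server workaround can be sketched with the standard library's XML-RPC modules: one dedicated process owns PyLucene and answers requests, and Django only talks to it over HTTP. This is a sketch in modern Python 3, assuming a single-threaded server is acceptable; the `search` function and port 8765 are hypothetical placeholders for the real PyLucene query code:

```python
from xmlrpc.server import SimpleXMLRPCServer


def search(query):
    """Hypothetical entry point: a real server would run the PyLucene
    query here; this placeholder just echoes the query back."""
    return ["hit for %s" % query]


def make_server(host="127.0.0.1", port=8765):
    # SimpleXMLRPCServer handles one request at a time, in whichever thread
    # calls serve_forever(), so all search() calls run on that one thread.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(search, "search")
    return server


def serve(port=8765):
    """Run the search server forever in the calling thread."""
    make_server(port=port).serve_forever()
```

On the Django side, a view would call something like `xmlrpc.client.ServerProxy("http://127.0.0.1:8765/").search("query")`. Calling `serve()` from the server process's main thread keeps every PyLucene call on one thread; the assumption here is that the main thread, which imported PyLucene, is a usable one.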

If you discover a way to run PyLucene inside Django, I would love to hear
about it.

Cheers,
Norbert
debian unstable package of pylucene
user name
2006-12-27 18:59:24
On Wed, 27 Dec 2006, Andraž Tori wrote:

> I can't do that - I am not the author of the file in question; Porter is,
> and he posted it on the mailing list. I just fixed it up. It is reasonable
> to expect that the file bears the same licence as the other stemmers he
> publishes in Snowball...

I guess including the .java file generated from the stemmer source may be OK.

> Well, I first have to figure out how to use any stemmer at all in PyLucene...
> I am currently using this stemmer through PyStemmer. I'll get back to you
> when I have.

The "Lucene in Action" samples ported to PyLucene include a SnowballTest.py
file that may help in trying this out.
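Conceptually, a stemming analyzer splits text into tokens, lower-cases them and passes each token to the stemmer, so different inflected forms end up as the same indexed term. A plain-Python model of that chain (not PyLucene's API), where `stem` is any word-to-stem callable; with PyStemmer that would presumably be the `stemWord` method of a `Stemmer.Stemmer` instance. The suffix list in the hypothetical `toy_stem` is invented for illustration and is not the Slovenian Snowball rule set:

```python
import re

# \w is Unicode-aware in Python 3, so letters such as č, š, ž stay in tokens.
TOKEN = re.compile(r"\w+")


def analyze(text, stem):
    """Tokenize, lower-case and stem `text`, like an analyzer chain."""
    return [stem(tok.lower()) for tok in TOKEN.findall(text)]


def toy_stem(word, suffixes=("ami", "ih", "a", "e", "i", "o", "u")):
    """Hypothetical suffix stripper: drop the first matching suffix,
    but only if at least three characters remain."""
    for suf in suffixes:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word
```

Here `analyze("Hiše in hišami", toy_stem)` maps both noun forms to the same stem, which is what lets a query for one form match documents containing the other.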

> Thank you _very_ much for your help. I finally managed to build the
> packages; I used gcj 3.4.4 as explained in INSTALL, with the Debian
> skeleton from Brett Parker.
>
> They pass the "make test" test.

Great !

> The result is here for anyone who needs it:
> http://www.kiberpipa.org/~minmax/pylucene/

If you want, I can add that link to the PyLucene homepage so that others can
use it?

Andi..
debian unstable package of pylucene
user name
2006-12-27 19:03:56
On Wed, 27 Dec 2006, Andraž Tori wrote:

> I proclaimed victory too soon...
>
> All tests pass, but when I try to use it inside Django, it silently
> crashes. The crash happens already at the import statement, for example at:
> from PyLucene import FSDirectory
> ... no error message, just a crash.
>
> Has anyone had this problem before?

It's probably a threading issue. Any thread using PyLucene *must* be an
instance of PyLucene.PythonThread. This class ensures that the thread is
properly set up with regard to libgcj's garbage collector. Currently, there
is no way to add a thread to libgcj after it was created. This is a very old
problem that will hopefully be fixed in the 4.x timeframe.

Others have found ways around this, as has been discussed on this list
before. The archives [1] or Google may help. Sorry, I can't be more specific;
I have never used PyLucene with a web framework. Using Twisted, it's as easy
as creating a thread pool with a bunch of PyLucene.PythonThread instances.
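The thread-pool idea is not specific to Twisted: if every worker is created as a `PythonThread`, every job that touches PyLucene runs on a thread libgcj has set up. A minimal standard-library sketch in modern Python 3; `PythonThreadPool` is a hypothetical class, not part of PyLucene or Twisted, and it assumes `PythonThread` accepts the usual `threading.Thread` arguments. When PyLucene is not importable, a plain `threading.Thread` subclass stands in so the code still runs, but that stand-in does no libgcj setup:

```python
import queue
import threading

try:
    from PyLucene import PythonThread  # the real class, if installed
except ImportError:
    class PythonThread(threading.Thread):
        """Stand-in so the sketch runs without PyLucene. It is a plain
        threading.Thread subclass and does NOT register with libgcj."""


class PythonThreadPool:
    """Hypothetical fixed-size pool whose workers are all PythonThread
    instances, so jobs that call into PyLucene run on set-up threads."""

    def __init__(self, size=4):
        self.jobs = queue.Queue()
        self.workers = []
        for _ in range(size):
            worker = PythonThread(target=self._work)
            worker.daemon = True  # don't keep the process alive on exit
            worker.start()
            self.workers.append(worker)

    def _work(self):
        while True:
            func, args, reply = self.jobs.get()
            if func is None:  # shutdown sentinel
                return
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def run(self, func, *args):
        """Run func(*args) on a pool thread and wait for the result."""
        reply = queue.Queue(maxsize=1)
        self.jobs.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        """Stop all workers and wait for them to exit."""
        for _ in self.workers:
            self.jobs.put((None, (), None))
        for worker in self.workers:
            worker.join()
```

A web request handler would then call `pool.run(some_search_function, query)` instead of calling PyLucene directly from its own thread.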

Andi..

[1] http://lists.osafoundation.org/pipermail/pylucene-dev/

debian unstable package of pylucene
user name
2006-12-27 19:12:42
On Wed, 27 Dec 2006, Norbert Wojtowicz wrote:

> Not sure about that specific problem, but as far as I know, PyLucene will
> not work from inside Django. The issue is that Django uses Python threads
> and PyLucene uses PyLucene threads (a small wrapper on top of normal Python
> threads). Even though they are just a small wrapper, they are incompatible
> with each other.

PyLucene.PythonThread is a subclass of Python's threading.Thread that
delegates the starting of the OS thread to libgcj so that it can set it up
with the garbage collector. Internally, there are then two thread instances,
one Python, one Java, that wrap the *same* OS thread. Because a
PyLucene.PythonThread is a subclass of Python's threading.Thread, it is fully
compatible with regular Python threads and the Python threading APIs.

The problem with running PyLucene in a web framework such as Django is making
sure the threads making PyLucene calls are of the right class. I don't know
enough about Django to say how this is done, but this being Python and open
source, there has got to be a way.
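One way to make the "right class" requirement visible while debugging is to check the current thread before entering PyLucene and raise an ordinary Python exception instead of letting the process die. A sketch in modern Python 3; `requires_python_thread` is a hypothetical decorator, and exempting the main thread is an assumption based on PyLucene working from a plain interactive session, as in the transcript earlier in this thread:

```python
import functools
import threading

try:
    from PyLucene import PythonThread  # the real class, if installed
except ImportError:
    class PythonThread(threading.Thread):
        """Stand-in used only when PyLucene is not installed."""


def requires_python_thread(func):
    """Hypothetical guard: raise a Python exception when `func` is called
    from a thread that is neither the main thread nor a PythonThread."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        current = threading.current_thread()
        # Assumption: the main thread, which imported PyLucene, is usable.
        if current is not threading.main_thread() and \
                not isinstance(current, PythonThread):
            raise RuntimeError("%s() called from %r, which is not a "
                               "PyLucene.PythonThread" % (func.__name__, current))
        return func(*args, **kwargs)
    return wrapper
```

Wrapping the functions that open searchers or run queries this way would turn a silent crash during those calls into a traceback naming the offending thread; it cannot help with a crash during the import itself.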

> The only solution would be to grep for all the 'import Thread' statements
> in the Django source code and substitute them with 'import PythonThread as
> Thread' (I don't recall offhand if that's the exact name).

That could work. Others have suggested simply setting threading.Thread to
PyLucene.PythonThread. There may also be a way to configure Django without
having to resort to such tricks.

Andi..
debian unstable package of pylucene
user name
2006-12-27 20:50:53
On Wed, 2006-12-27 at 11:12 -0800, Andi Vajda wrote:
> On Wed, 27 Dec 2006, Norbert Wojtowicz wrote:
> 
> > Not sure about that specific problem, but as far as I know, PyLucene will
> > not work from inside Django. The issue is that Django uses Python threads
> > and PyLucene uses PyLucene threads (a small wrapper on top of normal Python
> > threads). Even though they are just a small wrapper, they are incompatible
> > with each other.
> 
> PyLucene.PythonThread is a subclass of Python's threading.Thread that
> delegates the starting of the OS thread to libgcj so that it can set it up
> with the garbage collector. Internally, there are then two thread instances,
> one Python, one Java, that wrap the *same* OS thread. Because a
> PyLucene.PythonThread is a subclass of Python's threading.Thread, it is
> fully compatible with regular Python threads and the Python threading APIs.
> 
> The problem with running PyLucene in a web framework such as Django is
> making sure the threads making PyLucene calls are of the right class. I
> don't know enough about Django to say how this is done, but this being
> Python and open source, there has got to be a way.

The interesting question is: why did it work with my previous deb, which I
built with gcj 4.1 (the one where Highlighter didn't work)?

Django didn't crash, and PyLucene worked inside it...

> > The only solution would be to grep for all the 'import Thread' statements
> > in the Django source code and substitute them with 'import PythonThread as
> > Thread' (I don't recall offhand if that's the exact name).
> 
> That could work. Others have suggested simply setting threading.Thread to
> PyLucene.PythonThread. There may also be a way to configure Django without
> having to resort to such tricks.

What exactly needs changing? I can't seem to find "import Thread", but there
are a lot of "import threading" statements in Django...
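`import threading` on its own is harmless; what matters is where `threading.Thread` is instantiated or subclassed, or where a thread is started through the low-level `thread` module, which bypasses `threading.Thread` altogether. A sketch of a search over a source tree for those patterns, in modern Python 3; the regular expression is a heuristic and `find_thread_sites` is a hypothetical helper:

```python
import os
import re

# Heuristic patterns for places where a thread class is named or a raw
# thread is started; a bare "import threading" deliberately doesn't match.
THREAD_PATTERNS = re.compile(
    r"from\s+threading\s+import\s+.*\bThread\b"
    r"|\bthreading\.Thread\b"
    r"|\b_?thread\.start_new_thread\b"
)


def find_thread_sites(root):
    """Yield (path, line_number, line) for every matching line in the
    .py files under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if THREAD_PATTERNS.search(line):
                        yield path, lineno, line.rstrip()
```

Pointed at the installed `django` package directory, each hit is a candidate for either the substitution or the monkeypatch discussed above.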

bye
andraz
