
Thread: performance experiment of PyLucene vs Lucene




performance experiment of PyLucene vs Lucene
user name
2007-05-10 23:03:42
[Title]
My awful performance experiment of PyLucene vs Lucene

[Results]
PyLucene ≈ 0.5 × Lucene (in search speed)
Using the sample program "SearchFiles.py" provided with PyLucene, and a Java program performing a similar task, I got an awful result for PyLucene: its average search time is roughly twice that of Java Lucene (about 1.8 times on the best runs, about 1.7 times on typical runs).

The best Java result: 365713 ms for 6400 searches (most results lie around 400000 ms)
The best PyLucene result: 662815 ms for 6400 searches (most results lie around 680000 ms)
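For reference, the totals above can be converted to per-search averages. This is a small Python 3 sketch that only does arithmetic on the figures reported above; nothing new is measured, and the names are illustrative.

```python
# Reported totals for 6400 searches, in milliseconds (best and typical runs)
SEARCHES = 6400
totals_ms = {
    "java_best": 365713,
    "pylucene_best": 662815,
    "java_typical": 400000,
    "pylucene_typical": 680000,
}

def per_search_ms(total_ms, searches=SEARCHES):
    """Average wall-clock time of one search, in milliseconds."""
    return total_ms / searches

averages = {name: per_search_ms(ms) for name, ms in totals_ms.items()}
best_ratio = totals_ms["pylucene_best"] / totals_ms["java_best"]
typical_ratio = totals_ms["pylucene_typical"] / totals_ms["java_typical"]

for name, avg in averages.items():
    print(f"{name}: {avg:.1f} ms per search")
print(f"PyLucene/Java, best runs: {best_ratio:.2f}")
print(f"PyLucene/Java, typical runs: {typical_ratio:.2f}")
```

On the best runs this works out to about 57 ms versus 104 ms per search, so PyLucene takes roughly 1.8 times as long; the typical runs give about 1.7.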

[Environment]
Intel Pentium D dual-core 2.8 GHz
1 GB DDR RAM
CentOS (Linux), kernel 2.6.9
Lucene 2.1.0 (Java, built with Ant) vs
PyLucene 2.1.0 (lucene-java-2.1.0-509013, "_Pylucene.so" obtained from OSAF)
(even worse results were obtained with older PyLucene versions)
Python 2.5.1 vs Java 2 1.5.0_10

[Object: index files]
The data source is a directory of about 27000 files, each between 0.5 KB and 20 KB in size.

The index was built with a PyLucene sample program, IndexFile.py (under Pylucene-X.X/samples/), which I revised slightly to set the Store attribute of the content field to NO, since otherwise the original Python program used a huge amount of memory.

[Object: test cases]
A file named "Zop3" containing 6400 English words (our search words), one per line.

[Major steps of the two programs: Search.java vs xSearchIndex.py]
A simple comparison of search and retrieval performance between the two siblings.

[Equivalent actions in both programs whose times are summed in the test]
1. Construct an index searcher object (SEARCH) in Java and in Python.
2. Use the searcher to obtain a search result (HITS) from the existing index.
3. Loop over the HITS document objects, reading each field value of every result item.
4. Repeat steps 1-3 for the other 6399 test words.
5. Record the total elapsed time, from which the average time per search is computed.
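The five steps above amount to a simple timing loop. Below is a minimal Python 3 sketch of that loop that runs without PyLucene. DictSearcher, read_words, run_benchmark and the make_searcher factory are hypothetical names, the in-memory searcher only stands in for a real IndexSearcher, and time.perf_counter is used here instead of the datetime/clock timers of the original programs.

```python
import time

def read_words(lines):
    """One search word per line, as in the "Zop3" test file; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]

def run_benchmark(words, make_searcher, fields=("name", "path", "contents")):
    """Time steps 1-3 for each word (step 4) and sum the elapsed time (step 5).

    make_searcher is a hypothetical factory returning an object whose
    search(word) returns documents that support get(field_name).
    Returns (total_seconds, average_seconds, total_hits).
    """
    total = 0.0
    total_hits = 0
    for word in words:
        start = time.perf_counter()
        searcher = make_searcher()        # step 1: construct the searcher
        hits = searcher.search(word)      # step 2: run the query
        for doc in hits:                  # step 3: read every field of every hit
            for field in fields:
                doc.get(field)
            total_hits += 1
        total += time.perf_counter() - start
    average = total / len(words) if words else 0.0
    return total, average, total_hits


class DictSearcher:
    """Hypothetical in-memory stand-in for a Lucene IndexSearcher."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, word):
        # naive substring match on the "contents" field
        return [d for d in self.docs if word in d.get("contents", "")]


docs = [
    {"name": "a.txt", "path": "/data/a.txt", "contents": "zope and python"},
    {"name": "b.txt", "path": "/data/b.txt", "contents": "java and lucene"},
]
words = read_words(["python\n", "lucene\n", "\n", "and\n"])
total, average, hit_count = run_benchmark(words, lambda: DictSearcher(docs))
print(f"{len(words)} searches, {hit_count} hits, {total * 1000:.3f} ms total")
```

With real indexes, make_searcher would open the index and read_words would be fed the lines of the test file; everything else in the loop stays the same for both sides of the comparison.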

Here are my programs (xSearchFiles.py and Search.java):
---- Important part: xSearchFiles.py (one complete search procedure) ----

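from datetime import datetime

# logger (a file-like object) and time_costing (total milliseconds)
# are module-level globals defined elsewhere in the script
def RunSearch(searcher, parser, word):
    global logger, time_costing
    local_parse = parser.parse
    local_search = searcher.search
    start = datetime.now()
    hits = local_search(local_parse(word))

    #map(Processor, hits)
    # read every field of every hit so that retrieval is timed as well as search
    for i in xrange(0, hits.length()):
        getMethod = hits.doc(i).get
        getMethod("name"), getMethod("path"), getMethod("contents")
    end = datetime.now()
    during = end - start
    # log line: [Result]<TAB>hit count[Time]<TAB>elapsed time
    wss = ["[Result]", "[Time]"]
    wss.insert(1, '\t' + str(hits.length()))
    wss.append('\t' + str(during) + '\n')
    logger.writelines(wss)
    # timedelta.microseconds is only the sub-second part; add the whole seconds too
    time_costing += during.seconds * 1000 + during.microseconds / 1000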
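A note on the timing: timedelta.microseconds holds only the sub-second part of a duration (0 to 999999), so summing during.microseconds/1000 alone under-counts any search that takes a second or more. A minimal Python 3 sketch of the pitfall and of a conversion that keeps the whole duration; timedelta_to_ms is a hypothetical helper name.

```python
from datetime import timedelta

def timedelta_to_ms(delta):
    """Whole milliseconds in a timedelta, including its days and seconds parts."""
    return (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000

elapsed = timedelta(seconds=1, microseconds=250000)
print(elapsed.microseconds // 1000)  # 250: the sub-second part only
print(timedelta_to_ms(elapsed))      # 1250
```

The explicit arithmetic also works on the Python 2.5 used in the experiment; on Python 2.7 and later, delta.total_seconds() returns the same duration as a float number of seconds.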

---- Important part: Search.java (one complete search procedure) ----
clock.start();
for (int i = 0; m_words != null && i < m_words.length; i++)
{
    int testonly = 0;
    Query q = qp.parse(m_words[i]);
    Hits h = is.search(q);
    clock.suspend();
    System.out.println("r" + i);
    clock.resume();
    for (int j = 0; j < h.length(); j++)
    {
        h.doc(j).get("name");
        h.doc(j).get("path");
        h.doc(j).get("contens");
        testonly = j;
    }
}
clock.stop();
System.out.println("Total: " + clock.getTime() + "ms.");
...




Re: performance experiment of PyLucene vs Lucene
user name
United Kingdom
2007-05-11 02:37:49
On Fri, May 11, 2007 at 12:03:42PM +0800, Liang Xing wrote:

<snip class="Description + Python Example"
/>

> ---- import part: Search.java( one complete search procedure) ----
>   clock.start();
>   for (int i = 0; m_words != null && i < m_words.length; i++)
>   {
>    int testonly = 0;
>    Query q = qp.parse(m_words[i]);
>    Hits h = is.search(q);
>    clock.suspend();
>    System.out.println("r" + i);
>    clock.resume();
>    for(int j = 0; j < h.length(); j ++)
>    {
>     h.doc(j).get("name"); 
>     h.doc(j).get("path");
>     h.doc(j).get("contens");
                    ^^^^^^^ Surely that should be contents - is this a
                    typo in the mail or was this a copy paste? Because
                    if this is a copy paste, and you're really fetching
                    contens rather than contents, then that might well
                    be why the java is seeming to go twice as fast as
                    the python.
>     testonly = j; 
>    }
>   }
>   clock.stop();
>   System.out.println("Total: " + clock.getTime() + "ms.");
> ..
> 
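Lucene's Document.get returns null (None in PyLucene) for a field name the document does not have, so a misspelled name silently does less work instead of failing. One way to rule this out in a benchmark is to validate the requested names before timing. A minimal Python 3 sketch; INDEXED_FIELDS and check_fields are hypothetical names, and the field set is assumed from the programs above.

```python
# Field names written by the indexer (assumed from the programs above)
INDEXED_FIELDS = frozenset({"name", "path", "contents"})

def check_fields(requested, known=INDEXED_FIELDS):
    """Raise ValueError if any requested field name was never written by the indexer."""
    unknown = sorted(set(requested) - set(known))
    if unknown:
        raise ValueError("unknown field name(s): " + ", ".join(unknown))
    return tuple(requested)

fields = check_fields(["name", "path", "contents"])  # passes
# check_fields(["name", "path", "contens"]) would raise ValueError
```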

Thanks,
-- 
Brett Parker
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
