Thread: performance experiment of PyLucene vs Lucene
2007-05-10 23:03:42
[Title] My awful performance experiment of PyLucene vs Lucene
[Results] PyLucene ?= 0.5 Lucene (in terms of search capacity). Using the sample program "SearchFiles.py" provided by PyLucene and a Java program tackling a similar task, I found that PyLucene performs poorly: the average search time for PyLucene is about twice that of Java Lucene.
The best Java result: 365713 ms for 6400 searches (most results lie around 400000 ms). The best PyLucene result: 662815 ms for 6400 searches (most results lie around 680000 ms).
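As a quick sanity check on these figures, the totals can be converted to per-search averages and a ratio. This is only arithmetic on the two best-run numbers quoted above; the variable names are illustrative.

```python
# Best-run totals quoted above, in milliseconds, for 6400 searches each.
SEARCHES = 6400
java_total_ms = 365713
pylucene_total_ms = 662815

# float() keeps the division exact on Python 2 as well as Python 3.
java_avg_ms = java_total_ms / float(SEARCHES)
pylucene_avg_ms = pylucene_total_ms / float(SEARCHES)
ratio = pylucene_total_ms / float(java_total_ms)

print("Java avg per search:     %.1f ms" % java_avg_ms)
print("PyLucene avg per search: %.1f ms" % pylucene_avg_ms)
print("PyLucene / Java:         %.2f" % ratio)
```

On the best runs that is roughly 57 ms versus 104 ms per search, a ratio of about 1.8.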
[Prerequisites] Intel Pentium D dual-core 2.8 GHz, 1 GB DDR RAM; CentOS (Linux), kernel 2.6.9; Lucene 2.1.0 (ant/java) vs PyLucene 2.1.0 (lucene-java-2.1.0-509013, "_Pylucene.so" obtained from OSAF; older PyLucene versions gave even worse results); Python 2.5.1 vs Java 2 1.5.0_10.
[Object: index files] The data source is a directory of about 27000 files, each between 0.5 KB and 20 KB in size.
The index is built by a PyLucene sample program, IndexFile.py (under the path Pylucene-X.X/samples/), which I revised slightly to set the Store attribute of the Content field to NO, since otherwise the memory cost of the original Python program would be huge.
[Object: test cases] A file named "Zop3" containing 6400 English words (our search words), one per line.
[Major steps of the two programs: Search.java vs xSearchIndex.py] A simple comparison of searching and retrieving performance between the two siblings.
[Equivalent actions that are timed in our test]
1. Construct an index searcher object (SEARCH) in Java and in Python.
2. Use the searcher to obtain a search result (HITS) from the existing index.
3. Loop over the documents in HITS, reading each field value of every result item.
4. Repeat steps 1-3 for the other 6399 test cases.
5. Record the total elapsed time, which is needed to compute the average time.
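The steps above can be sketched as a small timing harness. This is only an illustration under stated assumptions: `StubSearcher` and `run_benchmark` are hypothetical names, and the stub is a plain dictionary lookup, not the PyLucene or Lucene API.

```python
import time


class StubSearcher(object):
    """Hypothetical stand-in for an index searcher; not the PyLucene API."""

    def __init__(self, index):
        # index maps a word to a list of documents (plain dicts of field values)
        self.index = index

    def search(self, word):
        return self.index.get(word, [])


def run_benchmark(make_searcher, words, fields=("name", "path", "contents")):
    """Time steps 1-3 for every word and return (total_ms, hit_count)."""
    total_seconds = 0.0
    hit_count = 0
    for word in words:                  # step 4: repeat for each test case
        start = time.time()
        searcher = make_searcher()      # step 1: construct the searcher
        hits = searcher.search(word)    # step 2: run the search
        for doc in hits:                # step 3: read every field of every hit
            for field in fields:
                doc.get(field)
        total_seconds += time.time() - start
        hit_count += len(hits)
    return total_seconds * 1000.0, hit_count  # step 5: total elapsed time


index = {"lucene": [{"name": "a.txt", "path": "/tmp/a.txt", "contents": "lucene"}]}
total_ms, hits = run_benchmark(lambda: StubSearcher(index), ["lucene", "missing"])
print("total: %.3f ms over %d hits" % (total_ms, hits))
```

Keeping output such as logging or printing outside the timed region, in both languages, makes the two totals easier to compare.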
Here are my programs (xSearchFiles.py and Search.java).
---- important part: xSearchFiles.py (one complete search procedure) ----
def RunSearch(searcher, parser, word):
    global logger, time_costing
    local_parse = parser.parse
    local_search = searcher.search
    start = datetime.now()
    hits = local_search(local_parse(word))
    #map(Processor, hits)
    for i in xrange(0, hits.length()):
        getMethod = hits.doc(i).get
        getMethod("name"), getMethod("path"), getMethod("contents")
    end = datetime.now()
    during = end - start
    wss = ["[Result]", "[Time]"]
    wss.insert(1, '\t'+ str(hits.length()))
    wss.append('\t'+ str(during)+ '\n')
    logger.writelines(wss)
    time_costing += during.microseconds/1000
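One detail in the timing line may be worth checking: `timedelta.microseconds` holds only the sub-second part of the interval, so any single search lasting a second or more would be undercounted. Below is a minimal sketch of a conversion that also includes the days and seconds components. The helper name `elapsed_ms` is hypothetical and not part of the posted program; the manual arithmetic is used because `timedelta.total_seconds()` is not available in Python 2.5.

```python
from datetime import timedelta


def elapsed_ms(delta):
    """Convert a timedelta to milliseconds, including whole seconds and days."""
    seconds = delta.days * 86400 + delta.seconds
    return seconds * 1000.0 + delta.microseconds / 1000.0


sample = timedelta(seconds=2, microseconds=500000)
print(sample.microseconds / 1000.0)  # sub-second part only: 500.0
print(elapsed_ms(sample))            # full interval: 2500.0
```

Whether this affected the reported PyLucene total depends on whether any individual search exceeded one second, which the thread does not say.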
---- important part: Search.java (one complete search procedure) ----
clock.start();
for (int i = 0; m_words != null && i < m_words.length; i++)
{
    int testonly = 0;
    Query q = qp.parse(m_words[i]);
    Hits h = is.search(q);
    clock.suspend();
    System.out.println("r" + i);
    clock.resume();
    for (int j = 0; j < h.length(); j++)
    {
        h.doc(j).get("name");
        h.doc(j).get("path");
        h.doc(j).get("contens");
        testonly = j;
    }
}
clock.stop();
System.out.println("Total: " + clock.getTime() + "ms.");
...
Re: performance experiment of PyLucene vs Lucene
United Kingdom
2007-05-11 02:37:49
On Fri, May 11, 2007 at 12:03:42PM +0800, Liang Xing wrote:
<snip class="Description + Python Example" />
> ---- important part: Search.java (one complete search procedure) ----
> clock.start();
> for (int i = 0; m_words != null && i < m_words.length; i++)
> {
>     int testonly = 0;
>     Query q = qp.parse(m_words[i]);
>     Hits h = is.search(q);
>     clock.suspend();
>     System.out.println("r" + i);
>     clock.resume();
>     for (int j = 0; j < h.length(); j++)
>     {
>         h.doc(j).get("name");
>         h.doc(j).get("path");
>         h.doc(j).get("contens");
                        ^^^^^^^
Surely that should be "contents". Is this a typo in the mail, or was this a copy and paste? Because if this is a copy and paste, and you're really fetching "contens" rather than "contents", then that might well be why the Java seems to go twice as fast as the Python.

>         testonly = j;
>     }
> }
> clock.stop();
> System.out.println("Total: " + clock.getTime() + "ms.");
> ..
>
Thanks,
--
Brett Parker
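
A field-name mismatch like the one raised above is easy to catch before timing anything. The sketch below assumes the index defines the fields name, path and contents (taken from the Python program; the actual index schema is not shown in the thread), and the helper name `unknown_fields` is hypothetical.

```python
def unknown_fields(requested, indexed):
    """Return requested field names that the index does not define, in order."""
    known = set(indexed)
    return [name for name in requested if name not in known]


indexed_fields = ["name", "path", "contents"]  # assumed index schema
java_fields = ["name", "path", "contens"]      # as quoted from Search.java
python_fields = ["name", "path", "contents"]   # as in xSearchFiles.py

print(unknown_fields(java_fields, indexed_fields))
print(unknown_fields(python_fields, indexed_fields))
```

Running a check like this against both programs helps confirm that the two benchmarks request the same data before their timings are compared.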
_______________________________________________
pylucene-dev mailing list
pylucene-dev osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev