Thread: Created: (LUCENE-550) InstanciatedIndex - faster but memory consuming index
|
|
| Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-11 18:30:05 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12379124 ]
Doug Cutting commented on LUCENE-550:
-------------------------------------
This looks very promising. Unfortunately the code you
provide makes many incompatible API changes (e.g., turning
Term into an interface that has far fewer methods), removes
lots of useful javadoc, etc. So please don't expect it to
be committed soon!
A back-compatible way to add an interface is to add it above
the old class. So you might add a TermInterface,
AbstractTerm, and TermImpl, then change Term to extend
TermImpl and deprecate it.
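As a rough illustration of the layering described above, here is a minimal sketch. None of these types are the real Lucene classes: the method set (field(), text()) is reduced for brevity, and TermCompatDemo and describe() are hypothetical names used only for this example.

```java
// Hypothetical sketch: these are NOT the real org.apache.lucene.index classes.

/** New abstraction, added "above" the old class. */
interface TermInterface {
    String field();
    String text();
}

/** Behaviour shared by every implementation of the interface. */
abstract class AbstractTerm implements TermInterface, Comparable<TermInterface> {
    @Override
    public int compareTo(TermInterface other) {
        int byField = field().compareTo(other.field());
        return byField != 0 ? byField : text().compareTo(other.text());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TermInterface
            && field().equals(((TermInterface) o).field())
            && text().equals(((TermInterface) o).text());
    }

    @Override
    public int hashCode() {
        return 31 * field().hashCode() + text().hashCode();
    }
}

/** The former body of Term would move here unchanged. */
class TermImpl extends AbstractTerm {
    private final String field;
    private final String text;

    TermImpl(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override public String field() { return field; }
    @Override public String text() { return text; }
}

/** The old name survives, so existing callers still compile. */
@Deprecated
class Term extends TermImpl {
    Term(String field, String text) {
        super(field, text);
    }
}

@SuppressWarnings("deprecation")
final class TermCompatDemo {
    /** New code accepts the interface; old code can keep passing Term. */
    static String describe(TermInterface t) {
        return t.field() + ":" + t.text();
    }

    public static void main(String[] args) {
        Term legacy = new Term("title", "lucene");
        System.out.println(describe(legacy)); // prints "title:lucene"
    }
}
```

The point of the arrangement is that the deprecated class keeps its name and constructors, while new code can depend on the interface alone.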
Then there's also the question of whether you really must
convert Term to an interface. I would not undertake that
change for aesthetic reasons. Is it really required to
achieve your goals? You should generally try hard to
minimize the size of your diffs and maximize
back-compatibility.
> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
> Key: LUCENE-550
> URL: http://issues.apache.org/jira/browse/LUCENE-550
> Project: Lucene - Java
> Type: New Feature
> Components: Store
> Versions: 1.9
> Reporter: Karl Wettin
> Attachments: Document.java, InstanciatedIndex.java,
> Term.java, class_diagram.png, class_diagram.png,
> src-1.9karl1_20060611.tar.gz, src.tar.gz,
> src_20060509.tar.gz
>
> After fixing the bugs, it's now 4.5 -> 5 times the
> speed. This is true at both index and query time. Sorry
> if I got your hopes up too much. There are still things to
> be done though. I might not have time to do anything with this
> until next month, so here is the code if anyone wants a
> peek.
> Not good enough for Jira yet, but if someone wants to
> fool around with it, here it is. The implementation passes a
> TermEnum -> TermDocs -> Fields -> TermVector
> comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and
> positions are stored in an ugly way and have bugs.
> You might notice that norms are float[] and not byte[].
> I refactored that to see if it would do any good.
> Bit shifting doesn't take many ticks, so I might just revert
> that.
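To make the float[] versus byte[] trade-off concrete, here is a minimal sketch of a one-byte float encoding of the general kind used for norms. The class name NormCodec and the exact constants are assumptions for illustration; this is not claimed to be the code in the attached patch or in Lucene itself. Decoding costs a mask, a shift and an add, which is the "bit shifting" cost mentioned above.

```java
// Hypothetical helper; the bit layout below is an assumption for illustration.
final class NormCodec {
    private NormCodec() {}

    // Offset that rebases the IEEE 754 exponent so the result fits in 8 bits.
    private static final int ZERO_POINT = (63 - 15) << 3;

    /** Lossy float -> byte: keep the exponent and the top two mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);              // drop the 21 low mantissa bits
        if (small <= ZERO_POINT) {
            // zero and negatives map to 0; tiny positives map to the smallest code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return -1;                             // too large: saturate at 0xFF
        }
        return (byte) (small - ZERO_POINT);
    }

    /** byte -> float: one mask, one shift, one add. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

With this layout, 1.0f and 0.5f survive a round trip exactly, while 0.3f comes back as 0.25f. A float[] avoids both the decode step and the precision loss at four times the memory per norm.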
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replaces IndexWriter for now.
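The `ii.new InstanciatedIndexReader()` form in the description is Java's syntax for creating an instance of an inner (non-static nested) class bound to a particular outer object. The sketch below shows that pattern on a toy in-memory index. TinyIndex, Reader, termDocs and numDocs are hypothetical names for this example and say nothing about how the attached code is actually structured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index; not the InstanciatedIndex from the attachment.
final class TinyIndex {
    // term -> ids of the documents containing it, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Adds a whitespace-tokenized document directly; no separate writer. */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // record each document at most once per term
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Inner class: every Reader sees the live state of its enclosing index. */
    final class Reader {
        List<Integer> termDocs(String term) {
            return postings.getOrDefault(term, Collections.emptyList());
        }

        int numDocs() {
            return nextDocId;
        }
    }
}
```

Usage mirrors the snippet in the description: `TinyIndex ii = new TinyIndex(); TinyIndex.Reader reader = ii.new Reader();`. Because the reader reads the outer object's fields directly, documents added after the reader was created are visible to it.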
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe lucene.apache.org
For additional commands, e-mail: java-dev-help lucene.apache.org
|
|
| Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-11 18:46:12 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12379128 ]
Karl Wettin commented on LUCENE-550:
------------------------------------
Doug Cutting commented on LUCENE-550:
> This looks very promising. Unfortunately the code you provide makes
> many incompatible API changes (e.g., turning Term into an interface
> that has far fewer methods) removes lots of useful javadoc, etc. So
> please don't expect it to be committed soon!
I agree, there is lots of work to be done on it. It was
easier for me to think clearly when everything was separated.
Basically, only a few changes to the API are needed:
1. Neither Document nor Term may be final.
2. Some other minor thing that I forgot about.
It can all be fixed, but it is nothing that I prioritize right
now. If you feel it would be a nice thing for 2.0, tolk me
what changes you are OK with and give me at least two weeks'
notice, and I /might/ find time to back-factor the code.
|
|
| Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-11 18:53:00 |
On Thu, 2006-05-11 at 18:46 +0000, Karl Wettin (JIRA) wrote:
> for 2.0, tolk me what
tell me what..
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-11 21:22:05 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Karl Wettin updated LUCENE-550:
-------------------------------
Attachment: lucene.1.9-karl1.jpg
This is the diagram of InstanciatedIndex as of 1.9-karl1
> Attachments: Document.java, InstanciatedIndex.java,
> Term.java, class_diagram.png, class_diagram.png,
> lucene.1.9-karl1.jpg, src-1.9karl1_20060611.tar.gz,
> src.tar.gz, src_20060509.tar.gz
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-27 11:43:30 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Karl Wettin updated LUCENE-550:
-------------------------------
Attachment: instanciated_20060527.tar
This update makes InstanciatedIndex compatible with Lucene,
given that issues 580 and 581 are adopted.
It depends on generics and concurrent locks from J2SE 5.0.
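The thread does not show how the patch uses these J2SE 5.0 features. As a rough sketch only, assuming a read-write lock guards shared in-memory postings, the two features might combine as follows. LockedPostings and its methods are hypothetical names; the lock classes are the standard java.util.concurrent.locks ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not taken from instanciated_20060527.tar.
final class LockedPostings {
    // Generics (new in J2SE 5.0) give a typed term -> document ids map.
    private final Map<String, List<Integer>> postings =
        new HashMap<String, List<Integer>>();
    // Many concurrent readers, one writer at a time.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String term, int docId) {
        lock.writeLock().lock();
        try {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        } finally {
            lock.writeLock().unlock(); // always release, even on exceptions
        }
    }

    /** Returns a snapshot copy so callers never touch the shared list. */
    List<Integer> docs(String term) {
        lock.readLock().lock();
        try {
            List<Integer> docs = postings.get(term);
            return docs == null
                ? new ArrayList<Integer>()
                : new ArrayList<Integer>(docs);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Copying inside the read lock trades some allocation for safety. An index tuned for query speed might instead hand out the live list and rely on readers holding the lock for the whole search.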
Contains one update in Field:
  public setFieldData(Object fieldData)
And one in Document:
  public List<Field> getFields() {
    return fields;
  }
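A minimal sketch of what the two additions above could look like in context. SketchField and SketchDocument are hypothetical stand-ins, not the real org.apache.lucene.document classes, and the void return type of setFieldData is an assumption since the message omits it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Field.
class SketchField {
    private final String name;
    private Object fieldData;

    SketchField(String name, Object fieldData) {
        this.name = name;
        this.fieldData = fieldData;
    }

    String name() { return name; }
    Object fieldData() { return fieldData; }

    /** The proposed setter: replace the value without creating a new field. */
    public void setFieldData(Object fieldData) {
        this.fieldData = fieldData;
    }
}

// Hypothetical stand-in for Document.
class SketchDocument {
    private final List<SketchField> fields = new ArrayList<SketchField>();

    void add(SketchField field) {
        fields.add(field);
    }

    /** The proposed accessor: exposes the live, typed list of fields. */
    public List<SketchField> getFields() {
        return fields;
    }
}
```

With these two methods an in-memory index can walk a document's fields through a typed list and update a field's value in place. Note that returning the internal list lets callers modify it; wrapping it with Collections.unmodifiableList would be the defensive alternative.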
> Attachments: Document.java, InstanciatedIndex.java,
> Term.java, class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg,
> src-1.9karl1_20060611.tar.gz, src.tar.gz,
> src_20060509.tar.gz
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-29 04:05:30 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Otis Gospodnetic updated LUCENE-550:
------------------------------------
Attachment: (was: src.tar.gz)
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-29 04:03:31 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Otis Gospodnetic updated LUCENE-550:
------------------------------------
Attachment: (was: InstanciatedIndex.java)
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg, src.tar.gz
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-29 04:03:34 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Otis Gospodnetic updated LUCENE-550:
------------------------------------
Attachment: (was: src_20060509.tar.gz)
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg, src.tar.gz
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-29 04:03:33 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Otis Gospodnetic updated LUCENE-550:
------------------------------------
Attachment: (was: src-1.9karl1_20060611.tar.gz)
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg, src.tar.gz
|
|
| Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index |

|
2006-05-29 04:03:32 |
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Otis Gospodnetic updated LUCENE-550:
------------------------------------
Attachment: (was: Term.java)
> Attachments: class_diagram.png, class_diagram.png,
> instanciated_20060527.tar, lucene.1.9-karl1.jpg, src.tar.gz
|
|
|
|