Thread: Created: (LUCENE-626) Adaptive, user query session analyzing spell checker.
Created: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2006-07-13 09:20:29
Adaptive, user query session analyzing spell checker.
-----------------------------------------------------
Key: LUCENE-626
URL: http://issues.apache.org/jira/browse/LUCENE-626
Project: Lucene - Java
Type: New Feature
Components: Search
Reporter: Karl Wettin
Priority: Minor
Attachments: spellcheck_0.0.1.tar.gz
From javadocs:
This is an adaptive spell checker that analyzes user query
sessions. In plain words, it is a word and phrase dictionary
that learns from how users act while searching.
Be aware, this is a beta version. It is not finished, but it
yields great results if you have enough user activity, RAM
and a fairly narrow document corpus. The RAM problem can be
fixed by implementing your own subclass of SpellChecker, as
the abstract methods of this class are the CRUD methods.
This will most probably change to a strategy class in a
future version.
TODO:
1. Gram up results to detect composite words that should not
be composite words, and vice versa.
2. Train a grammed token (Markov) chain with output from an
expectation maximization algorithm (Weka clusters?) parallel
to a closest path (A* or breadth first?) to allow contextual
suggestions on queries that were never placed.
Usage:
Training
At user query time, create an instance of QueryResults
containing the query string, the number of hits and a time
stamp. Add it to a chronologically ordered list in the user
session (a LinkedList makes sense) and pass that list to
train(sessionQueries) when the session times out.
You also want to call the bootstrap() method every 100000
queries or so.
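The training flow can be sketched roughly as follows. This is only an illustration: the QueryResults class below is a hypothetical stand-in (the real constructor and field names in the attachment may differ), and UserQuerySession is an invented helper that is not part of the attached code.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for the QueryResults class described above;
// the real class in the attachment may look different.
final class QueryResults {
    final String query;
    final int hits;
    final long timestamp;

    QueryResults(String query, int hits, long timestamp) {
        this.query = query;
        this.hits = hits;
        this.timestamp = timestamp;
    }
}

// Hypothetical per-user session holder (an assumption for illustration).
final class UserQuerySession {
    private final LinkedList<QueryResults> queries = new LinkedList<>();

    // Record one executed query; appending keeps the list in chronological order.
    void record(String query, int hits) {
        queries.addLast(new QueryResults(query, hits, System.currentTimeMillis()));
    }

    // Call when the session times out: returns the recorded queries
    // (to be handed to train(...)) and empties the session.
    List<QueryResults> drain() {
        List<QueryResults> copy = new LinkedList<>(queries);
        queries.clear();
        return copy;
    }
}
```

On session timeout the application would pass the result of drain() to train(sessionQueries).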
Spell checking
Call getSuggestions(query) and look at the results. Don't
modify the returned value! This method call will be hidden
behind a facade in a future version.
Note that the spell checker is case sensitive, so you want
to clean up the query the same way when you train as when
you request suggestions.
I recommend something like
query = query.toLowerCase().replaceAll("\\s+", " ").trim()
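A small helper makes it harder to clean up queries differently at training time and at suggestion time. The sketch below assumes the intent of the recommendation is to lowercase the query and collapse runs of whitespace; QueryNormalizer is a hypothetical name, not a class from the attachment.

```java
import java.util.Locale;

// Hypothetical helper; use the same method before train(...) and getSuggestions(...).
final class QueryNormalizer {
    private QueryNormalizer() {}

    // Lowercase, collapse every run of whitespace to one space, trim the ends.
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }
}
```

Locale.ROOT is used so that lowercasing does not depend on the default locale of the JVM.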
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the
administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe lucene.apache.org
For additional commands, e-mail: java-dev-help lucene.apache.org

Created: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2006-07-14 09:55:21
On Thu, 2006-07-13 at 09:20 +0000, Karl Wettin (JIRA) wrote:
> Adaptive, user query session analyzing spell checker.
I have a database with 3 million+ real user queries (session id,
timestamp, query and hits) if anyone is interested in fooling around
with the code. And if there is an interest, I might just manage to
convince the owners to contribute the data to Apache.

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2006-07-26 01:02:15
[ http://issues.apache.org/jira/browse/LUCENE-626?page=all ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: spellcheck_20060725.tar.gz
Bugfixes in bootstrap() and correction sequence extraction.
A couple of optimizations.
Negative training (didNotMean), but no automatic detection
yet. I'm evaluating a couple of solutions. So perhaps next
time(tm).

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2006-08-04 06:01:15
[ http://issues.apache.org/jira/browse/LUCENE-626?page=all ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: spellcheck_20060804.tar.gz
beta 3
Total rewrite with focus on adaptation.
Session search sequence extraction, training and suggesting
are now separate classes passed to the spell checker.
Still requires lots of user interaction to build a
sufficient dictionary.
Has no optimization. bootstrap has been removed and will
probably re-appear in a future default suggestion scheme
instead. Should be fast enough.
Now also comes with some JUnit test cases.
The default implementations are quite simple, but effective:
they strip punctuation and whitespace from the suggestion
data (trained suggestion phrases and test phrases) in order
to find incorrectly composed and decomposed words, e.g.
"the davinci code" --> "the da vinci code",
"a clock work orange" --> "a clockwork orange".
beta 4 will focus on training and suggestion classes that
work on a secondary trie populated with known good data
extracted from the corpus, navigated by edit distance.
Perhaps a forest-type trie to allow any starting point in a
phrase.
OR
beta 4 will focus on discriminating trained queries to build
clusters and suggest (facet) classes parallel to a plain
text suggestion. That would be a major RAM consumer and
require lots of manual tweaking per implementation, but a
cool enough feature.
Time will tell.

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2006-08-04 21:16:15
[ http://issues.apache.org/jira/browse/LUCENE-626?page=all ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: spellcheck_20060804_2.tar.gz
Oops, I attached the old code last time.

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2007-01-29 19:00:58
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: spellcheck_0.0.1.tar.gz)

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2007-01-29 19:01:08
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: spellcheck_20060725.tar.gz)

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2007-01-29 19:01:08
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: spellcheck_20060804.tar.gz)

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2007-01-30 06:25:34
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: spellcheck_20060804_2.tar.gz)

Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.
2007-01-30 06:25:34
[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: spellchecker.diff
It uses the ngram spell checker for queries not yet
corrected by users, but it handles more than one word at a
time, and it inspects the term position vector if available.
This way it can also rearrange the input into the most
probable order.
addDocument(indexWriter, field, "heroes of might and magic III complete");
addDocument(indexWriter, field, "it might be the best game ever made");
assertEquals("heroes of might and magic",
suggester.didYouMean("hereos of magic and might"));
assertEquals("heroes of might and magic",
suggester.didYouMean("hereos of light and magic"));
assertEquals("heroes might magic",
suggester.didYouMean("magic light heros"));
assertEquals("best game made",
suggester.didYouMean("game best made"));
assertEquals("game made",
suggester.didYouMean("made game"));
assertEquals("game made",
suggester.didYouMean("made lame"));
assertEquals(null, suggester.didYouMean("may game"));
Once someone clicks on a suggestion (you have to report this
back to the suggester) it will get a higher priority. If the
person reports interest in one or more of the results of the
suggested query that followed, it will get an even higher
priority. If something is suggested but not clicked on, then
the priority will go down. When the priority reaches a lower
threshold, it will no longer be suggested, and the next
best suggestion will appear. And so on.
Changing the query manually is the same thing as clicking
on a suggestion, given it is similar enough and within a
certain timeframe.
assertEquals("homm",
suggester.didYouMean("heroes of might and magic"));
assertEquals("heroes of might and magic",
suggester.didYouMean("heroes of night and magic"));
assertEquals("homm",
suggester.didYouMean("heroes of might and magic"));
assertEquals("heroes of might and magic",
suggester.didYouMean("homm"));
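The priority feedback described above could look roughly like the sketch below. It is not the code from spellchecker.diff: the class name, the method names, the step sizes and the threshold are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of click-driven suggestion priorities.
final class AdaptiveSuggestions {
    // All constants below are assumed values, not taken from the patch.
    static final double CLICK_BONUS = 1.0;     // the suggestion was clicked
    static final double INTEREST_BONUS = 2.0;  // the user showed interest in results of the suggested query
    static final double IGNORE_PENALTY = 0.5;  // the suggestion was shown but not clicked
    static final double THRESHOLD = -1.0;      // at or below this priority, stop suggesting

    private static final class Suggestion {
        final String text;
        double priority;
        Suggestion(String text) { this.text = text; }
    }

    private final Map<String, List<Suggestion>> byQuery = new HashMap<>();

    void add(String query, String suggestion) {
        byQuery.computeIfAbsent(query, k -> new ArrayList<>()).add(new Suggestion(suggestion));
    }

    // Returns the highest-priority suggestion above the threshold, or null.
    String didYouMean(String query) {
        Suggestion best = null;
        for (Suggestion s : byQuery.getOrDefault(query, new ArrayList<>())) {
            if (s.priority > THRESHOLD && (best == null || s.priority > best.priority)) {
                best = s;
            }
        }
        return best == null ? null : best.text;
    }

    void clicked(String query, String suggestion) { adjust(query, suggestion, CLICK_BONUS); }
    void showedInterest(String query, String suggestion) { adjust(query, suggestion, INTEREST_BONUS); }
    void ignored(String query, String suggestion) { adjust(query, suggestion, -IGNORE_PENALTY); }

    private void adjust(String query, String suggestion, double delta) {
        for (Suggestion s : byQuery.getOrDefault(query, new ArrayList<>())) {
            if (s.text.equals(suggestion)) {
                s.priority += delta;
            }
        }
    }
}
```

With these assumed values, a suggestion that is ignored twice in a row drops to the threshold and the next best suggestion for the same query takes its place.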
The data is stored in a Map<String /*query*/,
List<Suggestion>>, and the default implementation
strips punctuation and whitespace from the query. That
should help with composite and decomposed words, among
other things.
assertEquals("the da vinci code",
suggester.didYouMean("thedavincicode"));
assertEquals("the da vinci code",
suggester.didYouMean("the dav-inci code"));
assertEquals("heroes of might and magic",
suggester.didYouMean("heroes ofnight andmagic"));
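The stripped-key lookup can be illustrated with a much smaller structure than the one in the patch. The sketch below is an assumption-laden simplification: it keeps a single known-good phrase per key instead of a List<Suggestion>, the class and method names are hypothetical, and it covers only the stripping step, so it does not correct a misspelling such as "ofnight" for "of might".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: look phrases up by a key with punctuation and whitespace removed.
final class StrippedKeyDictionary {
    private final Map<String, String> phraseByKey = new HashMap<>();

    // Assumed key function: lowercase, then drop everything except letters and digits.
    static String key(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Register a phrase that is known to be good.
    void train(String goodPhrase) {
        phraseByKey.put(key(goodPhrase), goodPhrase);
    }

    // Returns the known phrase with the same stripped key, or null if
    // there is none or the query already equals it.
    String didYouMean(String query) {
        String phrase = phraseByKey.get(key(query));
        return (phrase == null || phrase.equals(query)) ? null : phrase;
    }
}
```

Because "thedavincicode" and "the dav-inci code" both reduce to the same key as "the da vinci code", a single trained phrase answers both queries.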
It seems the ngram spell check tests are broken: they
require the removed class English. I've re-introduced it in
LUCENE-550.
I will not work further on this patch and issue. It will be
added to LUCENE-550 for caching and such.