Thread: Created: (LUCENE-794) Beginnings of a span based highlighter

Created: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:49:05
Beginnings of a span based highlighter
--------------------------------------
Key: LUCENE-794
URL: https://issues.apache.org/jira/browse/LUCENE-794
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Environment: There are prob a few Java 1.5
requirements (generics) that could easily be removed.
Reporter: Mark Miller
Priority: Minor
This is some test code to start the work of adding a span
based highlighting approach to the existing highlighter in
contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue
online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:51:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Attachment: Formatter.java
Encoder.java
DefaultEncoder.java

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:51:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Attachment: SimpleFormatter.java
QuerySpansExtractor.java
Highlighter.java

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:53:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Attachment: HighlighterTest.java

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:55:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Description:
This is some test code to start the work of adding a span
based highlighting approach to the existing highlighter in
contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.
There is a dependency on MemoryIndex.

was: This is some test code to start the work of adding a
span based highlighting approach to the existing highlighter
in contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.

Commented: (LUCENE-794) Beginnings of a span based highlighter
2007-02-04 12:27:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470074 ]
Mark Miller commented on LUCENE-794:
------------------------------------
There are two highlighting modes: highlight entire spans, or
highlight only the first and last word of each span. For the
first-and-last-word mode it would probably be better to
change QuerySpansExtractor.getSpansFromPhraseQuery so that it
creates a series of near spans instead of a single near span
with multiple clauses.
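
The two modes, and one plausible reason for preferring a series
of near spans, can be sketched without any Lucene dependency.
Everything below is an assumption made for illustration and is
not taken from the attached files: the Span type, the method
names, the <B> markup, and the simplification that a phrase
matches adjacent token positions (no slop). With a single span
over the whole phrase, first-and-last-word mode skips the middle
terms; splitting the phrase into adjacent two-term spans makes
every term the end of some span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SpanHighlightModes {

    /** A matched span over token positions, end exclusive (hypothetical type). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Entire-span mode: every position inside each span is highlighted. */
    static TreeSet<Integer> entireSpans(List<Span> spans) {
        TreeSet<Integer> hits = new TreeSet<Integer>();
        for (Span s : spans) {
            for (int i = s.start; i < s.end; i++) hits.add(i);
        }
        return hits;
    }

    /** First-and-last mode: only the first and last position of each span. */
    static TreeSet<Integer> spanEnds(List<Span> spans) {
        TreeSet<Integer> hits = new TreeSet<Integer>();
        for (Span s : spans) {
            if (s.end > s.start) {
                hits.add(s.start);
                hits.add(s.end - 1);
            }
        }
        return hits;
    }

    /** Splits one span covering n adjacent terms into n-1 two-term spans. */
    static List<Span> asAdjacentPairs(Span phrase) {
        List<Span> pairs = new ArrayList<Span>();
        for (int i = phrase.start; i + 1 < phrase.end; i++) {
            pairs.add(new Span(i, i + 2));
        }
        return pairs;
    }

    /** Wraps the tokens at the given positions in <B> tags. */
    static String render(String[] tokens, TreeSet<Integer> hits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(hits.contains(i) ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps".split(" ");
        Span phrase = new Span(1, 4); // covers "quick brown fox"
        List<Span> single = new ArrayList<Span>();
        single.add(phrase);
        System.out.println(render(tokens, entireSpans(single)));
        System.out.println(render(tokens, spanEnds(single)));
        System.out.println(render(tokens, spanEnds(asAdjacentPairs(phrase))));
    }
}
```

In this sketch the second call leaves "brown" unmarked, while the
third marks all three phrase terms. A one-term phrase yields no
pairs, so a real implementation would need to handle that case
separately.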

Commented: (LUCENE-794) Beginnings of a span based highlighter
2007-02-04 15:29:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470090 ]
Mark Harwood commented on LUCENE-794:
-------------------------------------
Looks like a good start, Mark - thanks for contributing
this!
I've had a quick play and have identified the following
issues:
1) The field name "contents" shouldn't be hardcoded
into the Highlighter - different analyzers can behave
differently for different fields (see
PerFieldAnalyzerWrapper). Either pass a fieldname parameter
or do as the existing highlighter does and take a
TokenStream. The latter approach has the advantage of being
able to avoid re-analysis and make use of any stored
TermVectors (see TokenSources.java).
2) Analyzers which produce overlapping tokens (see the Synonym
analyzer in the existing highlighter JUnit test) are problematic
in the existing code. I remember the "TokenGroup"
class in the existing highlighter was an approach to help
cater for these "overlap" scenarios.
3) Without wishing to resurrect the whole 1.4 vs 1.5 debate
I believe Lucene still targets Java 1.4.
To rectify these points it's not clear to me if it would be
quicker to use your code or adapt the existing highlighter
code to use spans.
Thoughts?
Thanks, again,
Mark
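
The overlap problem in point 2 can be illustrated with a small,
Lucene-free sketch. It is not the TokenGroup class from the
existing highlighter; the Tok type, the method names, and the
<B> markup are assumptions made for illustration. The idea shown
is only that tokens whose character offsets overlap (for example
a word and an injected synonym at the same position) are merged
into one group, so markup is written once per stretch of text
rather than once per token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverlapGrouping {

    /** A token with character offsets into the text, end exclusive (hypothetical type). */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Merges tokens whose offsets overlap into {start, end} groups.
     * Assumes the tokens arrive sorted by start offset.
     */
    static List<int[]> groupOverlapping(List<Tok> toks) {
        List<int[]> groups = new ArrayList<int[]>();
        for (Tok t : toks) {
            if (!groups.isEmpty()) {
                int[] last = groups.get(groups.size() - 1);
                if (t.start < last[1]) { // overlaps the current group
                    last[1] = Math.max(last[1], t.end);
                    continue;
                }
            }
            groups.add(new int[] { t.start, t.end });
        }
        return groups;
    }

    /** Wraps each group's text in <B> tags, copying the gaps unchanged. */
    static String highlight(String text, List<int[]> groups) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] g : groups) {
            sb.append(text, pos, g[0]);
            sb.append("<B>").append(text, g[0], g[1]).append("</B>");
            pos = g[1];
        }
        sb.append(text.substring(pos));
        return sb.toString();
    }

    public static void main(String[] args) {
        // "quick" stands in for a synonym injected at the offsets of "fast".
        List<Tok> toks = Arrays.asList(
            new Tok("fast", 0, 4),
            new Tok("quick", 0, 4),
            new Tok("car", 5, 8));
        System.out.println(highlight("fast car", groupOverlapping(toks)));
    }
}
```

Marking each token independently would emit markup for offsets
0-4 twice in this example; grouping first avoids that.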

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-04 17:18:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Environment: (was: There are prob a few Java 1.5
requirements (generics) that could easily be removed.)

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-04 17:20:06
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Attachment: HighlighterTest.java
Highlighter.java

Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-04 17:22:05
[ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-794:
-------------------------------
Attachment: MemoryIndex.java