
Thread: Created: (LUCENE-794) Beginnings of a span based highlighter




Created: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:49:05
Beginnings of a span based highlighter
--------------------------------------

                 Key: LUCENE-794
                 URL: https://issues.apache.org/jira/browse/LUCENE-794
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Other
         Environment: There are probably a few Java 1.5
requirements (generics) that could easily be removed.
            Reporter: Mark Miller
            Priority: Minor


This is some test code to start the work of adding a span
based highlighting approach to the existing highlighter in
contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.
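As a rough illustration of the idea (not the attached Highlighter.java), a span based highlighter works with token positions rather than individual terms: the query yields spans, each a [start, end) range of token positions, and only tokens inside a span are marked, so a phrase query does not light up stray occurrences of its terms. The sketch below assumes the text is already tokenized and the spans already computed; the class and method names are hypothetical, the end-exclusive convention follows Lucene's Spans, and the <B>/</B> tags mirror the defaults of the contrib SimpleFormatter. It rebuilds text from tokens for brevity, whereas a real highlighter would use character offsets.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: highlight the tokens covered by position spans. */
public class SpanHighlightSketch {

    /** A span of token positions: start inclusive, end exclusive. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Wraps every token whose position falls inside any span with <B>...</B>. */
    static String highlight(String[] tokens, List<Span> spans) {
        boolean[] hit = new boolean[tokens.length];
        for (Span s : spans) {
            // Mark every position the span covers, clipped to the token count.
            for (int pos = s.start; pos < s.end && pos < tokens.length; pos++) {
                hit[pos] = true;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) out.append(' ');
            out.append(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps".split(" ");
        List<Span> spans = new ArrayList<Span>();
        spans.add(new Span(1, 3)); // e.g. a phrase match on "quick brown"
        // prints: the <B>quick</B> <B>brown</B> fox jumps
        System.out.println(highlight(tokens, spans));
    }
}
```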

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:51:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: Formatter.java
                Encoder.java
                DefaultEncoder.java



Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:51:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: SimpleFormatter.java
                QuerySpansExtractor.java
                Highlighter.java



Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:53:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java



Updated: (LUCENE-794) Beginnings of a span based highlighter
2007-02-03 06:55:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Description: 
This is some test code to start the work of adding a span
based highlighting approach to the existing highlighter in
contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.

There is a dependency on MemoryIndex.

  was:This is some test code to start the work of adding a
span based highlighting approach to the existing highlighter
in contrib. See http://issues.apache.org/jira/browse/LUCENE-403
for some background.




Commented: (LUCENE-794) Beginnings of a span based highlighter
United States
2007-02-04 12:27:05
    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470074 ]

Mark Miller commented on LUCENE-794:
------------------------------------

There are two highlighting modes: highlight entire spans, or
highlight only the first and last word of each span. For the
first-and-last-word mode, it would probably be better to change
QuerySpansExtractor.getSpansFromPhraseQuery so that it creates
a series of near spans instead of a single near span with
multiple clauses.
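The effect of the two modes, and why a series of near spans suits the first-and-last-word mode, can be sketched with plain position arithmetic. This is a hypothetical model, not QuerySpansExtractor: spans are int[]{start, end} pairs with end exclusive, and the matched term positions of a sloppy phrase are assumed to be known already. With one multi-clause span, the first-and-last-word mode marks only the outer terms; with one two-term span per adjacent pair, it marks every matched term while still skipping the unmatched gap word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of the "entire span" vs "first and last word" modes. */
public class SpanModesSketch {

    /** Token positions marked for the given spans (each span is {start, end}, end exclusive). */
    static TreeSet<Integer> marked(List<int[]> spans, boolean firstAndLastOnly) {
        TreeSet<Integer> positions = new TreeSet<Integer>();
        for (int[] span : spans) {
            int start = span[0], end = span[1];
            if (firstAndLastOnly) {
                positions.add(start);   // first word of the span
                positions.add(end - 1); // last word of the span
            } else {
                for (int p = start; p < end; p++) positions.add(p);
            }
        }
        return positions;
    }

    /** A single near span covering all matched phrase terms. */
    static List<int[]> singleSpan(int[] termPositions) {
        List<int[]> spans = new ArrayList<int[]>();
        spans.add(new int[] {termPositions[0], termPositions[termPositions.length - 1] + 1});
        return spans;
    }

    /** A series of two-term near spans, one per adjacent pair of phrase terms. */
    static List<int[]> pairwiseSpans(int[] termPositions) {
        List<int[]> spans = new ArrayList<int[]>();
        for (int i = 0; i + 1 < termPositions.length; i++) {
            spans.add(new int[] {termPositions[i], termPositions[i + 1] + 1});
        }
        return spans;
    }

    public static void main(String[] args) {
        // A sloppy phrase matched at positions 2, 4 and 5; position 3 is an unmatched gap word.
        int[] hits = {2, 4, 5};
        System.out.println(marked(singleSpan(hits), false));   // [2, 3, 4, 5]
        System.out.println(marked(singleSpan(hits), true));    // [2, 5]
        System.out.println(marked(pairwiseSpans(hits), true)); // [2, 4, 5]
    }
}
```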



Commented: (LUCENE-794) Beginnings of a span based highlighter
United States
2007-02-04 15:29:05
    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470090 ]

Mark Harwood commented on LUCENE-794:
-------------------------------------

Looks like a good start, Mark - thanks for contributing
this!

I've had a quick play and have identified the following
issues:

1) The field name "contents" shouldn't be hard-coded
into the Highlighter - different analyzers can behave
differently for different fields (see
PerFieldAnalyzerWrapper). Either pass a field name parameter
or do as the existing highlighter does and take a
TokenStream. The latter approach has the advantage of being
able to avoid re-analysis and make use of any stored
TermVectors (see TokenSources.java).
2) Analyzers which produce overlapping tokens (see the synonym
analyzer in the existing highlighter's JUnit test) are problematic
in the existing code. I remember the "TokenGroup"
class in the existing highlighter was an approach to help
cater for these "overlap" scenarios.
3) Without wishing to resurrect the whole 1.4 vs 1.5 debate,
I believe Lucene still targets Java 1.4.
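Point 2 concerns analyzers that emit several tokens for the same stretch of text, such as injected synonyms. One way to cope, loosely in the spirit of the existing TokenGroup class (this is not its code), is to merge tokens whose character offsets overlap into a single group, emit the original text for that group once, and highlight the whole group if any token in it matches. The Tok class, the method names and the simple term-set matching below are hypothetical simplifications; a span based version would test span membership instead of term equality.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: group overlapping tokens before highlighting. */
public class OverlapGroupSketch {

    /** Minimal stand-in for an analyzed token: text plus character offsets into the source. */
    static final class Tok {
        final String text;
        final int start;
        final int end;
        Tok(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
    }

    /**
     * Highlights the source text, treating tokens with overlapping offsets as one group
     * so the underlying text is emitted, and marked, only once.
     * Tokens are assumed to arrive in increasing start-offset order.
     */
    static String highlight(String source, List<Tok> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int copied = 0; // how much of the source has been written so far
        int i = 0;
        while (i < tokens.size()) {
            int groupStart = tokens.get(i).start;
            int groupEnd = tokens.get(i).end;
            boolean hit = queryTerms.contains(tokens.get(i).text);
            int j = i + 1;
            // Extend the group while the next token starts before the group ends.
            while (j < tokens.size() && tokens.get(j).start < groupEnd) {
                groupEnd = Math.max(groupEnd, tokens.get(j).end);
                hit |= queryTerms.contains(tokens.get(j).text);
                j++;
            }
            out.append(source, copied, groupStart); // text between groups, e.g. whitespace
            String original = source.substring(groupStart, groupEnd);
            out.append(hit ? "<B>" + original + "</B>" : original);
            copied = groupEnd;
            i = j;
        }
        out.append(source.substring(copied));
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "a fast car";
        // "quick" is a synonym injected with the same offsets as "fast".
        List<Tok> tokens = Arrays.asList(
            new Tok("a", 0, 1), new Tok("fast", 2, 6), new Tok("quick", 2, 6), new Tok("car", 7, 10));
        Set<String> query = new HashSet<String>(Arrays.asList("quick"));
        System.out.println(highlight(source, tokens, query)); // a <B>fast</B> car
    }
}
```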

To rectify these points, it's not clear to me whether it would be
quicker to use your code or to adapt the existing highlighter
code to use spans.
Thoughts?

Thanks, again,
Mark





 



Updated: (LUCENE-794) Beginnings of a span based highlighter
United States
2007-02-04 17:18:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Environment:     (was: There are prob a few Java 1.5 requirements (generics) that could easily be removed.)


Updated: (LUCENE-794) Beginnings of a span based highlighter
United States
2007-02-04 17:20:06
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: HighlighterTest.java
                Highlighter.java



Updated: (LUCENE-794) Beginnings of a span based highlighter
United States
2007-02-04 17:22:05
     [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-794:
-------------------------------

    Attachment: MemoryIndex.java


