I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow. Does anyone know if
this is true, and if so, how I would go about it?

Alternatively (preferably actually) does the surround code from the SVN
development area have a way of returning offsets for the matching hits?

I believe the current highlighter code matches all query terms in a hit
document, not just those satisfying the query criteria. I need a more
precise way to access the hit term offsets. I am working on hit
highlighting, hit excerpts and summaries, and compound queries (is this
called search vectors?). I am still working through the surround code in
dev. to see if that gives me the compound queries I need.

I am willing to spend a few days implementing offsets on the returned
hits (or something similar) if this is not currently available. It is
something I need, even at the cost of search efficiency.
Thanks

Sean



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Markharw00d at Sep 6, 2005 at 6:52 am
    > I believe I have heard that Span queries provide some way to access
    > document offset information for their hits somehow.

    See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

    Faithfully selecting extracts based *exactly* on query criteria will be
    hard given complex queries, e.g. with nested Boolean logic.

    The current highlighter matches based on ANY query terms found in the
    provided doc text.
    The proposal above matches based on any spans/phrases/terms.

    Both options still fail to take into account any Boolean logic, so they
    do not show the real basis for the match. E.g. the query
    (author:"Doug Cutting" AND title:"Lucene in Action") OR (author:Erik
    AND author:Otis)
    would still highlight references to "Doug Cutting" and "Lucene in
    Action" for the LIA book, despite the fact that the match was actually
    for Erik and Otis (the true authors).
    For most people this is a problem they can live with.

    Cheers
    Mark
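
Mark's point about Boolean logic can be illustrated with a small, stdlib-only Java sketch. It is not Lucene code: the class MatchBasisSketch, its Clause type, and the field values assumed for the LIA book are all hypothetical, the query is restricted to an OR of AND groups (nested logic would need a recursive tree), and clause matching is a plain case-insensitive substring test rather than real analysis. It contrasts highlighting every phrase in the query with highlighting only the phrases from AND groups that the document actually satisfied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; none of these types are Lucene classes. */
final class MatchBasisSketch {

    /** One field:phrase requirement, e.g. author:"Doug Cutting". */
    static final class Clause {
        final String field;
        final String phrase;

        Clause(String field, String phrase) {
            this.field = field;
            this.phrase = phrase;
        }

        /** Simplification: case-insensitive substring test instead of real analysis. */
        boolean matches(Map<String, String> doc) {
            String value = doc.get(field);
            return value != null
                    && value.toLowerCase(Locale.ROOT).contains(phrase.toLowerCase(Locale.ROOT));
        }
    }

    /** "Any query term" behaviour: every phrase in the query is a highlight candidate. */
    static List<String> allPhrases(List<List<Clause>> orOfAnds) {
        List<String> phrases = new ArrayList<>();
        for (List<Clause> group : orOfAnds) {
            for (Clause clause : group) {
                phrases.add(clause.phrase);
            }
        }
        return phrases;
    }

    /** Keeps only phrases from AND groups in which every clause matched the document. */
    static List<String> matchedPhrases(List<List<Clause>> orOfAnds, Map<String, String> doc) {
        List<String> phrases = new ArrayList<>();
        for (List<Clause> group : orOfAnds) {
            boolean allMatch = true;
            for (Clause clause : group) {
                if (!clause.matches(doc)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                for (Clause clause : group) {
                    phrases.add(clause.phrase);
                }
            }
        }
        return phrases;
    }

    /** The example query from the thread, as an OR of two AND groups. */
    static List<List<Clause>> exampleQuery() {
        return Arrays.asList(
                Arrays.asList(new Clause("author", "Doug Cutting"),
                              new Clause("title", "Lucene in Action")),
                Arrays.asList(new Clause("author", "Erik"),
                              new Clause("author", "Otis")));
    }

    /** Assumed field values for the LIA book (not taken from any index). */
    static Map<String, String> exampleDoc() {
        Map<String, String> doc = new HashMap<>();
        doc.put("author", "Erik Hatcher, Otis Gospodnetic");
        doc.put("title", "Lucene in Action");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(allPhrases(exampleQuery()));
        System.out.println(matchedPhrases(exampleQuery(), exampleDoc()));
    }
}
```

Under these assumptions the first list contains all four phrases, while the second contains only Erik and Otis, which is the "real basis for the match" that neither highlighting approach in the thread reports.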



  • Paul Elschot at Sep 6, 2005 at 7:15 am

    On Tuesday 06 September 2005 08:52, markharw00d wrote:
    > For most people this is a problem they can live with.

    The person who solves that might also write a SpanAndQuery :)

    Regards,
    Paul Elschot


  • Sean O'Connor at Sep 7, 2005 at 5:33 am
    Thanks for the input. I am looking at the suggested links now. If I make
    any progress I will return to see if any of my work would be appropriate
    to contribute back.

    Sean


  • Paul Elschot at Sep 6, 2005 at 7:11 am

    On Tuesday 06 September 2005 08:21, Sean O'Connor wrote:
    > I believe I have heard that Span queries provide some way to access
    > document offset information for their hits somehow. Does anyone know if
    > this is true, and if so, how I would go about it?
    >
    > Alternatively (preferably actually) does the surround code from the SVN
    > development area have a way of returning offsets for the matching hits?

    Using getSpans(reader) on the span query will provide the Spans that
    match the query. A Spans iterates through begin/end offset pairs within
    the matching docs. This is provided by Lucene.

    > I believe the current highlighter code matches all query terms in a hit
    > document, not just those satisfying the query criteria. I need a more
    > precise way to access the hit term offsets. I am working on hit
    > highlighting, hit excerpts and summaries, and compound queries (is this
    > called search vectors?). I am still working through the surround code in
    > dev. to see if that gives me the compound queries I need.
    >
    > I am willing to spend a few days implementing offsets on the returned
    > hits (or something similar) if this is not currently available. It is
    > something I need, even at the cost of search efficiency.

    See also the thread on better highlighting that started on 25 August
    and this:
    http://issues.apache.org/bugzilla/show_bug.cgi?id=35518

    Regards,
    Paul Elschot
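
Paul's description of iterating begin/end pairs within matching docs can be pictured with the following stdlib-only Java sketch. It deliberately does not call Lucene: SpanOffsetsSketch, Token, nearSpans and toCharOffsets are hypothetical names, and tokenization is a plain whitespace split. As far as I know, the start and end values reported by Lucene's Spans are token positions rather than character offsets, so a highlighter still needs a mapping step like the last method here (in Lucene that information would have to come from stored term vector offsets or from re-analysing the text).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; this is not the Lucene Spans API. */
final class SpanOffsetsSketch {

    /** A token with its position in the token stream and its character offsets. */
    static final class Token {
        final String text;
        final int position;    // 0-based token position
        final int startOffset; // inclusive character offset
        final int endOffset;   // exclusive character offset

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace and lower-cases, recording character offsets. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(term, tokens.size(), start, i));
            }
        }
        return tokens;
    }

    /**
     * Finds ordered spans where `first` is followed by `second` with at most
     * `slop` other tokens in between. Each result is {startPos, endPos},
     * with endPos exclusive.
     */
    static List<int[]> nearSpans(List<Token> tokens, String first, String second, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (Token a : tokens) {
            if (!a.text.equals(first)) {
                continue;
            }
            for (Token b : tokens) {
                int gap = b.position - a.position - 1;
                if (b.text.equals(second) && gap >= 0 && gap <= slop) {
                    spans.add(new int[] {a.position, b.position + 1});
                }
            }
        }
        return spans;
    }

    /** Maps a position span to character offsets {startOffset, endOffset}. */
    static int[] toCharOffsets(List<Token> tokens, int[] span) {
        return new int[] {tokens.get(span[0]).startOffset, tokens.get(span[1] - 1).endOffset};
    }

    public static void main(String[] args) {
        String text = "Lucene in Action was written by Erik and Otis";
        List<Token> tokens = tokenize(text);
        for (int[] span : nearSpans(tokens, "erik", "otis", 1)) {
            int[] off = toCharOffsets(tokens, span);
            System.out.println(span[0] + "-" + span[1] + " -> " + text.substring(off[0], off[1]));
        }
    }
}
```

For the sample sentence, the single span covers token positions 6 to 9 (end exclusive) and maps to the substring "Erik and Otis", which is the region a span-aware highlighter would mark.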



Discussion Overview
group: java-user
categories: lucene
posted: Sep 6, '05 at 6:21a
active: Sep 7, '05 at 5:33a
posts: 5
users: 3
website: lucene.apache.org
