Hello,

I have indexed documents with two fields, "ARTICLE" for an article of text
and "PUB_DATE" for the article's publication date.

Given a specific single word, I want to search my index for all documents
that contain this word within the last two weeks, and have them sorted by
date:

TermQuery tq = new TermQuery(new Term("ARTICLE", mySearchWord));
Calendar cal = Calendar.getInstance();
// Lower bound of the range: two weeks ago
cal.add(Calendar.DATE, -14);
ConstantScoreRangeQuery csrq = new ConstantScoreRangeQuery(
    "PUB_DATE",
    DateTools.dateToString(cal.getTime(), DateTools.Resolution.HOUR),
    null, true, true); // null upper bound leaves the range open-ended
BooleanQuery bq = new BooleanQuery();
bq.add(tq, BooleanClause.Occur.MUST);
bq.add(csrq, BooleanClause.Occur.MUST);
TopFieldDocs docs = searcher.search(bq, null, 10, new Sort("PUB_DATE"));
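
The range clause above works because the date is stored as a sortable string. As a rough illustration only, the plain-Java sketch below mimics that encoding without using Lucene: it assumes the HOUR resolution corresponds to a yyyyMMddHH key in GMT, and the class and method names (HourKeys, encodeHour, cutoffKey, inRange) are made up for this example.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper, not a Lucene class. */
final class HourKeys {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    /** Formats a date as yyyyMMddHH in GMT (assumed to match HOUR resolution). */
    static String encodeHour(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH", Locale.US);
        fmt.setTimeZone(GMT);
        return fmt.format(date);
    }

    /** Returns the key for the instant `days` days before `now`. */
    static String cutoffKey(Date now, int days) {
        Calendar cal = Calendar.getInstance(GMT, Locale.US);
        cal.setTime(now);
        cal.add(Calendar.DATE, -days);
        return encodeHour(cal.getTime());
    }

    /** Inclusive lower-bound test: string order equals chronological order. */
    static boolean inRange(String pubDateKey, String cutoffKey) {
        return pubDateKey.compareTo(cutoffKey) >= 0;
    }

    public static void main(String[] args) {
        String cutoff = cutoffKey(new Date(), 14);
        System.out.println("cutoff key: " + cutoff);
        System.out.println("now in range: " + inRange(encodeHour(new Date()), cutoff));
    }
}
```

The comparison is only meaningful if PUB_DATE was indexed with the same encoding and resolution that the query uses for its lower bound.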

My goal now is to go through the retrieved documents and obtain the Term
instances (each term position) within each document, then retrieve the payload
data associated with each Term instance.

The trouble I am having is in getting access to the TermPositions following
such a query.
If I only needed to query on a single term (without my date restriction), I
could easily do (and have done) this:

SpanTermQuery query = new SpanTermQuery(new Term("ARTICLE",mySearchWord));
TermSpans spans = (TermSpans) query.getSpans(indexReader);
TermPositions tp = spans.getPositions();

and then iterate over each position calling

tp.getPayload(dataBuffer,0);

for example.

But alas, I cannot seem to get access to any TermPositions from my above
BooleanQuery.
I have looked into the contributed SpanExtractorClass, but
ConstantScoreRangeQuery seems unsupported,
and I am at a loss as to how best to use Spans here.

Any help appreciated,

C>T>

--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999


  • Chris Hostetter at Sep 24, 2009 at 7:22 pm
    : But alas, I cannot seem to get access to any TermPositions from my above
    : BooleanQuery.

    I would suggest refactoring your "date" restriction into a Filter (there's
    a fairly easy-to-use Filter that wraps a Query) and then executing a
    SpanTermQuery just as you describe.


    -Hoss


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Christopher Tignor at Sep 24, 2009 at 9:52 pm
    Thanks for the tip.

    I don't see a way to integrate the QueryWrapperFilter (or any Filter) into
    SpanTermQuery.getSpans(indexReader), however.
    I can use a SpanQuery with an IndexSearcher as usual, but that leaves me
    back where I started. Any thoughts?

    Also, I will need to sort these results by date so that only the most
    recent, say 5, are returned...
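
On the sorting point: a plain field sort such as new Sort("PUB_DATE") is ascending, so taking the top 5 would return the oldest matches; a reversed sort, for example new Sort(new SortField("PUB_DATE", SortField.STRING, true)), should put the newest first. The plain-Java sketch below only models that ordering and does not use Lucene; NewestHits and Hit are hypothetical names, and the dates are assumed to be yyyyMMddHH keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of "newest N hits"; not a Lucene class. */
final class NewestHits {
    /** A matching document: its ID and its PUB_DATE key (assumed yyyyMMddHH). */
    static final class Hit {
        final int doc;
        final String pubDate;
        Hit(int doc, String pubDate) { this.doc = doc; this.pubDate = pubDate; }
    }

    /** Returns the doc IDs of the n most recent hits, newest first. */
    static List<Integer> newest(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        // Descending string order equals newest-first for fixed-width date keys.
        Collections.sort(sorted, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) { return b.pubDate.compareTo(a.pubDate); }
        });
        List<Integer> docs = new ArrayList<Integer>();
        for (Hit h : sorted.subList(0, Math.min(n, sorted.size()))) {
            docs.add(h.doc);
        }
        return docs;
    }
}
```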

    thanks again,

    C>T>

  • Mark Miller at Sep 24, 2009 at 11:45 pm
    I should beef up that spans extractor - it can actually work on the
    constant-score multi-term queries (the base ones that now have a constant
    score mode in 2.9), just like the Highlighter does. That class probably
    belongs in contrib.

    You can use the filter and the span query to get the results, then do
    your getSpans call on those results to get the positions (the same way
    you might have used the span extractor, or the way the highlighter works:
    a doc at a time on the already matching docs).
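
A minimal plain-Java sketch of that "doc at a time" idea follows. It does not use Lucene: SpanCursor is a hypothetical stand-in for a Spans enumerator (its skipTo is assumed to always advance at least once), and ArraySpanCursor and HitPositions are made-up names. The point it illustrates is that hits sorted by date must be re-sorted by doc ID before walking the spans, because the enumerator only moves forward in doc order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a span enumerator; not a Lucene type. */
interface SpanCursor {
    boolean next();              // advance to the next match; false when exhausted
    boolean skipTo(int target);  // advance to the first later match with doc >= target
    int doc();
    int start();
}

/** In-memory cursor over {doc, position} pairs sorted by doc, then position. */
final class ArraySpanCursor implements SpanCursor {
    private final int[][] matches;
    private int i = -1;
    ArraySpanCursor(int[][] matches) { this.matches = matches; }
    public boolean next() { return ++i < matches.length; }
    public boolean skipTo(int target) {
        do { if (!next()) return false; } while (doc() < target);
        return true;
    }
    public int doc() { return matches[i][0]; }
    public int start() { return matches[i][1]; }
}

final class HitPositions {
    /** Collects positions for the given hits only, returned in the hits' order. */
    static Map<Integer, List<Integer>> positionsForHits(int[] hitDocs, SpanCursor cursor) {
        int[] byDocId = hitDocs.clone();
        Arrays.sort(byDocId); // the cursor can only move forward in doc order
        Map<Integer, List<Integer>> found = new HashMap<Integer, List<Integer>>();
        boolean more = cursor.next();
        for (int target : byDocId) {
            if (more && cursor.doc() < target) {
                more = cursor.skipTo(target);
            }
            List<Integer> positions = new ArrayList<Integer>();
            while (more && cursor.doc() == target) {
                positions.add(cursor.start()); // a payload could be read here instead
                more = cursor.next();
            }
            found.put(target, positions);
        }
        // Re-emit in the original (for example date-sorted) hit order.
        Map<Integer, List<Integer>> ordered = new LinkedHashMap<Integer, List<Integer>>();
        for (int doc : hitDocs) {
            ordered.put(doc, found.get(doc));
        }
        return ordered;
    }
}
```

With real Spans the loop body would read the payload at each position rather than just recording the start offset.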

    --
    - Mark

    http://www.lucidimagination.com



  • Grant Ingersoll at Sep 25, 2009 at 12:19 pm
    Yeah, it probably makes sense to add getSpans(IndexReader, Filter).
    --------------------------
    Grant Ingersoll
    http://www.lucidimagination.com/




Discussion Overview
group: java-user
categories: lucene
posted: Sep 24, '09 at 4:49p
active: Sep 25, '09 at 12:19p
posts: 5
users: 4
website: lucene.apache.org
