I have a set of documents that all have a "timestamp" field, which is stored as a long integer. The field is indexed in my Lucene index as a number using NumericField with a precision step of 8:

// NumericField is not a subclass of Field, so declare the variable as NumericField.
NumericField field = new NumericField("timestamp", 8);
field.setLongValue(timestampValue);

I do this so I can do numeric range queries to retrieve all documents that fall within a specific time range.
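
For readers unfamiliar with the precision step used above: a minimal sketch, assuming the behaviour described in the NumericField documentation, where each value is indexed as several terms of decreasing precision, one per shift of precisionStep bits. The class and method names below are hypothetical, and the code does not call Lucene; it only lists the shifts that would apply to a 64-bit long.

```java
import java.util.ArrayList;
import java.util.List;

public class PrecisionStepSketch {
    // Hypothetical helper, not a Lucene API: lists the bit shifts at which a
    // value of valSize bits would be indexed for a given precisionStep.
    public static List<Integer> shifts(int valSize, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> result = new ArrayList<>();
        for (int shift = 0; shift < valSize; shift += precisionStep) {
            result.add(shift);
        }
        return result;
    }

    public static void main(String[] args) {
        // A long has 64 bits, so a precision step of 8 gives 8 shifts.
        System.out.println(shifts(64, 8)); // prints [0, 8, 16, 24, 32, 40, 48, 56]
    }
}
```

Under that assumption, a smaller precision step means more terms per value in the index, and a larger one means fewer.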

The search I construct has two parts: a query and a filter. I get the document hits as follows:

IndexReader reader = ...... some index reader.....
IndexSearcher searcher = new IndexSearcher(reader);

Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, startTime, endTime, false, true);
Query query = new MatchAllDocsQuery();
searcher.search(query, filter, myCollector); // myCollector is a subclass of Collector that saves all hits

Occasionally, I have a single document with a very specific timestamp that I want to retrieve. Suppose that timestamp is timeX. I create the filter as follows:

Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX-1, timeX, false, true);

But with this filter, the document that should be found is never found. I have even tried expanding the time range as follows, but with no success:

Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX-1, timeX+500, false, true);

Strangely, a filter that should NOT have found the document actually did find the document:

Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX, timeX+1000, false, true);

This filter should NOT have found the document since the minInclusive argument is false.
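
The expectations stated above can be written down as a plain predicate. This is a minimal sketch of the match rule one would expect from a range with optionally exclusive endpoints; the class name, the helper inRange, and the sample value of timeX are all hypothetical, and the code does not use Lucene.

```java
public class RangeSemanticsSketch {
    // Hypothetical helper, not a Lucene API: true when value lies in the range
    // from min to max, honouring the two inclusive flags.
    public static boolean inRange(long value, long min, long max,
                                  boolean minInclusive, boolean maxInclusive) {
        boolean aboveMin = minInclusive ? value >= min : value > min;
        boolean belowMax = maxInclusive ? value <= max : value < max;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        long timeX = 1285200000000L; // arbitrary example timestamp in milliseconds

        // The three filters from the post, applied to a document stamped timeX.
        System.out.println(inRange(timeX, timeX - 1, timeX, false, true));       // prints true
        System.out.println(inRange(timeX, timeX - 1, timeX + 500, false, true)); // prints true
        System.out.println(inRange(timeX, timeX, timeX + 1000, false, true));    // prints false
    }
}
```

The observed behaviour is the exact opposite of these three results, which is what makes the report surprising.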

I have also noticed that sometimes when I have several documents with exactly the same timestamp, a query will return some, but not all, of the documents.

I have also tried to use a NumericRangeQuery as follows:

Query query = NumericRangeQuery.newLongRange("timestamp", 8, timeX-1, timeX, false, true);
searcher.search( query, null, myCollector);

This also fails to return my document(s).

Am I doing something wrong here? Have I misunderstood how this is supposed to work? Has anyone else had problems like this?


Thanks for any help or guidance or tips you can give me,

-Daniel Sanders

  • Uwe Schindler at Sep 23, 2010 at 8:04 pm
    Hi,

    Can you provide a self-contained test case that shows your problem? In
    most cases such problems are caused by not committing changes to the
    IndexWriter before opening the IndexReader.

    Additionally, if you only want to look for exactly one timestamp (like a
    TermQuery), use a NumericRangeQuery with upper+lower inclusive = true and
    use the specific value to search for as both upper and lower.

    You may also be hitting a bug that is already fixed in SVN (it happens when the
    lower bound is near Long.MAX_VALUE or the upper bound near Long.MIN_VALUE):
    https://issues.apache.org/jira/browse/LUCENE-2541

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de

    -----Original Message-----
    From: Daniel Sanders
    Sent: Thursday, September 23, 2010 12:23 PM
    To: java-user@lucene.apache.org
    Subject: Problem with Numeric range query.


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Daniel Sanders at Sep 23, 2010 at 8:49 pm
    Thank you for your timely response.

    It's going to take me longer to create an isolated test case you can test this with. I will see what I can do.

    In the meantime, I have some follow up information in response to your other suggestions.

    1) I don't think my problem is that the IndexWriter has not committed the document. Here's why:


    In my test case, I first retrieve a document using a different Lucene query on a different field. From that document I extract the value of the timestamp field and then perform the NumericRangeQuery on that value as described below. I was doing this as a way to create a unit test that would verify that the NumericRangeQuery was working properly. I think the fact that the first query found the document is evidence that the IndexWriter had committed the document. Hence, I would expect that a NumericRangeQuery following that query should be able to find the same document.

    2) I also don't think my problem is values near Long.MIN_VALUE or Long.MAX_VALUE. My values are all timestamps, which are positive integers that are not anywhere near those two extremes. The values originally come from the java.util.Date.getTime() method.

    3) I will try the upper+lower inclusive = true and using same value for min and max, although I don't see how that will change anything. I have actually debugged through the code for NumericRangeQuery, and if minInclusive == false, then min is incremented, and if maxInclusive == false, then max is decremented. So my query:

    NumericRangeQuery.newLongRange("timestamp",8,timeX-1,timeX,false,true)

    is essentially equivalent to the query you suggest trying:

    NumericRangeQuery.newLongRange("timestamp",8,timeX,timeX,true,true)

    right?
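
The arithmetic behind points 2 and 3 can be sketched in plain Java. This is a minimal illustration, not Lucene's code: the class name and the helper toInclusive are hypothetical, the sample timestamp is arbitrary, and Math.addExact and Math.subtractExact are used so that an endpoint at the long extremes raises an ArithmeticException instead of silently wrapping around.

```java
import java.util.Arrays;

public class BoundNormalizationSketch {
    // Hypothetical helper, not Lucene's implementation: converts optionally
    // exclusive endpoints into an equivalent pair of inclusive endpoints.
    public static long[] toInclusive(long min, long max,
                                     boolean minInclusive, boolean maxInclusive) {
        long lo = minInclusive ? min : Math.addExact(min, 1L);
        long hi = maxInclusive ? max : Math.subtractExact(max, 1L);
        return new long[] { lo, hi };
    }

    public static void main(String[] args) {
        long timeX = 1285200000000L; // arbitrary example timestamp in milliseconds

        long[] halfOpen = toInclusive(timeX - 1, timeX, false, true);
        long[] closed = toInclusive(timeX, timeX, true, true);

        // Both forms describe the single value timeX.
        System.out.println(Arrays.equals(halfOpen, closed)); // prints true

        // A millisecond timestamp of this size is far below Long.MAX_VALUE.
        System.out.println(timeX < Long.MAX_VALUE / 1_000_000L); // prints true
    }
}
```

For long values, the two range forms are interchangeable as long as the increment or decrement does not overflow.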

    -Daniel Sanders
  • Uwe Schindler at Sep 23, 2010 at 9:03 pm
    Hi,

    > Thank you for your timely response.

    :-)

    > It's going to take me longer to create an isolated test case you can test this
    > with. I will see what I can do.

    That would be fine. Often with a simple test those errors disappear, because
    they are a problem in the logic somewhere else :) But you should try this in
    any case.

    > In the meantime, I have some follow up information in response to your other
    > suggestions.
    >
    > 1) I don't think my problem is that the IndexWriter has not committed the
    > document. Here's why:
    >
    > In my test case, I first retrieve a document using a different Lucene query
    > on a different field. From that document I extract the value of the timestamp
    > field and then perform the NumericRangeQuery on that value as described below.
    > I was doing this as a way to create a unit test that would verify that the
    > NumericRangeQuery was working properly. I think the fact that the first query
    > found the document is evidence that the IndexWriter had committed the
    > document. Hence, I would expect that if I follow that query with a
    > NumericRangeQuery it should be able to find the same document.

    Yes. But are you sure that the timestamp is also indexed? If it is stored
    only, the query would not find it. Or maybe the other way round.

    > 2) I also don't think my problem is values near Long.MIN_VALUE or
    > Long.MAX_VALUE. My values are all timestamps, which are positive integers
    > that are not anywhere near those two extremes. The values originally come
    > from the java.util.Date.getTime() method.
    >
    > 3) I will try the upper+lower inclusive = true and using same value for min and
    > max, although I don't see how that will change anything. I have actually
    > debugged through the code for NumericRangeQuery, and if minInclusive ==
    > false, then min is incremented, and if maxInclusive == false, then max is
    > decremented. So my query:
    >
    > NumericRangeQuery.newLongRange("timestamp",8,timeX-1,timeX,false,true)
    >
    > is essentially equivalent to the query you suggest trying:
    >
    > NumericRangeQuery.newLongRange("timestamp",8,timeX,timeX,true,true)
    >
    > right?

    Yes, it is the same. The Lucene test
    TestNumericRangeQuery64.testOneMatchQuery() verifies the upper=lower
    inclusive=true case.

    Uwe


  • Daniel Sanders at Sep 23, 2010 at 9:11 pm
    I'm certain the timestamp field is being indexed. It is created as follows:

    Document doc = new Document();
    ....
    NumericField timeField = new NumericField("timestamp", 8); // Defaults to indexing = true.
    timeField.setLongValue( timeX);
    doc.add( timeField);
    ...
    indexWriter.addDocument(doc);
    ...
    indexWriter.commit();

    -Daniel
