I've encountered some very puzzling Lucene behavior (I'm using 1.3dev1, StandardAnalyzer, QueryParser).

My indexed documents have, among other fields, two Text fields (indexed, tokenized, stored) called "pub_date" and "id". These two fields have similar values. A typical pub_date value is "20021121" and a typical id value is "20021121_4477".

When I search on pub_date, everything is normal. The query "pub_date:20021121" responds in less than a second with about 200 hits (only 25 of which are displayed). Changing the query to "pub_date:2002112*" generates about 1000 hits with about the same response time. Changing it to "pub_date:200211*" increases the hits to about 5000 with little increase in response time. Finally, using "pub_date:2002*" generates about 35,000 hits and takes about two seconds.

However, when I try the same kinds of searches on the 'id' field, the results are *very* different.

Searching with the query "id:20021121*" returns the same 200 or so hits in less than a second. Using the query "id:2002112*" generates about 1000 hits, but it takes about two seconds. Changing the query to "id:200211*" produces about 500 hits, but requires about seven seconds. Generalizing the query to "id:2002*" causes my application to crash with an out-of-memory error.

As mentioned above, the two fields are very similar in content and are indexed in the same way. One behaves as expected but the other one doesn't work well at all.

Any ideas on what's going on here? Is "id" some kind of reserved Lucene word?

Regards,

Terry
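
A possible explanation, offered as a sketch rather than a confirmed diagnosis: "id" is not a reserved word in Lucene. In Lucene releases of this era, a prefix query such as "id:2002*" is expanded into one clause per distinct indexed term that starts with the prefix, so the cost tracks the number of distinct matching terms rather than the number of hits. A date field holds at most a few hundred distinct values per year, while an id field holds roughly one per document. The plain-Java sketch below does not use Lucene at all; the class name, the expandPrefix helper, and the synthetic corpus (30 days, 100 documents per day) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustration only: models a field's term dictionary as a sorted set of strings.
class PrefixExpansionSketch {

    // Collects every distinct term that starts with the given prefix.
    // In a sorted dictionary these terms form one contiguous run.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the contiguous run
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> pubDateTerms = new TreeSet<>();
        SortedSet<String> idTerms = new TreeSet<>();

        // Hypothetical corpus: 30 days in November 2002, 100 documents per day.
        for (int day = 1; day <= 30; day++) {
            String date = String.format("200211%02d", day);
            for (int n = 0; n < 100; n++) {
                pubDateTerms.add(date);                               // same term for every doc that day
                idTerms.add(date + "_" + String.format("%04d", n));   // unique term per doc
            }
        }

        // Same documents, same prefix, very different expansion sizes.
        System.out.println("pub_date:200211* expands to "
                + expandPrefix(pubDateTerms, "200211").size() + " terms"); // 30
        System.out.println("id:200211* expands to "
                + expandPrefix(idTerms, "200211").size() + " terms");      // 3000
    }
}
```

Both prefixes cover the same 3000 synthetic documents, but the date field expands to 30 terms and the id field to 3000. If the real index behaves the same way, "id:2002*" would expand to roughly one term per document for the whole year, which would be consistent with the slowdown and the out-of-memory error described above.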


  • Terry Steichen at Nov 25, 2002 at 7:41 pm
    I was wrong about how the fields are indexed. The "pub_date" field is
    indexed and stored but *not* tokenized. When I changed the "id" field to
    that form, it then appeared to work fine (based on a very small sample;
    I will test more soon).

    Can anyone help me understand why the search takes such a huge amount of
    memory and processing time if the field is tokenized?

    Regards,

    Terry

    ----- Original Message -----
    From: "Terry Steichen" <terry@net-frame.com>
    To: "Lucene Users Group" <lucene-user@jakarta.apache.org>
    Sent: Monday, November 25, 2002 12:27 PM
    Subject: Is "id" a special case?
