I've encountered some very puzzling Lucene behavior (I'm using 1.3dev1, StandardAnalyzer, QueryParser).
My indexed documents have, among other fields, two Text fields (indexed, tokenized, stored) called "pub_date" and "id". These two fields have similar values. A typical pub_date value is "20021121" and a typical id value is "20021121_4477".
When I search on pub_date, everything is normal. The query "pub_date:20021121" responds in less than a second with about 200 hits (only 25 of which are displayed). Changing the query to "pub_date:2002112*" generates about 1000 hits with about the same response time. Changing it to "pub_date:200211*" increases the hits to about 5000 with little increase in response time. Finally, using "pub_date:2002*" generates about 35,000 hits and takes about two seconds.
However, when I try the same kinds of searches on the "id" field, the results are *very* different.
Searching with the query "id:20021121*" generates the same approximately 200 hits in less than a second. Using the query "id:2002112*" generates about 1000 hits, but it takes about two seconds. Changing the query to "id:200211*" produces about 5000 hits, but requires about seven seconds. Generalizing the query to "id:2002*" causes my application to crash with an out-of-memory error.
As mentioned above, the two fields are very similar in content and are indexed in the same way. One behaves as expected but the other one doesn't work well at all.
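The one difference I can think of is that pub_date values repeat (about 200 documents share each date), while id values are presumably unique per document. Here is a minimal sketch, in plain Java with made-up data and no Lucene calls, that counts how many distinct values match a prefix in each field. The class name, method names, and data sizes are all hypothetical, and I am only assuming (I have not verified it) that the cost of a wildcard query depends on the number of distinct matching terms rather than on the number of hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical stand-in for the index contents; nothing here calls Lucene.
class PrefixTermCount {

    // Count the distinct values (terms) that start with the given prefix.
    static int distinctTerms(List<String> values, String prefix) {
        TreeSet<String> terms = new TreeSet<>();
        for (String v : values) {
            if (v.startsWith(prefix)) {
                terms.add(v);
            }
        }
        return terms.size();
    }

    // Made-up pub_date values: one value per document, repeated within a day.
    static List<String> pubDates(int days, int docsPerDay) {
        List<String> out = new ArrayList<>();
        for (int d = 1; d <= days; d++) {
            for (int n = 0; n < docsPerDay; n++) {
                out.add(String.format(Locale.ROOT, "200211%02d", d));
            }
        }
        return out;
    }

    // Made-up id values: the date plus a per-document suffix, so each is unique.
    static List<String> ids(int days, int docsPerDay) {
        List<String> out = new ArrayList<>();
        for (int d = 1; d <= days; d++) {
            for (int n = 0; n < docsPerDay; n++) {
                out.add(String.format(Locale.ROOT, "200211%02d_%04d", d, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dates = pubDates(30, 200);
        List<String> ids = ids(30, 200);
        System.out.println("pub_date terms for 200211*: " + distinctTerms(dates, "200211"));
        System.out.println("id terms for 200211*: " + distinctTerms(ids, "200211"));
    }
}
```

With 30 days and 200 documents per day, the prefix "200211" matches 30 distinct pub_date values but 6000 distinct id values, even though both prefixes cover the same 6000 documents.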
Any ideas on what's going on here? Is "id" some kind of reserved Lucene word?