have a custom analyzer and a slightly hacked query parser, but in
general it's the basic add document/remove document/query documents
cycle.
In my system, I'm indexing a store of external documents, maintaining
an index for full-text querying. However, I might be turned off when
documents are added, and then when I'm restarted, I'm going to need to
determine the timestamp of the last document added to the index so that
I can pick up where I left off.
There are three approaches to doing this, two using Lucene. I don't
know how I would do the two Lucene approaches, or even if they're
possible.
1. Just keep a file in parallel with the index, reading and writing the
timestamp of the last indexed document in it. I know how to do this,
but I don't like the idea of keeping a separate file.
2. Drop a timestamp onto each document as it's indexed. I've attached
timestamp fields to documents in the past so that I could do range
queries on them. However, I don't know how to do a query like "the
document with the latest timestamp" or even if that's possible.
3. Create a dummy document (with some unique field identifier so you
could quickly query for it) with a field "last timestamp". This is a
"global value storage" approach, as you could just store any field with
any value on it. But I'd be updating this timestamp field a lot, which
means that every time I updated the index I'd have to remove this
special document and reindex it. Is there any way to update the value
of a field in a document directly in the index without removing and
adding it again to the index? The field I'd want to update would just
be stored, not indexed or tokenized.
Thanks for your help in guiding my exploration into the capabilities of
Lucene.
Avi
--
Avi 'rlwimi' Drissman
[email protected]
Argh! This darn mail server is trunca
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]