FAQ
I would like to try replacing our external document storage with a Lucene stored field, so I have a few questions before we proceed:

Background: We currently store complete documents in a simple binary file and keep only the offsets into this file as a stored field in the Lucene index. Documents (compressed) are short, averaging 300-400 bytes each.

What we would like to try is storing these documents as binary stored fields (compressed with a simple, fast static Huffman coder plus dictionary). This would make the setup easier to maintain, since Lucene already handles updating and fetching these fields, and indexing now works very well from multiple threads. The complete document would be the only stored field in the index (I know about lazy loading).
Simply put, what we do today is search the index, get the offsets from the matching documents, and then fetch the documents from the binary file.
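
For readers following along, the "offset in the index, payload in a binary file" setup described above could look roughly like the sketch below. It is only an illustration under assumptions: the class name OffsetRecordFile, the 4-byte length prefix, and the temp-file helper are hypothetical and not taken from our actual code, and compression is left out.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical record file: each record is a 4-byte length followed by the payload bytes. */
final class OffsetRecordFile implements AutoCloseable {
    private final RandomAccessFile file;

    OffsetRecordFile(File path) {
        try {
            this.file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience helper for experiments: backs the store with an empty temp file. */
    static OffsetRecordFile createTemp() {
        try {
            File tmp = File.createTempFile("docs", ".bin");
            tmp.deleteOnExit();
            return new OffsetRecordFile(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one record and returns its offset, i.e. the value kept in the index. */
    long append(byte[] payload) {
        try {
            long offset = file.length();
            file.seek(offset);
            file.writeInt(payload.length);
            file.write(payload);
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Jumps to the offset obtained from the search hit and reads the record back. */
    byte[] read(long offset) {
        try {
            file.seek(offset);
            int length = file.readInt();
            byte[] payload = new byte[length];
            file.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch each fetch costs one seek plus one short sequential read, which is the baseline any stored-field approach would be compared against.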

So the questions would be:

1. Does anyone see any theoretical or practical reason why our homemade "fetch offset from Lucene stored field -> jump to this offset in the document file" would be faster than "fetch stored field from Lucene directly"?

2. We use the Lucene index with an MMAP directory now, so the concern is that the index could grow too large for MMAP with a stored field like that. Is there a way to say "do not use MMAP Directory for stored fields, use FSDirectory instead"? I think not, but it is worth asking, as I think this could be a useful thing... Imagine being able to say "load terms with RAM Directory, postings with MMAP and stored fields with FSDirectory"... Of course, this is only for searching, not indexing.

Thanks a lot.
Eks





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Andrzej Bialecki at Feb 13, 2008 at 2:05 pm

    eks dev wrote:

    2. We use the Lucene index with an MMAP directory now, so the concern is
    that the index could grow too large for MMAP with a stored field like that.
    Is there a way to say "do not use MMAP Directory for stored fields,
    use FSDirectory instead"? I think not, but it is worth asking, as I think
    this could be a useful thing... Imagine being able to say "load terms
    with RAM Directory, postings with MMAP and stored fields with
    FSDirectory"... Of course, this is only for searching, not indexing.

    IMHO it should be possible to do this by implementing a Directory
    subclass that can be configured to use mmap / ram / fs for specific
    index file types (e.g. tis, tii, fdt, prx and so on) - you should be
    able to cut & paste large chunks of each directory's code to start the
    implementation.
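
A minimal sketch of the routing part of that idea, under assumptions: ExtensionRouter is a hypothetical helper, not a Lucene class, and the type parameter D merely stands in for whatever Directory implementations would be wired in. A real Directory subclass would still have to delegate each of its file operations (such as openInput) to the directory chosen here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: picks a backing directory based on the index file extension. */
final class ExtensionRouter<D> {
    private final Map<String, D> byExtension = new HashMap<>();
    private final D fallback;

    /** The fallback is used for files with no extension or an unmapped one. */
    ExtensionRouter(D fallback) {
        this.fallback = fallback;
    }

    /** Registers the directory to use for one extension, e.g. "fdt" for stored field data. */
    ExtensionRouter<D> route(String extension, D target) {
        byExtension.put(extension, target);
        return this;
    }

    /** Returns the directory responsible for the given index file name. */
    D directoryFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        return byExtension.getOrDefault(extension, fallback);
    }
}
```

For the split described in the question, one might route tis and tii (term dictionary and its index) to a RAM-backed directory, frq and prx (postings) to an mmap-backed one, and fdt and fdx (stored fields) to a plain file-system one, with the file-system directory as the fallback for everything else.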


    --
    Best regards,
    Andrzej Bialecki <><



Group: java-user
Category: lucene
Posted: Feb 13, '08 at 1:22p
Active: Feb 13, '08 at 2:05p
Posts: 2
Users: 2
Website: lucene.apache.org
