We are using a sort of NoSQL environment. Deleting 200 GB from the database on one machine is fast, but when we then delete the 5 GB of indexes that were created, it takes forever!

Is there any option in Lucene to make it use larger and fewer files, so that an index is easier to maintain and much faster to wipe out?

Thanks,
Dean

This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.


  • Shai Erera at Jul 5, 2011 at 6:12 pm
    Hi Dean

    Could you share a little more information about those indexes (and your
    problem in general), such as:
    * Is there one index, or 8M indexes?
    * How many files do those indexes contain? Do you use compound file format?
    * What is the command/API you use to delete the indexes?
    * Lucene version, IndexWriter settings etc.

    Shai
    On Tue, Jul 5, 2011 at 6:50 PM, Hiller, Dean x66079 wrote:

    We are using a sort of nosql environment and deleting 200 gig on one
    machine from the database is fast, but then we go and delete 5 gigs of
    indexes that were created and it takes forever!!!!

    Is there any option in lucene to make it so it uses LARGER files and less
    count of files so it is easier to maintain and wipe out an index much
    faster?

    Thanks,
    Dean

  • Toke Eskildsen at Jul 6, 2011 at 8:11 am

    On Tue, 2011-07-05 at 17:50 +0200, Hiller, Dean x66079 wrote:
    > We are using a sort of nosql environment and deleting 200 gig on one
    > machine from the database is fast, but then we go and delete 5 gigs of
    > indexes that were created and it takes forever!!!!

    8 million indexes means at least 16 (24?) million files. If you are
    using a conventional hard disk for that, then yes, deleting them takes
    forever. SSD is the answer to that problem, but then again, SSD is the
    answer to most IO-performance problems.
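
To illustrate why the number of files matters more than the number of bytes: a recursive delete has to issue one delete call per file and per directory. The following is a minimal sketch using only the JDK's `java.nio.file` API (Java 7 or later). The class and method names are hypothetical and chosen for illustration; nothing here is taken from the poster's actual setup.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TreeDelete {

    /**
     * Deletes root and everything below it.
     * Returns the number of file system entries (files plus directories) removed.
     */
    static long deleteTree(Path root) throws IOException {
        final long[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);      // one delete call per file
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;           // propagate errors from the traversal
                }
                Files.delete(dir);       // the directory is empty at this point
                removed[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }
}
```

The returned count grows with the number of entries, not with their size, which is why a few gigabytes spread over millions of small files can take far longer to remove than one large file.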

    Just a quick sanity check: I hope you are not storing the individual
    index folders under the same root folder? If you have
    indexes/index0000001/
    indexes/index0000002/
    indexes/index0000003/
    ...
    indexes/index8000000/
    in the same folder, you are asking for trouble, since most file systems
    don't perform well with folders that hold millions of entries. If that is
    the case, split them into subfolders for every X orders of magnitude, such as
    indexes/000/000/index0000001/
    indexes/000/000/index0000002/
    indexes/000/000/index0000003/
    ...
    indexes/000/500/index0500001/
    ...
    indexes/008/000/index8000000/
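
The layout above could be computed along these lines. This is a minimal sketch, assuming non-negative numeric index ids of at most seven digits and 1,000 indexes per leaf folder, which matches the example paths shown. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShardedIndexPath {

    /**
     * Maps an index id to root/MMM/TTT/indexNNNNNNN, where MMM is the
     * millions part of the id and TTT is the thousands part.
     */
    static Path pathFor(Path root, int indexId) {
        if (indexId < 0 || indexId > 9_999_999) {
            throw new IllegalArgumentException("index id out of range: " + indexId);
        }
        int millions = indexId / 1_000_000;        // first-level folder
        int thousands = (indexId / 1_000) % 1_000; // second-level folder
        return root.resolve(String.format("%03d", millions))
                   .resolve(String.format("%03d", thousands))
                   .resolve(String.format("index%07d", indexId));
    }

    public static void main(String[] args) {
        // Prints the sharded location for index 500001 under "indexes".
        System.out.println(pathFor(Paths.get("indexes"), 500_001));
    }
}
```

With this split, no folder holds more than 1,000 entries, so directory lookups and listings stay cheap even with millions of indexes.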


    Having that many tiny indexes sets off an alarm bell for me. That's
    quite a special use of Lucene you've got going.

    > Is there any option in lucene to make it so it uses LARGER files and less
    > count of files so it is easier to maintain and wipe out an index much
    > faster?

    Use compound files, optimize to single segment. If I understand
    correctly, your indexes are tiny, so this should not give any noticeable
    performance hit.



Discussion Overview
group: java-user
categories: lucene
posted: Jul 5, '11 at 3:51p
active: Jul 6, '11 at 8:11a
posts: 3
users: 3
website: lucene.apache.org
