FAQ
As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.


___________________________________________________________
好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/

Search Discussions

  • Fang_li at Feb 18, 2009 at 8:09 am
    Did you try?
    The cost of index merging grows when indexes are getting bigger.
    Try to limit the max document size in a segment by setting setMaxMergeDocs in IndexWriter.

    -----Original Message-----
    From: 治江 王
    Sent: Monday, February 16, 2009 1:49 PM
    To: java-user@lucene.apache.org
    Subject: the efficiency of creating indexes

    As i know, the time effciency of creating index is non-linearity with the size of documents. For example, if the size of indexes is 1G, the time cost is 2 hours, If the size of indexes is 10G, the time cost may be 30 hours. Who can tell me what is the reason? Any tips will be appreciated.


    ___________________________________________________________
    好玩贺卡等你发,邮箱贺卡全新上线!
    http://card.mail.cn.yahoo.com/

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael McCandless at Feb 18, 2009 at 11:25 am
    If not for merging, I believe indexing is simply linear.

    Merging adds only a logarithmic (in total index size) cost.

    Using as large an IndexWriter RAM buffer as you can will minimize the
    amount of merging. (Also increasing mergeFactor, or decreasing
    maxMergeMB/Docs, but these will impact search performance as well).

    Mike

    emc.com wrote:
    Did you try?
    The cost of index merging grows when indexes are getting bigger.
    Try to limit the max document size in a segment by setting
    setMaxMergeDocs in IndexWriter.

    -----Original Message-----
    From: 治江 王
    Sent: Monday, February 16, 2009 1:49 PM
    To: java-user@lucene.apache.org
    Subject: the efficiency of creating indexes

    As i know, the time effciency of creating index is non-linearity
    with the size of documents. For example, if the size of indexes is
    1G, the time cost is 2 hours, If the size of indexes is 10G, the
    time cost may be 30 hours. Who can tell me what is the reason? Any
    tips will be appreciated.


    ___________________________________________________________
    好玩贺卡等你发,邮箱贺卡全新上线!
    http://card.mail.cn.yahoo.com/

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedFeb 16, '09 at 5:49a
activeFeb 18, '09 at 11:25a
posts3
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase