FAQ
Hello all,

I am trying to use Lucene for doing incremental indexing of the order
of million of records daily using a single machine (P4 2.4Ghz 1 GB
RAM). I do get messages updated every few minutes based on which I
need to update the index.

I am using a StandardAnalyzer and writing documents using IndexWriter
(FSDirectory) using the following structure:

Document document = new Document();
document.add(Field.Keyword(INDEX_MESG_ID, LongField.longToString(mesg_id)));
document.add(Field.UnStored(INDEX_MESG_TITLE, mesgTitle));
document.add(Field.UnStored(INDEX_MESG_DESCRIPTION,mesgDescription));
document.add(Field.Keyword(INDEX_MESG_DATE,mesgDate));

mesgTitle is string of the order of 20-50 words
mesgDescription is a string of the order of 500-1000 words.

I profiled my application and the index writer is able to write 16
messages per second, which is too low for the order that I want to
achieve. I have tried various options:
- increasing mergeFactor and minMergeDocs from 10 to 100 to 1000 didn't help.
- Changed indexwriter from FSDirectory to RAMDirectory and later
synchronizing changes to FSDirectory. Even this didn't help.

It will be great if someone can give me pointers for increasing the
efficiency of index writer process. Can someone give me practical
examples of how much maximum efficiency can be achieved for similar
process on a single machine?

Thanks

Regards
Sunil

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Sunil goyal at Mar 22, 2005 at 1:39 pm
    Hello all,

    I am trying to use Lucene for doing incremental indexing of the order
    of million of records daily using a single machine (P4 2.4Ghz 1 GB
    RAM). I do get messages updated every few minutes based on which I
    need to update the index.

    I am using a StandardAnalyzer and writing documents using IndexWriter
    (FSDirectory) using the following structure:

    Document document = new Document();
    document.add(Field.Keyword(INDEX_MESG_ID, LongField.longToString(mesg_id)));
    document.add(Field.UnStored(INDEX_MESG_TITLE, mesgTitle));
    document.add(Field.UnStored(INDEX_MESG_DESCRIPTION,mesgDescription));
    document.add(Field.Keyword(INDEX_MESG_DATE,mesgDate));

    mesgTitle is string of the order of 20-50 words
    mesgDescription is a string of the order of 500-1000 words.

    I profiled my application and the index writer is able to write 16
    messages per second, which is too low for the order that I want to
    achieve. I have tried various options:
    - increasing mergeFactor and minMergeDocs from 10 to 100 to 1000 didn't help.
    - Changed indexwriter from FSDirectory to RAMDirectory and later
    synchronizing changes to FSDirectory. Even this didn't help.

    It will be great if someone can give me pointers for increasing the
    efficiency of index writer process. Can someone give me practical
    examples of how much maximum efficiency can be achieved for similar
    process on a single machine?

    Thanks

    Regards
    Sunil

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMar 22, '05 at 1:07p
activeMar 22, '05 at 1:39p
posts2
users1
websitelucene.apache.org

1 user in discussion

Sunil goyal: 2 posts

People

Translate

site design / logo © 2022 Grokbase