Hello all,

I am trying to use Lucene for incremental indexing on the order of a
million records daily on a single machine (P4 2.4 GHz, 1 GB RAM).
Messages are updated every few minutes, and I need to update the index
accordingly.

I am using a StandardAnalyzer and writing documents with an IndexWriter
(FSDirectory), using the following structure:

Document document = new Document();
document.add(Field.Keyword(INDEX_MESG_ID, LongField.longToString(mesg_id)));
document.add(Field.UnStored(INDEX_MESG_TITLE, mesgTitle));
document.add(Field.UnStored(INDEX_MESG_DESCRIPTION, mesgDescription));
document.add(Field.Keyword(INDEX_MESG_DATE, mesgDate));

mesgTitle is a string on the order of 20-50 words.
mesgDescription is a string on the order of 500-1000 words.

I profiled my application and the index writer is able to write only 16
messages per second, which is too low for the volume I need to handle.
I have tried various options:
- Increasing mergeFactor and minMergeDocs from 10 to 100 to 1000 didn't help.
- Changing the IndexWriter from FSDirectory to RAMDirectory and later
synchronizing the changes to FSDirectory didn't help either.

It would be great if someone could give me pointers for increasing the
throughput of the index writer. Can someone give practical examples of
the maximum throughput that can be achieved for a similar process on a
single machine?
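
For scale, the daily target can be converted into a sustained rate. The sketch below assumes, hypothetically, that the indexer could run around the clock, which the post does not state. ThroughputCheck and its method names are illustrative only and are not part of Lucene.

```java
// Back-of-the-envelope throughput check. ThroughputCheck and its methods are
// illustrative names, not part of Lucene.
final class ThroughputCheck {
    // Assumption: the indexer is available 24 hours a day.
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Documents per second needed to index docsPerDay within activeSeconds.
    static double requiredRate(long docsPerDay, long activeSeconds) {
        return (double) docsPerDay / activeSeconds;
    }

    // Documents indexed in activeSeconds at a sustained docsPerSecond rate.
    static long dailyCapacity(double docsPerSecond, long activeSeconds) {
        return Math.round(docsPerSecond * activeSeconds);
    }

    public static void main(String[] args) {
        System.out.println("required docs/s: "
                + requiredRate(1_000_000L, SECONDS_PER_DAY));
        System.out.println("capacity at 16 docs/s: "
                + dailyCapacity(16.0, SECONDS_PER_DAY));
    }
}
```

Under that assumption, one million documents per day needs roughly 11.6 documents per second, and 16 per second comes to 1,382,400 per day. If updates arrive in bursts, or indexing shares the machine with other work, the usable window is shorter and the required rate rises accordingly.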

Thanks

Regards
Sunil

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Otis Gospodnetic at Mar 23, 2005 at 8:50 pm
    That sounds a little low. I'll assume that you profiled your whole
    application, which leaves room for something else slowing things down,
    and I'll suggest you write a standalone application whose only task is
    to index documents as quickly as possible. Hm, this reminds me that
    I've written stuff like that twice, once for O'Reilly's OnJava site:
    http://www.onjava.com/lpt/a/3273 and once for Lucene in Action. Both
    contain free code, which should help you figure out if it's really
    Lucene or your other code.

    If you write something that's standalone I could run it on my machine
    and report the results.
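
A standalone harness along the lines suggested above might look like the following sketch. It is not the code from the OnJava article or from Lucene in Action. IndexingBenchmark and DocumentSink are hypothetical names, the sink stands in for whatever call actually adds a document to the index, and the synthetic documents only mimic the sizes described in the original post.

```java
import java.util.Random;

// Standalone timing harness. DocumentSink is a hypothetical stand-in for the
// real indexing call; a real implementation would wrap the index writer and
// handle its checked exceptions itself.
final class IndexingBenchmark {

    interface DocumentSink {
        void add(long id, String title, String description);
    }

    private static final String[] WORDS = {
        "lucene", "index", "writer", "message", "document", "field", "search", "merge"
    };

    // Builds a synthetic text of wordCount space-separated words.
    static String randomText(Random random, int wordCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(WORDS[random.nextInt(WORDS.length)]);
        }
        return sb.toString();
    }

    // Feeds docCount synthetic messages to the sink and returns documents per
    // second, timing only the sink calls and not the text generation.
    static double run(DocumentSink sink, int docCount, long seed) {
        Random random = new Random(seed);
        long sinkNanos = 0L;
        for (long id = 0; id < docCount; id++) {
            String title = randomText(random, 20 + random.nextInt(31));          // 20-50 words
            String description = randomText(random, 500 + random.nextInt(501)); // 500-1000 words
            long before = System.nanoTime();
            sink.add(id, title, description);
            sinkNanos += System.nanoTime() - before;
        }
        return docCount / (Math.max(1L, sinkNanos) / 1e9);
    }

    public static void main(String[] args) {
        // A no-op sink shows the overhead of the harness itself; swap in a sink
        // that writes to the index to measure indexing throughput.
        DocumentSink noOp = (id, title, description) -> { };
        System.out.println("docs/s (no-op sink): " + run(noOp, 1000, 42L));
    }
}
```

Running the same harness once with the no-op sink and once with a sink that writes to the index separates the cost of the surrounding application from the cost of indexing, which is the comparison the standalone test is meant to provide.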

    Oh, and check this: http://lucene.apache.org/java/docs/benchmarks.html

    Otis



Discussion Overview
group: java-user @
categories: lucene
posted: Mar 23, '05 at 7:50a
active: Mar 23, '05 at 8:50p
posts: 2
users: 2
website: lucene.apache.org
