Hi,
I have a ValueSourceQuery that uses a stored field. The field contains roughly 27.27 million untokenized terms,
each about 8 digits long on average.
The first search always takes around 5 minutes, and the time is spent in the createValue function of FieldCacheImpl.
The search runs on a RAID 5 array of 15k rpm disks.

Any hints on how to make the FieldCache createValue faster? I have tried a bigger cache size for BufferedIndexReader (8 KB or more),
but createValue still takes 4 to 5 minutes to execute.

Thanks


  • Anshum at May 20, 2008 at 5:33 am
    Hey Alex,
    I guess you haven't tried warming up the engine before putting it to use.
    It is one of the simpler approaches: warm up the engine by sending it a few
    searches first, and only then put it into the serving loop. You could also
    do some preprocessing while initializing the daemon rather than waiting for
    the first search to hit it.
    I hope I understood the problem correctly; if not, I will have to look into
    it further.

    --
    Anshum

    --
    The facts expressed here belong to everybody, the opinions to me.
    The distinction is yours to draw............
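
The warm-up idea in this reply can be sketched without Lucene at all. The class below is a hypothetical stand-in (`WarmUpSketch`, `warm`, and `getValues` are made-up names, not Lucene APIs) for a cache that is expensive to fill on first use. It only illustrates paying that cost at start-up rather than on the first user search, and it assumes the daemon knows in advance which fields its queries will touch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-field cache that is slow to fill on first use. */
final class WarmUpSketch {

    /** Counts how many times the expensive load ran (illustration only). */
    final AtomicInteger loads = new AtomicInteger();

    /** Field name -> per-document values, filled lazily on first access. */
    private final ConcurrentHashMap<String, long[]> cache = new ConcurrentHashMap<>();
    private final int maxDoc;

    WarmUpSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Simulates the slow first-time load; a real index would walk the field's terms here. */
    private long[] createValue(String field) {
        loads.incrementAndGet();
        long[] values = new long[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = doc * 31L; // placeholder value, not real index data
        }
        return values;
    }

    /** Returns the cached array, loading it only if it is not cached yet. */
    long[] getValues(String field) {
        return cache.computeIfAbsent(field, this::createValue);
    }

    /** Call once at daemon start-up, before the searcher is put into rotation. */
    void warm(String... fields) {
        for (String field : fields) {
            getValues(field);
        }
    }

    public static void main(String[] args) {
        WarmUpSketch searcher = new WarmUpSketch(1000);
        searcher.warm("price");                      // pay the load cost up front
        long[] values = searcher.getValues("price"); // served from the cache, no reload
        System.out.println("loads=" + searcher.loads.get() + " value[2]=" + values[2]);
        // prints "loads=1 value[2]=62"
    }
}
```

Warming does not make the load itself any faster; it only moves the wait from the first user query to the start-up phase.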
  • Alex at May 20, 2008 at 6:54 am
    Hi,
    Thanks for the reply. Yes, after the first slow search, subsequent searches perform well.

    I guess the question is why exactly createValue takes so long, and whether it should take that long (4 to 5 minutes).
    Given roughly 27 million terms, each about 8 characters long plus a few other bytes for the TermInfo record,
    a modern disk can easily read that portion of the index (the .frq portion) in a few seconds. Also,
    when I use tools like dstat, I see a bunch of 1 KB reads issued while createValue is running.
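
The back-of-the-envelope reasoning in this message can be written out as a small calculation. Only the term count, the term length, and the 1 KB read size come from the thread. The per-term overhead, the sequential throughput, and the per-read cost are placeholder assumptions chosen for illustration, so the output is a rough comparison of the two access patterns and not a diagnosis. `ReadEstimate` and its methods are hypothetical names.

```java
/** Rough read-time estimate; every rate below is an assumed placeholder, not a measurement. */
final class ReadEstimate {

    /** Bytes to scan: one record per term, each term plus a fixed per-term overhead. */
    static long totalBytes(long terms, int bytesPerTerm, int overheadPerTerm) {
        return terms * (bytesPerTerm + overheadPerTerm);
    }

    /** Seconds for one sequential pass at the given throughput. */
    static double sequentialSeconds(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    /** Seconds when the same bytes arrive as many small reads, each with a fixed cost. */
    static double smallReadSeconds(long bytes, int readSize, double secondsPerRead) {
        long reads = (bytes + readSize - 1) / readSize; // round up to whole reads
        return reads * secondsPerRead;
    }

    public static void main(String[] args) {
        // 27.27 million terms of ~8 bytes (from the thread); 4 bytes of overhead is assumed.
        long bytes = totalBytes(27_270_000L, 8, 4);
        // 100 MB/s sequential throughput is an assumption.
        double sequential = sequentialSeconds(bytes, 100e6);
        // 1 KB reads (seen in dstat); 1 ms per read is an assumption.
        double small = smallReadSeconds(bytes, 1024, 0.001);
        System.out.println("bytes to read:       " + bytes);
        System.out.println("sequential seconds:  " + sequential);
        System.out.println("small-read seconds:  " + small);
    }
}
```

Under these assumed inputs the sequential pass comes out at a few seconds and the small-read pass at a few minutes. Changing the assumed per-read cost moves the second figure proportionally, so measuring it on the actual hardware would be the first step.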




  • Jason Rutherglen at May 20, 2008 at 2:38 pm
    https://issues.apache.org/jira/browse/LUCENE-1278 solves this problem
  • Chris Lu at May 20, 2008 at 6:12 pm
    This should give a great boost to performance. Is there any plan to merge it into the
    main branch instead of keeping it as a patch?

    --
    Chris Lu

Discussion Overview
group: java-user @
categories: lucene
posted: May 19, '08 at 6:57p
active: May 20, '08 at 6:12p
posts: 5
users: 4
website: lucene.apache.org
