Max size of index? How do search engines avoid this?
Hi,
I think I've read that there is a limit on index size, maybe 2 GB on FAT file systems.
If this is right, could you point me to good resources (websites or books) about
programming search engines, so I can learn about the techniques big search engines
use to search such huge amounts of data?

Thanks



  • Mark Harwood at May 18, 2009 at 10:18 am
    > techniques big search engines use to search such huge amounts of data.

    Two keywords here: partitioning and replication.

    Partitioning is breaking the content down into shards and assigning shards to servers. These can then be queried in parallel to make search response times independent of the data volumes being searched. I seem to remember a quote that a single Google search currently gets spread across ~1,000 servers in parallel.

    Replication is about handling user volumes - take each shard and assign it to many replica servers then load balance requests across them to spread the load. This also gives you redundancy and helps in recovery from machine failure.
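
    For illustration, here is a rough sketch of the partitioning half against a Lucene 2.x-era API. The index paths and the "body" field name are made up for the example; each shard is just an ordinary Lucene index that ParallelMultiSearcher queries on its own thread before merging the hits into one ranked list.

        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.queryParser.QueryParser;
        import org.apache.lucene.search.*;
        import org.apache.lucene.store.FSDirectory;

        public class ShardedSearch {
            public static void main(String[] args) throws Exception {
                // Each shard is an ordinary Lucene index, typically on its own disk
                // or host; here they are simply two local placeholder directories.
                Searchable[] shards = {
                    new IndexSearcher(FSDirectory.getDirectory("/indexes/shard0")),
                    new IndexSearcher(FSDirectory.getDirectory("/indexes/shard1")),
                };

                // ParallelMultiSearcher runs the query against every shard in parallel
                // and merges the per-shard hits into a single ranked list.
                Searcher searcher = new ParallelMultiSearcher(shards);

                Query q = new QueryParser("body", new StandardAnalyzer()).parse("huge data");
                TopDocs hits = searcher.search(q, null, 10);
                System.out.println("total hits: " + hits.totalHits);

                searcher.close();
            }
        }

    Replication then just means running identical copies of each shard behind a load balancer; the searching code stays the same, only the replica a given request is routed to changes.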


    You may want to take a look at Solr to help you with this.
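
    In Solr this shard fan-out is exposed as distributed search: a request along these lines (host names invented for the example) sends the query to two shards and merges their results for you.

        http://localhost:8983/solr/select?q=huge+data&shards=shard1.example.com:8983/solr,shard2.example.com:8983/solr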

    Cheers
    Mark
  • Danil ŢORIN at May 18, 2009 at 10:22 am
    The 2 GB size is a limitation of the OS and/or file system, not of the index
    as supported by Lucene.
    Lucene does have another limit: the number of documents must be less than
    2,147,483,648 (2^31), because document numbers are Java ints.
    However, the size of a Lucene index may reach tens or hundreds of GB
    well before that.

    If you are thinking about BIG indexes, you should forget Windows + FAT32.

    On Linux I've seen big indexes, e.g. around 80M relatively small documents,
    about 50 GB on disk, with reasonable performance (on a pretty cheap machine).

    If you need more documents, better performance, etc., you need to partition
    your index into several smaller indexes running on separate hosts, call them
    in parallel, and then merge the results into a single result set.

    This way of operating is not built into Lucene, but you can relatively easily
    build a customized wrapper to do it (a rough sketch follows below).

    AFAIK something similar powers Google: each box handles about 10M docs, and
    there are thousands of boxes doing searches in parallel.
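
    Here is a rough sketch of the kind of customized wrapper described above (the class and names are invented for the example, not an existing Lucene API): fan the query out to per-shard searchers on a thread pool, then merge the per-shard top hits into one list ordered by score.

        import java.util.*;
        import java.util.concurrent.*;
        import org.apache.lucene.search.*;

        public class FanOutSearcher {
            /** One hit, remembering which shard it came from. */
            public static class ShardHit {
                public final int shard, doc;
                public final float score;
                ShardHit(int shard, int doc, float score) {
                    this.shard = shard; this.doc = doc; this.score = score;
                }
            }

            private final IndexSearcher[] shards;
            private final ExecutorService pool;

            public FanOutSearcher(IndexSearcher[] shards) {
                this.shards = shards;
                this.pool = Executors.newFixedThreadPool(shards.length);
            }

            /** Queries every shard in parallel and returns the best n hits overall. */
            public List<ShardHit> search(final Query query, final int n) throws Exception {
                List<Future<TopDocs>> futures = new ArrayList<Future<TopDocs>>();
                for (final IndexSearcher shard : shards) {
                    futures.add(pool.submit(new Callable<TopDocs>() {
                        public TopDocs call() throws Exception {
                            // each shard only needs to return its own local top-n
                            return shard.search(query, null, n);
                        }
                    }));
                }

                // Merge step: gather every shard's hits, sort by score, keep the top n.
                List<ShardHit> merged = new ArrayList<ShardHit>();
                for (int i = 0; i < futures.size(); i++) {
                    for (ScoreDoc sd : futures.get(i).get().scoreDocs) {
                        merged.add(new ShardHit(i, sd.doc, sd.score));
                    }
                }
                Collections.sort(merged, new Comparator<ShardHit>() {
                    public int compare(ShardHit a, ShardHit b) {
                        return Float.compare(b.score, a.score);
                    }
                });
                return merged.size() > n ? merged.subList(0, n) : merged;
            }
        }

    One caveat with this naive merge: scores from different shards are not strictly comparable, because each shard computes term statistics such as IDF from its own documents only; real distributed engines have to decide how (or whether) to correct for that.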

Discussion Overview
group: java-user
categories: lucene
posted: May 18, '09 at 9:42a
active: May 18, '09 at 10:22a
posts: 3
users: 3
website: lucene.apache.org
