I have two indexes, and I would like to move the index data for a few
selected, specified terms from one index to the other.
Does anyone have an idea how to do this?
What I am really looking at is splitting my index on keywords (terms): I would
like a single index to be distributed over two smaller indexes after it has
been created.
How can I do that?

--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............


  • Otis Gospodnetic at Jun 20, 2008 at 2:36 am
    Hi,

    I don't think there are tools for taking a single index and sharding it. So you'll have to create a new index and remove what you need to remove from the old big index. I could be wrong :)


    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

    ----- Original Message ----
    From: Anshum <anshumg@gmail.com>
    To: java-user@lucene.apache.org
    Sent: Wednesday, June 18, 2008 9:12:57 AM
    Subject: Copying a part of index and index structure

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Anshum at Jun 20, 2008 at 4:52 am
    Hey Otis,

    I guess the Lucene API would only help me remove documents from an index, not
    'terms'. I need to remove terms from the index for all documents. Any clue
    as to how to get that done? I'm currently analyzing the internal index
    structure. I really need to get it done, and if it works out I guess we'd be
    closer to having a kind of distributed Lucene index.

    --
    Anshum
  • Otis Gospodnetic at Jun 20, 2008 at 5:56 am
    Hi,

    Not doable with Lucene as far as I know. I'm not even certain you would want to split by term. What would that do to TF-IDF in your distributed search? What's wrong with splitting at the doc level? There are about half a dozen distributed (Lucene) search solutions floating around, so why not reuse them?
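
To make the TF-IDF question concrete, here is a minimal sketch in plain Java (it is not Lucene's scoring code). It assumes the classic tf * ln(N / df) weighting and hypothetical shards that each hold the complete posting list for one term; the class and method names are invented for illustration. Under those assumptions each shard can score only its own term, it still needs the global document count, and a coordinator has to sum the partial scores per document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: TF-IDF scoring when postings are split by term across shards. */
final class TermShardScoring {

    /** Partial scores (docId -> tf * idf) for one term, computed from that term's postings. */
    static Map<Integer, Double> partialScores(Map<Integer, Integer> postings, int totalDocs) {
        Map<Integer, Double> scores = new HashMap<>();
        if (postings.isEmpty()) {
            return scores;
        }
        // Classic idf = ln(N / df); df is the length of the posting list.
        double idf = Math.log((double) totalDocs / postings.size());
        postings.forEach((docId, tf) -> scores.put(docId, tf * idf));
        return scores;
    }

    /** Coordinator step: sums the per-term partial scores for each document. */
    static Map<Integer, Double> combine(List<Map<Integer, Double>> partials) {
        Map<Integer, Double> total = new HashMap<>();
        for (Map<Integer, Double> partial : partials) {
            partial.forEach((docId, score) -> total.merge(docId, score, Double::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        int totalDocs = 4; // assumption: every shard is told the global document count
        // Hypothetical postings (docId -> term frequency), one term per shard.
        Map<Integer, Integer> luceneOnShard1 = Map.of(1, 2, 2, 1);
        Map<Integer, Integer> indexOnShard2 = Map.of(2, 3);

        Map<Integer, Double> scores = combine(List.of(
                partialScores(luceneOnShard1, totalDocs),
                partialScores(indexOnShard2, totalDocs)));
        System.out.println(scores);
    }
}
```

In this sketch the partial score maps are what would travel over the network for a multi-term query, which is one reason the scoring side of a term split needs thought.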


    Otis
    --
    Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

  • Andrzej Bialecki at Jun 20, 2008 at 7:05 am

    Distributed search that splits by term is a known alternative in the
    literature to splitting by doc. However, a few recent research papers
    that I'm familiar with indicate that it ends up being a more complex
    and less efficient option than splitting by doc.

    --
    Best regards,
    Andrzej Bialecki <><
    ___. ___ ___ ___ _ _ __________________________________
    [__ || __|__/|__||\/| Information Retrieval, Semantic Web
    ___|||__|| \| || | Embedded Unix, System Integration
    http://www.sigram.com Contact: info at sigram dot com


  • Anshum at Jun 20, 2008 at 7:28 am
    Hey Andrzej,
    Could you tell me which research suggests this, and why that is the case?
    My calculation says the average load on each server would go down, because
    I would know which server to query for an index term instead of querying all
    servers for every term.
    I'm looking for a solution in which I could break up the index based on any
    criteria and know which index to query for any input (and not query indexes
    that would return zero results).

    --
    Anshum

  • Eric Bowman at Jun 20, 2008 at 12:27 pm

    It is perhaps heresy on this mailing list, but GridGain makes this kind
    of thing really easy. Obviously you could roll your own with Hadoop as
    well.

    In this case you would simply have multiple indexes, each deployed to a
    different grid node, and a load balancing SPI that sent requests to the
    right grid node or nodes.
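
The routing step described above could look roughly like the sketch below. It is plain Java and deliberately uses no GridGain or Hadoop API; the term-to-node map, the node names, and the `TermRouter` class are all hypothetical stand-ins for whatever the load-balancing layer would consult.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: picks the nodes that can possibly answer a query. */
final class TermRouter {

    private final Map<String, String> nodeByTerm;

    TermRouter(Map<String, String> nodeByTerm) {
        this.nodeByTerm = nodeByTerm;
    }

    /** Returns the nodes owning at least one query term; terms nobody owns are skipped. */
    Set<String> nodesFor(String query) {
        Set<String> nodes = new TreeSet<>();
        for (String term : query.toLowerCase().trim().split("\\s+")) {
            String node = nodeByTerm.get(term);
            if (node != null) {
                nodes.add(node);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Hypothetical term-to-node assignment.
        TermRouter router = new TermRouter(Map.of(
                "lucene", "node-1",
                "index", "node-1",
                "solr", "node-2"));
        System.out.println(router.nodesFor("lucene index"));
        System.out.println(router.nodesFor("Lucene Solr"));
        System.out.println(router.nodesFor("hadoop"));
    }
}
```

A query whose terms all live on one node touches only that node, and a query with no known terms touches none, which is the saving the original question is after.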

    cheers,
    Eric

    --
    Eric Bowman
    Boboco Ltd
    ebowman@boboco.ie
    http://www.boboco.ie/ebowman/pubkey.pgp
    +35318394189/+353872801532


  • Andrzej Bialecki at Jun 20, 2008 at 6:30 pm

    * Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis
    Plachouras, Fabrizio Silvestri, 2007: Challenges on Distributed Web
    Retrieval: "The disadvantage of term partitioning is having to build
    initially the entire global index. This does not scale well, and it is
    not useful in actual large scale Web search engines. There are, however,
    some advantages of this approach in the query processing phase. Webber
    et al. show that term partitioning results in lower utilization of
    resources [49]. More specifically, it significantly reduces the number of
    disk accesses and the volume of data exchanged. Document partitioning
    however is still better in terms of throughput, because of an uneven
    distribution of work load in term partitioning."

    * Claudine Badue, Ricardo Baeza-Yates, 2001: Distributed Query
    Processing Using Partitioned Inverted Files (note that their conclusion
    that global partitioning is more efficient than local partitioning is
    based on a crucial assumption of being able to distribute the load
    efficiently. Other papers indicate that this is a very complex issue).

    * Claudine Badue, Ramurti Barbosa, Paulo Golgher: Distributed Processing
    of Conjunctive Queries. This paper evaluates the bottlenecks in an
    engine with local index partitioning.

    * Justin Zobel, Alistair Moffat, 2006: Inverted Files for Text Search
    Engines

    * Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio
    Silvestri, 2006: Mining Query Logs to Optimize Index Partitioning in
    Parallel Web Search Engines

    * Ronny Lempel, Shlomo Moran, 2002: Optimizing Result Prefetching in Web
    Search Engines with Segmented Indices

    ... and quite a few other papers that I don't remember now ... please do
    a search for "distributed IR" on ACM or Citeseer.

  • Anshum at Jun 20, 2008 at 7:08 am
    Hey Otis,

    Could you suggest a few good open-source distributed (Lucene) search
    solutions?
    Yes, I do want to split by terms, as the math tells a story. :)
    TF-IDF would be handled separately. I'd just use a different cluster of
    machines to store the index instead of having the search run on the same
    machine that stores the index.

    --
    Anshum
  • j.L at Jun 20, 2008 at 8:23 am
    I think you can use Solr to solve this.

    You just merge the search results from two Solr instances (two indexes).

    It is very simple, and you can distribute it.
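
As a rough illustration of the merge step, here is a plain-Java sketch that combines two ranked hit lists into one top-k list. It does not call any Solr API; the `Hit` type and the scores are hypothetical, and the sketch assumes the scores from both indexes are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: merges ranked hits from two indexes into one top-k list. */
final class ResultMerger {

    /** A hit as a (document id, score) pair; a stand-in for whatever the search layer returns. */
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return docId + "=" + score;
        }
    }

    /** Concatenates both hit lists, sorts by descending score, and keeps the best k. */
    static List<Hit> mergeTopK(List<Hit> first, List<Hit> second, int k) {
        List<Hit> all = new ArrayList<>(first);
        all.addAll(second);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> fromIndexA = List.of(new Hit("a1", 0.9), new Hit("a2", 0.4));
        List<Hit> fromIndexB = List.of(new Hit("b1", 0.7), new Hit("b2", 0.1));
        System.out.println(mergeTopK(fromIndexA, fromIndexB, 3));
    }
}
```

The comparability assumption matters: if each index computes scores from its own term statistics, the merged order can differ from what a single combined index would produce.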


    --
    regards
    j.L

Discussion Overview
group: java-user
categories: lucene
posted: Jun 18, '08 at 1:13p
active: Jun 20, '08 at 6:30p
posts: 10
users: 5
website: lucene.apache.org
