FAQ
Hi,



I am a newbie to Lucene. I have a question for making a query that associate
2 index files:



- One index has the content index for a list of documents and a key to the
document. That means the Lucene document of this index contains 2 fields:

the 'content' and the 'key'.

- another index has the some data indexed associated with the 'key' in the
previous index. The Lucene document of this index contains several fields:

the 'who' that contains some data and the 'key' that _points_ to the
document in the first index.



Sample data:

Index_1: [key] [content]

Abc "blah blah 123 ..."

Xyz "123 321 a nice day ..."



Index_2: [who] [accessed] [key]

David 1/1/2007 Abc

Someone 1/2/2005 Abc

Guess 12/1/2000 Xyz

Harry 1/1/2008 Abc

Sandra 1/1/2003 Xyz



As shown, the [key] field in Index_2 has repeated value that _points_ to the
[key] values in Index_1. How do I make a query for the following:



Find out all documents in Index_2:

- [who] is in range of 'David' to 'Guess' and

- [accessed] in range '1/1/1900' to '1/1/2010' and

- [key] associated [content] in Index_1 that contains the term 'blah'



I know this is more SQL like query. Is Lucene capable of doing this type of
query that needs associations among index files?



Thanks in advance.



- m

Search Discussions

  • Erick Erickson at May 6, 2008 at 4:36 pm
    You don't. You really have to roll your own solution here, there's
    no "inter-index" awareness that I know of in Lucene.

    Typically, people either do a half-half solution (that is, put the
    text search in Lucene and leave the DB parts in the DB) or
    de-normalize the data in a Lucene index so you don't have
    to even try to do things cross-index.

    And then there's Marcelo Ochoa (I may have the spelling wrong) who's
    put together some way to embed Lucene in an Oracle database, but
    that's magic to me.

    Why do you want to arrange your index this way in the first place?
    Perhaps there's a more specific answer waiting out there...

    Best
    Erick
    On Tue, May 6, 2008 at 12:14 PM, Michael Siu wrote:

    Hi,



    I am a newbie to Lucene. I have a question for making a query that
    associate
    2 index files:



    - One index has the content index for a list of documents and a key to the
    document. That means the Lucene document of this index contains 2 fields:

    the 'content' and the 'key'.

    - another index has the some data indexed associated with the 'key' in the
    previous index. The Lucene document of this index contains several fields:

    the 'who' that contains some data and the 'key' that _points_ to the
    document in the first index.



    Sample data:

    Index_1: [key] [content]

    Abc "blah blah 123 ..."

    Xyz "123 321 a nice day ..."



    Index_2: [who] [accessed] [key]

    David 1/1/2007 Abc

    Someone 1/2/2005 Abc

    Guess 12/1/2000 Xyz

    Harry 1/1/2008 Abc

    Sandra 1/1/2003 Xyz



    As shown, the [key] field in Index_2 has repeated value that _points_ to
    the
    [key] values in Index_1. How do I make a query for the following:



    Find out all documents in Index_2:

    - [who] is in range of 'David' to 'Guess' and

    - [accessed] in range '1/1/1900' to '1/1/2010' and

    - [key] associated [content] in Index_1 that contains the term 'blah'



    I know this is more SQL like query. Is Lucene capable of doing this type
    of
    query that needs associations among index files?



    Thanks in advance.



    - m






  • Michael Siu at May 6, 2008 at 7:03 pm
    My problem is: the [content] value can be huge. Duplicating it in more than
    one index document waste disk space (and search time?). In additions, when
    new documents are added to the second index, it will be faster to just index
    the linked [content] once (in first index file) and any subsequent reference
    to the same [content] will not need to re-index.

    In fact, I do not really need to physically separate the indices into 2
    files because Lucene supports Heterogeneous documents in an index file. I
    have no idea of how that work in a search. Does anyone know?

    Thanks in advance.



    -----Original Message-----
    From: Erick Erickson
    Sent: Tuesday, May 06, 2008 9:36 AM
    To: java-user@lucene.apache.org
    Subject: Re: How to make a query that associates 2 index files

    You don't. You really have to roll your own solution here, there's
    no "inter-index" awareness that I know of in Lucene.

    Typically, people either do a half-half solution (that is, put the
    text search in Lucene and leave the DB parts in the DB) or
    de-normalize the data in a Lucene index so you don't have
    to even try to do things cross-index.

    And then there's Marcelo Ochoa (I may have the spelling wrong) who's
    put together some way to embed Lucene in an Oracle database, but
    that's magic to me.

    Why do you want to arrange your index this way in the first place?
    Perhaps there's a more specific answer waiting out there...

    Best
    Erick
    On Tue, May 6, 2008 at 12:14 PM, Michael Siu wrote:

    Hi,



    I am a newbie to Lucene. I have a question for making a query that
    associate
    2 index files:



    - One index has the content index for a list of documents and a key to the
    document. That means the Lucene document of this index contains 2 fields:

    the 'content' and the 'key'.

    - another index has the some data indexed associated with the 'key' in the
    previous index. The Lucene document of this index contains several fields:

    the 'who' that contains some data and the 'key' that _points_ to the
    document in the first index.



    Sample data:

    Index_1: [key] [content]

    Abc "blah blah 123 ..."

    Xyz "123 321 a nice day ..."



    Index_2: [who] [accessed] [key]

    David 1/1/2007 Abc

    Someone 1/2/2005 Abc

    Guess 12/1/2000 Xyz

    Harry 1/1/2008 Abc

    Sandra 1/1/2003 Xyz



    As shown, the [key] field in Index_2 has repeated value that _points_ to
    the
    [key] values in Index_1. How do I make a query for the following:



    Find out all documents in Index_2:

    - [who] is in range of 'David' to 'Guess' and

    - [accessed] in range '1/1/1900' to '1/1/2010' and

    - [key] associated [content] in Index_1 that contains the term 'blah'



    I know this is more SQL like query. Is Lucene capable of doing this type
    of
    query that needs associations among index files?



    Thanks in advance.



    - m







    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at May 6, 2008 at 7:47 pm
    Sure, just include different fields in different docs in your index.
    Then, when you search since each term is on a field, docs without
    that field are excluded from the search.

    But this is really not very different in terms of a solution than
    your earlier one. You still have the issue of searching the index
    once to get the keys, then using those keys as part of another
    search.

    Are you saying that there is a many-to-one mapping to your content
    index? And how much data are we talking about here? 10M, 100G?
    this makes an enormous difference in your options.

    I hope you're aware that Lucene doc IDs can change as the index is
    modified, so I assume your "keys" are something you created and
    NOT the lucene doc IDs.

    Best
    Erick
    On Tue, May 6, 2008 at 3:03 PM, Michael Siu wrote:

    My problem is: the [content] value can be huge. Duplicating it in more
    than
    one index document waste disk space (and search time?). In additions, when
    new documents are added to the second index, it will be faster to just
    index
    the linked [content] once (in first index file) and any subsequent
    reference
    to the same [content] will not need to re-index.

    In fact, I do not really need to physically separate the indices into 2
    files because Lucene supports Heterogeneous documents in an index file. I
    have no idea of how that work in a search. Does anyone know?

    Thanks in advance.



    -----Original Message-----
    From: Erick Erickson
    Sent: Tuesday, May 06, 2008 9:36 AM
    To: java-user@lucene.apache.org
    Subject: Re: How to make a query that associates 2 index files

    You don't. You really have to roll your own solution here, there's
    no "inter-index" awareness that I know of in Lucene.

    Typically, people either do a half-half solution (that is, put the
    text search in Lucene and leave the DB parts in the DB) or
    de-normalize the data in a Lucene index so you don't have
    to even try to do things cross-index.

    And then there's Marcelo Ochoa (I may have the spelling wrong) who's
    put together some way to embed Lucene in an Oracle database, but
    that's magic to me.

    Why do you want to arrange your index this way in the first place?
    Perhaps there's a more specific answer waiting out there...

    Best
    Erick
    On Tue, May 6, 2008 at 12:14 PM, Michael Siu wrote:

    Hi,



    I am a newbie to Lucene. I have a question for making a query that
    associate
    2 index files:



    - One index has the content index for a list of documents and a key to the
    document. That means the Lucene document of this index contains 2 fields:
    the 'content' and the 'key'.

    - another index has the some data indexed associated with the 'key' in the
    previous index. The Lucene document of this index contains several fields:
    the 'who' that contains some data and the 'key' that _points_ to the
    document in the first index.



    Sample data:

    Index_1: [key] [content]

    Abc "blah blah 123 ..."

    Xyz "123 321 a nice day ..."



    Index_2: [who] [accessed] [key]

    David 1/1/2007 Abc

    Someone 1/2/2005 Abc

    Guess 12/1/2000 Xyz

    Harry 1/1/2008 Abc

    Sandra 1/1/2003 Xyz



    As shown, the [key] field in Index_2 has repeated value that _points_ to
    the
    [key] values in Index_1. How do I make a query for the following:



    Find out all documents in Index_2:

    - [who] is in range of 'David' to 'Guess' and

    - [accessed] in range '1/1/1900' to '1/1/2010' and

    - [key] associated [content] in Index_1 that contains the term 'blah'



    I know this is more SQL like query. Is Lucene capable of doing this type
    of
    query that needs associations among index files?



    Thanks in advance.



    - m







    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Michael Siu at May 7, 2008 at 12:02 am
    Yes, there is many-to-one mapping to the content index. And the size of
    content data is varying say from 1K to multiple Gs. That why it is not wise
    to repeat the same content in a index document.

    Thanks for telling that the doc IDs are not constant. Yes, the keys to
    content are generated on the fly and be unique per different content.

    Thanks.
    -m

    -----Original Message-----
    From: Erick Erickson
    Sent: Tuesday, May 06, 2008 12:46 PM
    To: java-user@lucene.apache.org
    Subject: Re: How to make a query that associates 2 index files

    Sure, just include different fields in different docs in your index.
    Then, when you search since each term is on a field, docs without
    that field are excluded from the search.

    But this is really not very different in terms of a solution than
    your earlier one. You still have the issue of searching the index
    once to get the keys, then using those keys as part of another
    search.

    Are you saying that there is a many-to-one mapping to your content
    index? And how much data are we talking about here? 10M, 100G?
    this makes an enormous difference in your options.

    I hope you're aware that Lucene doc IDs can change as the index is
    modified, so I assume your "keys" are something you created and
    NOT the lucene doc IDs.

    Best
    Erick
    On Tue, May 6, 2008 at 3:03 PM, Michael Siu wrote:

    My problem is: the [content] value can be huge. Duplicating it in more
    than
    one index document waste disk space (and search time?). In additions, when
    new documents are added to the second index, it will be faster to just
    index
    the linked [content] once (in first index file) and any subsequent
    reference
    to the same [content] will not need to re-index.

    In fact, I do not really need to physically separate the indices into 2
    files because Lucene supports Heterogeneous documents in an index file. I
    have no idea of how that work in a search. Does anyone know?

    Thanks in advance.



    -----Original Message-----
    From: Erick Erickson
    Sent: Tuesday, May 06, 2008 9:36 AM
    To: java-user@lucene.apache.org
    Subject: Re: How to make a query that associates 2 index files

    You don't. You really have to roll your own solution here, there's
    no "inter-index" awareness that I know of in Lucene.

    Typically, people either do a half-half solution (that is, put the
    text search in Lucene and leave the DB parts in the DB) or
    de-normalize the data in a Lucene index so you don't have
    to even try to do things cross-index.

    And then there's Marcelo Ochoa (I may have the spelling wrong) who's
    put together some way to embed Lucene in an Oracle database, but
    that's magic to me.

    Why do you want to arrange your index this way in the first place?
    Perhaps there's a more specific answer waiting out there...

    Best
    Erick
    On Tue, May 6, 2008 at 12:14 PM, Michael Siu wrote:

    Hi,



    I am a newbie to Lucene. I have a question for making a query that
    associate
    2 index files:



    - One index has the content index for a list of documents and a key to the
    document. That means the Lucene document of this index contains 2 fields:
    the 'content' and the 'key'.

    - another index has the some data indexed associated with the 'key' in the
    previous index. The Lucene document of this index contains several fields:
    the 'who' that contains some data and the 'key' that _points_ to the
    document in the first index.



    Sample data:

    Index_1: [key] [content]

    Abc "blah blah 123 ..."

    Xyz "123 321 a nice day ..."



    Index_2: [who] [accessed] [key]

    David 1/1/2007 Abc

    Someone 1/2/2005 Abc

    Guess 12/1/2000 Xyz

    Harry 1/1/2008 Abc

    Sandra 1/1/2003 Xyz



    As shown, the [key] field in Index_2 has repeated value that _points_ to
    the
    [key] values in Index_1. How do I make a query for the following:



    Find out all documents in Index_2:

    - [who] is in range of 'David' to 'Guess' and

    - [accessed] in range '1/1/1900' to '1/1/2010' and

    - [key] associated [content] in Index_1 that contains the term 'blah'



    I know this is more SQL like query. Is Lucene capable of doing this type
    of
    query that needs associations among index files?



    Thanks in advance.



    - m







    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Chris Lu at May 6, 2008 at 4:39 pm
    No easy way unless you merge your 2 indexes into:

    Index: [who] [accessed] [key] [content]


    David 1/1/2007 Abc "blah blah 123 ..."

    Someone 1/2/2005 Abc "blah blah 123 ..."

    Guess 12/1/2000 Xyz "123 321 a nice day ..."

    Harry 1/1/2008 Abc "blah blah 123 ..."

    Sandra 1/1/2003 Xyz "123 321 a nice day ..."


    Anyway, Lucene index is more like Database Index. It's only efficient for
    particular query execution paths.
    If you have special requirements, you will have to re-structure your index
    for performance.

    --
    Chris Lu
    -------------------------
    Instant Scalable Full-Text Search On Any Database/Application
    site: http://www.dbsight.net
    demo: http://search.dbsight.com
    Lucene Database Search in 3 minutes:
    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
    DBSight customer, a shopping comparison site, (anonymous per request) got
    2.6 Million Euro funding!
    On Tue, May 6, 2008 at 9:14 AM, Michael Siu wrote:

    Hi,



    I am a newbie to Lucene. I have a question for making a query that
    associate
    2 index files:



    - One index has the content index for a list of documents and a key to the
    document. That means the Lucene document of this index contains 2 fields:

    the 'content' and the 'key'.

    - another index has the some data indexed associated with the 'key' in the
    previous index. The Lucene document of this index contains several fields:

    the 'who' that contains some data and the 'key' that _points_ to the
    document in the first index.



    Sample data:

    Index_1: [key] [content]

    Abc "blah blah 123 ..."

    Xyz "123 321 a nice day ..."



    Index_2: [who] [accessed] [key]

    David 1/1/2007 Abc

    Someone 1/2/2005 Abc

    Guess 12/1/2000 Xyz

    Harry 1/1/2008 Abc

    Sandra 1/1/2003 Xyz



    As shown, the [key] field in Index_2 has repeated value that _points_ to
    the
    [key] values in Index_1. How do I make a query for the following:



    Find out all documents in Index_2:

    - [who] is in range of 'David' to 'Guess' and

    - [accessed] in range '1/1/1900' to '1/1/2010' and

    - [key] associated [content] in Index_1 that contains the term 'blah'



    I know this is more SQL like query. Is Lucene capable of doing this type
    of
    query that needs associations among index files?



    Thanks in advance.



    - m






Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMay 6, '08 at 4:14p
activeMay 7, '08 at 12:02a
posts6
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase