Then, when you search since each term is on a field, docs without
that field are excluded from the search.
But this is really not very different in terms of a solution than
your earlier one. You still have the issue of searching the index
once to get the keys, then using those keys as part of another
search.
Are you saying that there is a many-to-one mapping to your content
index? And how much data are we talking about here? 10M, 100G?
this makes an enormous difference in your options.
I hope you're aware that Lucene doc IDs can change as the index is
modified, so I assume your "keys" are something you created and
NOT the lucene doc IDs.
Best
Erick
On Tue, May 6, 2008 at 3:03 PM, Michael Siu wrote:
My problem is: the [content] value can be huge. Duplicating it in more
than
one index document waste disk space (and search time?). In additions, when
new documents are added to the second index, it will be faster to just
index
the linked [content] once (in first index file) and any subsequent
reference
to the same [content] will not need to re-index.
In fact, I do not really need to physically separate the indices into 2
files because Lucene supports Heterogeneous documents in an index file. I
have no idea of how that work in a search. Does anyone know?
Thanks in advance.
-----Original Message-----
From: Erick Erickson
Sent: Tuesday, May 06, 2008 9:36 AM
To: [email protected]
Subject: Re: How to make a query that associates 2 index files
You don't. You really have to roll your own solution here, there's
no "inter-index" awareness that I know of in Lucene.
Typically, people either do a half-half solution (that is, put the
text search in Lucene and leave the DB parts in the DB) or
de-normalize the data in a Lucene index so you don't have
to even try to do things cross-index.
And then there's Marcelo Ochoa (I may have the spelling wrong) who's
put together some way to embed Lucene in an Oracle database, but
that's magic to me.
Why do you want to arrange your index this way in the first place?
Perhaps there's a more specific answer waiting out there...
Best
Erick
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
My problem is: the [content] value can be huge. Duplicating it in more
than
one index document waste disk space (and search time?). In additions, when
new documents are added to the second index, it will be faster to just
index
the linked [content] once (in first index file) and any subsequent
reference
to the same [content] will not need to re-index.
In fact, I do not really need to physically separate the indices into 2
files because Lucene supports Heterogeneous documents in an index file. I
have no idea of how that work in a search. Does anyone know?
Thanks in advance.
-----Original Message-----
From: Erick Erickson
Sent: Tuesday, May 06, 2008 9:36 AM
To: [email protected]
Subject: Re: How to make a query that associates 2 index files
You don't. You really have to roll your own solution here, there's
no "inter-index" awareness that I know of in Lucene.
Typically, people either do a half-half solution (that is, put the
text search in Lucene and leave the DB parts in the DB) or
de-normalize the data in a Lucene index so you don't have
to even try to do things cross-index.
And then there's Marcelo Ochoa (I may have the spelling wrong) who's
put together some way to embed Lucene in an Oracle database, but
that's magic to me.
Why do you want to arrange your index this way in the first place?
Perhaps there's a more specific answer waiting out there...
Best
Erick
On Tue, May 6, 2008 at 12:14 PM, Michael Siu wrote:
Hi,
I am a newbie to Lucene. I have a question for making a query that
associate
2 index files:
- One index has the content index for a list of documents and a key to the
document. That means the Lucene document of this index contains 2 fields:
the 'content' and the 'key'.
- another index has the some data indexed associated with the 'key' in the
previous index. The Lucene document of this index contains several fields:
the 'who' that contains some data and the 'key' that _points_ to the
document in the first index.
Sample data:
Index_1: [key] [content]
Abc "blah blah 123 ..."
Xyz "123 321 a nice day ..."
Index_2: [who] [accessed] [key]
David 1/1/2007 Abc
Someone 1/2/2005 Abc
Guess 12/1/2000 Xyz
Harry 1/1/2008 Abc
Sandra 1/1/2003 Xyz
As shown, the [key] field in Index_2 has repeated value that _points_ to
the
[key] values in Index_1. How do I make a query for the following:
Find out all documents in Index_2:
- [who] is in range of 'David' to 'Guess' and
- [accessed] in range '1/1/1900' to '1/1/2010' and
- [key] associated [content] in Index_1 that contains the term 'blah'
I know this is more SQL like query. Is Lucene capable of doing this type
of
query that needs associations among index files?
Thanks in advance.
- m
Hi,
I am a newbie to Lucene. I have a question for making a query that
associate
2 index files:
- One index has the content index for a list of documents and a key to the
document. That means the Lucene document of this index contains 2 fields:
the 'content' and the 'key'.
- another index has the some data indexed associated with the 'key' in the
previous index. The Lucene document of this index contains several fields:
the 'who' that contains some data and the 'key' that _points_ to the
document in the first index.
Sample data:
Index_1: [key] [content]
Abc "blah blah 123 ..."
Xyz "123 321 a nice day ..."
Index_2: [who] [accessed] [key]
David 1/1/2007 Abc
Someone 1/2/2005 Abc
Guess 12/1/2000 Xyz
Harry 1/1/2008 Abc
Sandra 1/1/2003 Xyz
As shown, the [key] field in Index_2 has repeated value that _points_ to
the
[key] values in Index_1. How do I make a query for the following:
Find out all documents in Index_2:
- [who] is in range of 'David' to 'Guess' and
- [accessed] in range '1/1/1900' to '1/1/2010' and
- [key] associated [content] in Index_1 that contains the term 'blah'
I know this is more SQL like query. Is Lucene capable of doing this type
of
query that needs associations among index files?
Thanks in advance.
- m
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]