Searching billions of anything is likely to be challenging. Mark
Miller's document at
http://www.lucidimagination.com/content/scaling-lucene-and-solr looks
well worth a read.
- If I search last week's index plus the individual index (which would have
to be opened at search time?), will it be faster than using a single huge
index for all groups and all weeks?
Too many variables to say.
- Is IndexSearcher searcher = new
IndexSearcher(IndexReader.open(writer, false)); read only?
Surely searchers are read only, by definition. Note that the boolean in
IndexReader.open(writer, false) is applyAllDeletes, not a read-only flag.
- How can I give near-real-time access to an IndexWriter started in another
application?
Sounds impossible.
- How can I store all documents from the results? Something like AllDocs
(the equivalent of TopDocs) from an AllDocsCollector (like
TopDocsCollector).
Not clear what you are asking here, but you can pass whatever you like
as the max doc count to the assorted search methods, and do whatever
you want with the results. Storing all docs from search results on a
massive index doesn't sound a very clever idea.
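
If every hit really is needed, the usual route in Lucene 3.x is a custom
Collector that records document ids instead of ranking them. A minimal
sketch of that pattern follows. AllHitsSketch and its method names are
hypothetical, and the class only models the per-segment callbacks
(setNextReader supplying a docBase, collect receiving a segment-relative
id) rather than extending the real Collector, so it runs without Lucene
on the classpath.

```java
import java.util.BitSet;

// Hypothetical stand-alone model of the callbacks a Lucene 3.x Collector
// receives. It does not extend the real class; names are illustrative.
class AllHitsSketch {
    private final BitSet hits = new BitSet();
    private int docBase = 0;

    // Models Collector.setNextReader(reader, docBase): called once per segment.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // Models Collector.collect(doc): doc is relative to the current segment,
    // so the segment's docBase is added to get an index-wide id.
    void collect(int doc) {
        hits.set(docBase + doc);
    }

    int totalHits() {
        return hits.cardinality();
    }

    BitSet hits() {
        return hits;
    }

    public static void main(String[] args) {
        AllHitsSketch c = new AllHitsSketch();
        c.setNextSegment(0);
        c.collect(1);
        c.collect(3);
        c.setNextSegment(10);
        c.collect(0);
        System.out.println(c.totalHits() + " hits: " + c.hits());
    }
}
```

A bit set costs one bit per document in the index, so roughly 125 MB per
billion documents, which is far cheaper than loading the stored fields of
every hit.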
I understand that Twitter submitted their real-time search architecture
code to Lucene. Can I get my hands on that?
No idea.
--
Ian.
On Wed, Jul 13, 2011 at 10:09 AM, Mihai Caraman wrote:
Hello,
My name is Mihai and I'm trying to write a Java search (later I'll need to
port it to PyLucene) over billions of mentions, such as Twitter statuses.
Mentions are grouped by the keywords they contain.
I'm thinking of partitioning the index for faster results as follows:
- a common index for the past week
- for earlier data, a common index for small groups and individual indexes
for very large groups
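
The layout above could be sketched as a small routing step that decides
which indexes a query has to touch. This is only an illustration under
assumptions: IndexRouter, the index names and the set of "large" groups
are all made up, and nothing here opens a real index. In Lucene 3.x the
readers for the chosen indexes could then be combined with a MultiReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: class, method and index names are illustrative only.
class IndexRouter {
    static final String RECENT = "recent-week";            // common index, past week
    static final String ARCHIVE_COMMON = "archive-common"; // older data, small groups

    private final Set<String> largeGroups;

    IndexRouter(Set<String> largeGroups) {
        this.largeGroups = largeGroups;
    }

    // Returns the names of the indexes a query for this group must search.
    List<String> indexesFor(String group, boolean includeOlderThanWeek) {
        List<String> targets = new ArrayList<String>();
        targets.add(RECENT);
        if (includeOlderThanWeek) {
            // Very large groups get their own archive index; the rest share one.
            targets.add(largeGroups.contains(group)
                    ? "archive-" + group
                    : ARCHIVE_COMMON);
        }
        return targets;
    }

    public static void main(String[] args) {
        IndexRouter router =
                new IndexRouter(new HashSet<String>(Arrays.asList("worldcup")));
        System.out.println(router.indexesFor("worldcup", true));
        System.out.println(router.indexesFor("knitting", true));
    }
}
```

Whether two smaller indexes beat one huge index still has to be measured.
The sketch only shows that the routing decision itself is cheap.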
My questions are:
- If I search last week's index plus the individual index (which would have
to be opened at search time?), will it be faster than using a single huge
index for all groups and all weeks?
- Is IndexSearcher searcher = new
IndexSearcher(IndexReader.open(writer, false)); read only? If not, how can
I build multiple near-real-time readers on the same writer (index)?
- How can I give near-real-time access to an IndexWriter started in another
application?
- How can I store all documents from the results? Something like AllDocs
(the equivalent of TopDocs) from an AllDocsCollector (like
TopDocsCollector).
I understand that Twitter submitted their real-time search architecture
code to Lucene. Can I get my hands on that?
Thank you in advance,
Mihai