Stephane,

The actual indexing is less glamorous than it sounds. When
you index 1TB across 10 machines you end up with 100GB on each machine. We
do not merge the indexes either, since we get better speed on indexing as
well as querying when we keep indexes smaller and distributed across
different machines. (That said, I think I'll sit down at some point, merge all
of them together and play with it when I get a chance ... 'cause it's cool :-)
I'll keep you posted when it happens.)
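
To make the arithmetic and the "one smaller index per machine" idea concrete, here is a minimal sketch in Python. It is not the setup described in this thread: the message does not say how documents are assigned to machines, so the stable-hash assignment and the names shard_for, partition and per_machine_gb are assumptions made purely for illustration.

```python
import hashlib


def per_machine_gb(total_gb: float, machines: int) -> float:
    """Even split of the data volume, e.g. 1000GB over 10 machines."""
    return total_gb / machines


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number using a stable hash.

    Assumption: hash partitioning. The thread does not say how the
    data is actually divided among the machines.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards: int):
    """Group document IDs into one list per shard (one index per machine)."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    print(per_machine_gb(1000, 10))  # 1TB taken as 1000GB
    shards = partition([f"doc-{i}" for i in range(1000)], 10)
    print([len(s) for s in shards])
```

Each machine would then build and query its own index over its list of documents, and no index-level merge is needed.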

The test set I am playing with is 40GB, and I just posted a
benchmark.

Best,
Jochen
-----Original Message-----
From: Stephane Vaucher
Sent: Thursday, December 18, 2003 9:01 AM
To: Lucene Users List; jochen_frey@yahoo.com
Subject: RE: Indexing Speed: Documents vs. Sentences

Jochen,

If you have a bit of time, could you post some metrics? (As an example,
you can look at http://jakarta.apache.org/lucene/docs/benchmarks.html.) I
haven't heard of anyone indexing 1TB yet. I'm sure everyone is interested
in the problems you could be facing, and we could probably give you some
ideas. I know (oddly enough) I sometimes wish I had a dataset larger than a
few M docs to experiment with.

cheers,
sv
On Thu, 18 Dec 2003, Jochen Frey wrote:

Hi,

Yes, this is correct, I am dealing with a few 100GB (close to 1TB).
I am, however, distributing the data across several machines and then merging
the results from all the machines together (until I find a better & faster
solution).
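
Merging the results from all the machines can be sketched as follows. This is only an illustration under stated assumptions: each machine is assumed to return a list of (score, doc_id) hits, and merge_top_k is a hypothetical helper, not a Lucene API and not necessarily what is done in this setup.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id), a hypothetical result shape


def merge_top_k(per_machine_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine the hit lists from every machine into the k best hits overall."""
    all_hits = (hit for hits in per_machine_hits for hit in hits)
    # nlargest returns the hits sorted from highest to lowest score.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    machine_a = [(0.9, "a1"), (0.4, "a2")]
    machine_b = [(0.7, "b1"), (0.2, "b2")]
    print(merge_top_k([machine_a, machine_b], 3))
```

One caveat with this kind of merge: scores computed against separate indexes are based on each index's own term statistics, so they may not be strictly comparable across machines.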

Cheers!
-----Original Message-----
From: Victor Hadianto
Sent: Wednesday, December 17, 2003 10:50 PM
To: Lucene Users List
Subject: Re: Indexing Speed: Documents vs. Sentences
Hi,

I am using Lucene to index a large number of web pages (a few 100GB) and the
indexing speed is great.
Jochen .. a few 100 GB? Is this correct?

/victor


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Discussion Overview
group: java-user @
categories: lucene
posted: Dec 19, '03 at 5:05p
active: Dec 19, '03 at 5:05p
posts: 1
users: 1
website: lucene.apache.org
