FAQ
Can Katta be used on an EC2 cluster?

I ask because Katta appears to use ZooKeeper, which ideally needs a
dedicated drive. That may not be possible in a shared environment. Is this
a non-issue with respect to Katta?
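
The dedicated-drive advice for ZooKeeper is mainly about its write-ahead
transaction log, which is latency-sensitive. ZooKeeper's zoo.cfg lets you put
the snapshot directory and the transaction log on separate devices via dataDir
and dataLogDir, so on EC2 the log could at least get its own volume. A minimal
sketch (the mount paths here are hypothetical):

```properties
# zoo.cfg sketch: keep fuzzy snapshots and the transaction log on
# separate devices. Paths are hypothetical examples.
tickTime=2000
dataDir=/mnt/zookeeper/snapshots
dataLogDir=/mnt/zookeeper-log
clientPort=2181
```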

I would appreciate any input in this regard.

Dev

On Wed, Jun 3, 2009 at 2:03 AM, Ted Dunning wrote:

Just a quick plug for Katta. We use it extensively (and have been sending
back some patches).

See www.deepdyve.com for a test drive.

At my previous job, we had utter fits working with SOLR using sharded
retrieval. Katta is designed to address the sharding problem very well and
we have been very happy. Our extensions have been to adapt Katta so that it
is a general sharding and replication engine that supports general queries.
For some things we use a modified Lucene; for other things, we use our own
code. Katta handles that really well.
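
The sharded retrieval Ted mentions boils down to a scatter-gather step: each
shard answers the query with its local top-k hits, and a coordinator merges
them into a global top-k by score. A minimal, self-contained sketch of that
merge step (the class and field names are my own, not Katta's API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the coordinator-side merge in scatter-gather search:
// each shard returns its own top-k hits; the coordinator keeps only
// the k highest-scoring hits overall using a bounded min-heap.
public class ShardMerge {

    // Minimal stand-in for a Lucene ScoreDoc: document id plus score.
    public static final class Hit {
        public final String docId;
        public final float score;
        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Merge per-shard result lists into one global top-k, best score first.
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Min-heap on score: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : perShard) {
            for (Hit h : shard) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // evict the current lowest-scoring hit
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score)); // descending
        return out;
    }
}
```

In a real deployment the per-shard searches run in parallel over RPC; only the
small top-k lists cross the network, which is why sharded retrieval scales.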
On Tue, Jun 2, 2009 at 9:23 AM, Tarandeep Singh wrote:

Thanks all for your replies. I am checking out Katta...

-Tarandeep
On Tue, Jun 2, 2009 at 8:05 AM, Stefan Groschupf wrote:

Hi,
you might want to check out:
http://katta.sourceforge.net/

Stefan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hadoop training and consulting
http://www.scaleunlimited.com
http://www.101tec.com




On Jun 1, 2009, at 9:54 AM, Tarandeep Singh wrote:

Hi All,
I am trying to build a distributed system for building and serving Lucene
indexes.
I came across the Distributed Lucene project-
http://wiki.apache.org/hadoop/DistributedLucene
https://issues.apache.org/jira/browse/HADOOP-3394

and have a couple of questions. It would be really helpful if someone could
provide some insights.

1) Is this code production-ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same time.
How well will the system perform if there are frequent updates? Will it
handle the search and update load easily, or would it be better to rebuild
or update the indexes on different machines and then deploy the indexes back
to the machines that are serving them?

Basically I am trying to choose between two approaches:

1) Use Hadoop to build and/or update Lucene indexes and then deploy them on
a separate cluster that takes care of load balancing, fault tolerance, etc.
There is a package in Hadoop contrib that does this, so I can use that code.

2) Use and/or modify the Distributed Lucene code.

I am expecting daily updates to our index, so I am not sure if the
Distributed Lucene code (which allows searches and updates on the same
indexes) will be able to handle the search and update load efficiently.
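
For approach 1, the heart of the offline build is routing each document to a
stable shard, so a Hadoop job (for example, the contrib index package) can
build every shard's Lucene index independently and ship the finished indexes
to the serving cluster. A hedged sketch of that routing step (the names are
illustrative, not the contrib package's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of stable hash-partitioning of documents to index shards.
// The same doc id always maps to the same shard, which keeps daily
// rebuilds and incremental updates consistent across runs.
public class ShardRouter {

    // Stable shard assignment for one document id.
    public static int shardFor(String docId, int numShards) {
        // Math.floorMod guards against negative hashCode values.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Group a batch of doc ids into per-shard buckets, mirroring what a
    // partitioner does before each reducer builds its shard's index.
    public static List<List<String>> partition(List<String> docIds, int numShards) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : docIds) {
            buckets.get(shardFor(id, numShards)).add(id);
        }
        return buckets;
    }
}
```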

Any suggestions?

Thanks,
Tarandeep


--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)

Discussion Overview
group: common-user
categories: hadoop
posted: Jun 1, '09 at 4:55p
active: Jun 3, '09 at 4:00a
posts: 8
users: 6
website: hadoop.apache.org...
irc: #hadoop
