I am trying to build a distributed system to build and serve Lucene indexes.
I came across the Distributed Lucene project
and have a couple of questions. It would be really helpful if someone could
provide some insights.
1) Is this code production ready?
2) Does anyone have performance data for this project?
3) It allows searches and updates/deletes to be performed at the same time.
How well will the system perform if the indexes are updated frequently?
Will it handle the combined search and update load easily, or would it be
better to rebuild or update the indexes on different machines and then
deploy them back to the machines that are serving searches?
Basically, I am trying to choose between two approaches:
1) Use Hadoop to build and/or update Lucene indexes and then deploy them on a
separate cluster that will take care of load balancing, fault tolerance, etc.
There is a package in Hadoop contrib that does this, so I can use that code.
2) Use and/or modify the Distributed Lucene code.
I am expecting daily updates to our index, so I am not sure whether the
Distributed Lucene code (which allows searches and updates on the same
indexes) will be able to handle the search and update load efficiently.
Any suggestions?