I am working on a distributed search system. At the moment I have only one
server. It has to crawl pages from the Web, generate indexes locally, and
respond to users' queries. I think this is too much work for one machine to
handle smoothly.
I plan to use at least two servers. One of them will crawl pages and generate
the indexes. After that, the newly available indexes should be transmitted to
the other one, which is responsible for responding to users' queries. From the
users' point of view, this system must be fast.
However, I don't know how to obtain just the newly added index data so that I
can transmit it. After transmission, how do I append it to the old indexes?
Does appending block searching?
I use Lucene to generate the indexes. However, I cannot see which updates are
new, so I cannot send them. I know Hadoop does the above internally. How can
it be integrated with Lucene?
Thanks so much for your help!