I developed a NoSQL search engine, HSearch
(http://bizosyshsearch.sourceforge.net/), which uses HBase for data storage.
We indexed roughly 10 times the content of Wikipedia, distributed across 10
Amazon EC2 machines, and searched it with 100 concurrent users. It works wonderfully.
I presented my findings at the India Hadoop forum:
http://www.slideshare.net/ydn/ahis2011-application-searching-information-ins
From: Jason Rutherglen
Sent: Thursday, June 02, 2011 10:02 PM
Subject: Re: Lucene's FST for the block index
The FST is more compact [than] keeping every Nth row id in RAM.
It would be nice to support pluggable block index implementations.
Maybe we should try to support this before HFile v2, which instead
uses a tree structure to lay out the blocks? With that layout, a
pluggable block index becomes more difficult. I think HFile v2 lists
the memory usage of the bloom filter and the block index as primary
motivations for its creation. There has also been work to try to turn
the FST into a bloom-filter-like data structure.
Perhaps we do this in the scope of HFile "v2"?
https://issues.apache.org/jira/browse/HBASE-3857
I'm not sure.
The other possible use of the FST is to simply store all row ids
(compactly) in it and lay it out on disk, so that a separate block
index would not be required. We could test and benchmark these
scenarios with a pluggable HFile system.
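
To give a rough feel for why storing every row id in a shared structure can stay compact, here is a minimal sketch using a plain prefix trie. This is not Lucene's FST and not HFile code: the class name PrefixTrie and the row id format are made up for illustration, and a real FST would also share suffixes and use a packed byte layout, so the savings shown here are only indicative.

```python
class PrefixTrie:
    """Hypothetical stand-in for a compact sorted-key structure (not Lucene's FST)."""

    def __init__(self):
        self.root = {}
        self.node_count = 0  # nodes created, excluding the root

    def add(self, key):
        node = self.root
        for ch in key:
            if ch not in node:
                node[ch] = {}
                self.node_count += 1
            node = node[ch]
        # End-of-key marker; the empty string never collides with a single character.
        node[""] = True

    def __contains__(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return False
        return "" in node


# Assumed row id format, chosen only so that keys share long prefixes.
rows = ["row%04d" % i for i in range(1000)]
trie = PrefixTrie()
for r in rows:
    trie.add(r)

total_chars = sum(len(r) for r in rows)  # characters if every row id is stored whole
```

With these 1000 row ids the trie needs far fewer nodes than the total number of characters, because the common prefix is stored once. Whether that advantage survives real row id distributions is exactly what a benchmark would have to show.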
On Thu, Jun 2, 2011 at 9:09 AM, Jason Rutherglen
Lucene has a compact FST (Finite State Transducer) that's used for the
sorted terms index. I think this is the same type of functionality as
the HBase block index, i.e., a sorted index of row ids? The FST is more
compact than keeping every Nth row id in RAM. Does the HFile format allow
pluggable block index implementations?
I posted this to Jira; however, that's probably not the best place.
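
For reference, the "every Nth row id in RAM" style of block index discussed in this thread can be sketched roughly as follows. This is only an illustration under assumptions: SparseBlockIndex, its methods, and the in-memory lists standing in for on-disk blocks are hypothetical and do not reflect the actual HFile implementation.

```python
from bisect import bisect_right


class SparseBlockIndex:
    """Hypothetical sketch: keeps only the first row id of each block in memory."""

    def __init__(self, sorted_row_ids, block_size):
        # Split the sorted row ids into fixed-size blocks (stand-ins for on-disk blocks).
        self.blocks = [sorted_row_ids[i:i + block_size]
                       for i in range(0, len(sorted_row_ids), block_size)]
        # The in-memory index: one key per block, i.e. every Nth row id.
        self.first_keys = [block[0] for block in self.blocks]

    def find_block(self, row_id):
        # Rightmost block whose first key is <= row_id; -1 if row_id sorts before all blocks.
        return bisect_right(self.first_keys, row_id) - 1

    def contains(self, row_id):
        idx = self.find_block(row_id)
        if idx < 0:
            return False
        # In a real store this step would read the block from disk and scan it.
        return row_id in self.blocks[idx]


rows = ["row%04d" % i for i in range(1000)]
index = SparseBlockIndex(rows, block_size=100)
```

A pluggable design would presumably hide find_block behind an interface, so that an FST-backed lookup could replace the sorted list of first keys without changing the callers.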