Grokbase Groups HBase dev June 2011
Lucene's FST for the block index
Lucene has a compact FST (Finite State Transducer) that's used for the
sorted terms index. I think this is the same type of functionality as
the HBase block index, i.e., a sorted index of row ids? The FST is more
compact than keeping every Nth row id in RAM. Does the HFile format allow
pluggable block index implementations?
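
For readers less familiar with the block index, the "every Nth row id in RAM" scheme can be sketched as a sorted array holding the first row id of each block, searched with a binary search. This is only an illustrative sketch, not HBase's BlockIndex code: the class `SparseBlockIndex`, its method names, and the sample keys and offsets are all hypothetical. It assumes Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sparse block index: one in-memory entry per data block. */
final class SparseBlockIndex {
    private final byte[][] firstKeys; // first row id of each block, sorted ascending
    private final long[] offsets;     // file offset of the block with the same position

    SparseBlockIndex(byte[][] firstKeys, long[] offsets) {
        if (firstKeys.length != offsets.length) {
            throw new IllegalArgumentException("keys and offsets must have equal length");
        }
        this.firstKeys = firstKeys;
        this.offsets = offsets;
    }

    /**
     * Returns the offset of the last block whose first row id is <= rowId,
     * or -1 if rowId sorts before the first block.
     */
    long blockOffsetFor(byte[] rowId) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // Row ids are compared as unsigned bytes, in lexicographic order.
            if (Arrays.compareUnsigned(firstKeys[mid], rowId) <= 0) {
                found = mid;   // candidate block; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1L : offsets[found];
    }

    private static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SparseBlockIndex index = new SparseBlockIndex(
            new byte[][] { b("apple"), b("mango"), b("pear") },
            new long[] { 0L, 65536L, 131072L });
        System.out.println(index.blockOffsetFor(b("banana"))); // prints 0
        System.out.println(index.blockOffsetFor(b("zebra")));  // prints 131072
    }
}
```

Every key is stored in full here. An FST answering the same lookup would instead share common prefixes and suffixes among the keys, which is where the space saving would come from.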

I posted this to the Jira issues; however, that's probably not the best place.


  • Ted Yu at Jun 2, 2011 at 4:21 pm
    Currently BlockIndex is an inner class of HFile.
    It would be nice to support pluggable block index implementations.

    FYI
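
One way to picture a pluggable block index is an interface that the file reader depends on, plus a registry that picks an implementation by name. The sketch below is purely hypothetical: `PluggableBlockIndex`, `TreeMapBlockIndex`, and `BlockIndexRegistry` are not HBase classes, and the idea that the implementation name would be recorded in the file's metadata is an assumption. String keys are used for brevity, whereas HBase row ids are byte arrays.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical plug-in point: what a reader would need from any block index. */
interface PluggableBlockIndex {
    void add(String firstRowId, long blockOffset);

    /** Offset of the block that may hold rowId, or -1 if it precedes all blocks. */
    long blockOffsetFor(String rowId);
}

/** Baseline implementation backed by a sorted map (String ordering, for illustration). */
final class TreeMapBlockIndex implements PluggableBlockIndex {
    private final TreeMap<String, Long> blocks = new TreeMap<>();

    @Override
    public void add(String firstRowId, long blockOffset) {
        blocks.put(firstRowId, blockOffset);
    }

    @Override
    public long blockOffsetFor(String rowId) {
        Map.Entry<String, Long> entry = blocks.floorEntry(rowId);
        return entry == null ? -1L : entry.getValue();
    }
}

/** Hypothetical registry: maps an implementation name to a factory. */
final class BlockIndexRegistry {
    private static final Map<String, Supplier<PluggableBlockIndex>> IMPLS = new HashMap<>();

    static void register(String name, Supplier<PluggableBlockIndex> factory) {
        IMPLS.put(name, factory);
    }

    static PluggableBlockIndex create(String name) {
        Supplier<PluggableBlockIndex> factory = IMPLS.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown block index: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        register("treemap", TreeMapBlockIndex::new);
        PluggableBlockIndex index = create("treemap");
        index.add("apple", 0L);
        index.add("mango", 65536L);
        System.out.println(index.blockOffsetFor("banana")); // prints 0
    }
}
```

With a seam like this, an FST-backed index could be registered under another name and benchmarked against the baseline without touching the reader.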
  • Andrew Purtell at Jun 2, 2011 at 4:24 pm

    It would be nice to support pluggable block index
    implementations.
    +1

    Perhaps we do this in the scope of HFile "v2"? https://issues.apache.org/jira/browse/HBASE-3857

    - Andy


  • Jason Rutherglen at Jun 2, 2011 at 4:32 pm

    The FST is more compact [than] keeping every Nth row id in RAM.

    It would be nice to support pluggable block index implementations

    Maybe we should try to support this prior to HFile v2, which instead
    uses a tree structure to lay out the blocks? Once that is in place, a
    pluggable block index becomes more difficult. I think HFile v2 lists
    the memory usage of the Bloom filter and the block index as its primary
    motivations. There has also been work to try to turn the FST into a
    Bloom-filter-like data structure.

    Perhaps we do this in the scope of HFile "v2"? https://issues.apache.org/jira/browse/HBASE-3857

    I'm not sure.

    The other possible usage of the FST is to simply store all row ids
    (compactly) in it and lay it out on disk, so that a separate block
    index would not be required. We could test and benchmark these
    scenarios with a pluggable HFile system.
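
To illustrate why storing every row id can still be compact, the sketch below uses a plain trie as a stand-in for the FST. A trie shares only common prefixes, while a minimal FST such as Lucene's also shares suffixes and uses a packed byte encoding, so this understates the saving. The class `RowIdTrie`, the sample row ids, and the use of block offsets as the stored values are all hypothetical.

```java
import java.util.TreeMap;

/** Hypothetical prefix-sharing map from row id to block offset (a trie, not an FST). */
final class RowIdTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        long blockOffset = -1L; // -1 means no row id ends at this node
    }

    private final Node root = new Node();
    private int nodeCount = 1; // counts the root

    /** Stores rowId, creating nodes only for characters not already shared. */
    void put(String rowId, long blockOffset) {
        Node node = root;
        for (char c : rowId.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
                nodeCount++;
            }
            node = next;
        }
        node.blockOffset = blockOffset;
    }

    /** Returns the stored offset, or -1 if rowId was never added. */
    long get(String rowId) {
        Node node = root;
        for (char c : rowId.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return -1L;
            }
        }
        return node.blockOffset;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        RowIdTrie trie = new RowIdTrie();
        trie.put("user0001", 0L);
        trie.put("user0002", 4096L);
        trie.put("user0003", 8192L);
        // 24 characters of row ids are held in 11 nodes: the "user000" prefix is stored once.
        System.out.println(trie.nodeCount());      // prints 11
        System.out.println(trie.get("user0002"));  // prints 4096
    }
}
```

Row ids in a sorted file tend to share long prefixes, which is the case where this kind of structure helps most. Whether it beats a sparse index plus a block scan would need the benchmarking mentioned above.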

  • Abinash Karana (Bizosys) at Jun 3, 2011 at 1:52 am
    Hi,

    I developed HSearch (http://bizosyshsearch.sourceforge.net/), a NoSQL
    search engine that uses HBase for data storage. We indexed around 10
    times the volume of Wikipedia, distributed across 10 Amazon EC2
    machines, and searched it with 100 concurrent users. It works
    wonderfully.

    I presented my findings at the India Hadoop forum:
    http://www.slideshare.net/ydn/ahis2011-application-searching-information-inside-hadoop-platform

    Cheers!
    Abinash


Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jun 2, '11 at 4:10p
active: Jun 3, '11 at 1:52a
posts: 5
users: 4
website: hbase.apache.org
