Grokbase Groups HBase dev July 2011
Hi,
http://s.apache.org/x4 has 38 JIRAs listed, excluding HFile v2 and
HBASE-4027.

I recommend one round of cleanup so that JIRAs lacking an owner can be
deferred, making a 0.92 branch feasible.

My two cents.


  • Stack at Jul 21, 2011 at 5:44 am
    Agreed. I'll do an edit this evening.
    St.Ack
    On Wed, Jul 20, 2011 at 4:14 PM, Ted Yu wrote:
  • Ted Yu at Jul 25, 2011 at 9:31 pm
    http://s.apache.org/x4 has grown to 40 issues.

    We should clean up the above list so that coprocessors can be used by more
    people.

    I suggest moving HBASE-4060 out of the 0.92 release.
    On Mon, Jul 25, 2011 at 2:26 PM, Gary Helmling wrote:

    Unfortunately there's no easy patch set to pull coprocessors into any 0.90
    HBase version (including CDH3 HBase). The changes are extensive and
    invasive and include RPC protocol changes. Internally at Trend Micro we
    run a heavily patched 0.90-based version of HBase that includes
    coprocessors and security, but that is only possible with a lot of effort
    to keep things up to date with HBase 0.90 development.

    At one point we had made a 0.90-coprocessor branch available, but it's
    simply too much work to keep it up to date. It's in everyone's best
    interests if we instead focus on getting out a 0.92 release that includes
    coprocessors.

    HBase trunk (and by extension 0.92) of course supports running on CDH3, so
    you should have no problem plugging in the new version once HBase 0.92 is
    out.

    --gh


    On Mon, Jul 25, 2011 at 1:23 PM, Paul Nickerson
    <paul.nickerson@escapemg.com> wrote:
    We currently run on the Cloudera stack. Would this be something that we
    can pull, compile, and plug right into that stack?

    ----- Original Message -----

    From: "Gary Helmling" <ghelmling@gmail.com>
    To: user@hbase.apache.org
    Sent: Monday, July 25, 2011 2:02:50 PM
    Subject: Re: Fanning out hbase queries in parallel

    Coprocessors are currently only in trunk. They will be in the 0.92 release
    once we get that out. There's no set date for that, but personally I'll be
    trying to help get it out sooner rather than later.


    On Mon, Jul 25, 2011 at 7:37 AM, Michel Segel
    <michael_segel@hotmail.com> wrote:
    Which release(s) have coprocessors enabled?

    Sent from a remote device. Please excuse any typos...

    Mike Segel
    On Jul 24, 2011, at 11:03 PM, Sonal Goyal wrote:

    Hi Paul,

    Have you taken a look at HBase coprocessors? I think you will find them
    useful.

    Best Regards,
    Sonal
    Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
    Nube Technologies <http://www.nubetech.co>
    <http://in.linkedin.com/in/sonalgoyal>

    On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson
    <paul.nickerson@escapemg.com> wrote:
    I would like to implement a multidimensional query system that aggregates
    large amounts of data on the fly by fanning out queries in parallel. It
    should be fast enough for interactive exploration of the data and
    extensible enough to take sets of hundreds or thousands of dimensions
    with high cardinality, and aggregate them from high granularity to low
    granularity.

    Dimensions and their values are stored in the row key. For instance, row
    keys look like this:

    Foo=bar,blah=123

    and each row contains numerical values within its column families, such
    as plays=100, versioned by the date of calculation.
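A row-key scheme like this can be handled with a small codec on the client side. A minimal sketch in plain Java (class and method names are mine, not from the thread):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowKeyCodec {
    // Parse a row key like "Foo=bar,blah=123" into an ordered dimension map.
    static Map<String, String> parse(String rowKey) {
        Map<String, String> dims = new LinkedHashMap<>();
        for (String part : rowKey.split(",")) {
            int eq = part.indexOf('=');
            dims.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return dims;
    }

    // Rebuild the key in insertion order, so that scans over a
    // dimension prefix (e.g. "Foo=") remain meaningful.
    static String format(Map<String, String> dims) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : dims.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dims = parse("Foo=bar,blah=123");
        System.out.println(dims.get("Foo"));   // prints bar
        System.out.println(format(dims));      // prints Foo=bar,blah=123
    }
}
```

Keeping dimension order fixed matters: HBase sorts rows lexicographically by key, so only the leading dimensions can be used as a scan prefix.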
    A user wants the top "Foo" values with blah=123, sorted downward by total
    plays in July. My current thinking is that a query would get executed by
    grouping all Foo-prefixed row keys by region server and sending the query
    to each of those. Each region server iterates through all of its row keys
    that start with Foo=something,blah=, and passes the query on to all
    regions containing blahs that equal 123, which then contain play counts.
    Matching row keys, as well as the sum of all their play values within
    July, are passed back up the chain and sorted/truncated when possible.
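The final merge/sort/truncate step described above is independent of HBase itself. A sketch in plain Java, assuming each region server has already returned a map of row key to summed plays (names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FanOutMerge {
    // Merge per-region-server partial sums (row key -> summed plays),
    // sort descending by total, and keep the top n results.
    static List<Map.Entry<String, Long>> topN(List<Map<String, Long>> partials, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        Map<String, Long> rs1 = new HashMap<>();
        rs1.put("Foo=bar,blah=123", 100L);
        rs1.put("Foo=baz,blah=123", 40L);
        Map<String, Long> rs2 = new HashMap<>();
        rs2.put("Foo=bar,blah=123", 60L);
        List<Map<String, Long>> partials = new ArrayList<>();
        partials.add(rs1);
        partials.add(rs2);
        // Foo=bar totals 160 and ranks first; Foo=baz totals 40.
        System.out.println(topN(partials, 2));
    }
}
```

Note that a correct global top-N cannot be truncated too aggressively on the servers: a key that is outside the top N on every individual region server can still make the global top N once its partial sums are combined.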


    It seems quite complicated and would involve either modifying the HBase
    source code or at the very least using the deep internals of the API.
    Does this seem like a practical solution, or could someone offer some
    ideas?
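The client-side fan-out itself needs no HBase internals. A minimal sketch using an ExecutorService, where `queryRegionServer` is a hypothetical stand-in for the real per-region-server scan (it just echoes pre-built data here; in practice it would run a prefix scan and sum the July-versioned play counts):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFanOut {
    // Submit one aggregation task per region server to a thread pool and
    // merge the partial sums as the futures complete.
    static Map<String, Long> fanOut(List<Map<String, Long>> regionServers) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (Map<String, Long> rs : regionServers) {
                Callable<Map<String, Long>> task = () -> queryRegionServer(rs);
                futures.add(pool.submit(task));
            }
            Map<String, Long> totals = new HashMap<>();
            for (Future<Map<String, Long>> f : futures) {
                for (Map.Entry<String, Long> e : f.get().entrySet()) {
                    totals.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
            return totals;
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder: the real version would scan one region server for row
    // keys with the Foo=...,blah=123 prefix and sum plays per row key.
    static Map<String, Long> queryRegionServer(Map<String, Long> fakeRows) {
        return fakeRows;
    }
}
```

This only moves the I/O in parallel on the client; pushing the per-region summation to the server side is exactly what the coprocessors discussed below in the thread are for.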


    Thank you!


Discussion Overview
group: dev
categories: hbase, hadoop
posted: Jul 20, '11 at 11:14p
active: Jul 25, '11 at 9:31p
posts: 3
users: 2
website: hbase.apache.org

2 users in discussion
Ted Yu: 2 posts
Stack: 1 post
