Hi everyone,

My previous mail mentioned a bunch of design ideas that were mainly
lifted from Jeff Dean's BigTable talk. BigTable seems like a useful
way to do large-scale row storage, and their decisions largely seem
like the right ones.

However, BigTable still leaves some things on the table. Items to
improve include a query language and multi-row locking, among
other things.

Dean said explicitly in his talk that they wanted to avoid multi-row
locking because it's complicated, error-prone, and maybe not necessary.
He's right on at least the first two, and maybe the third.

Multi-row locks are useful when you're making a change to
several rows that should be atomic: you want all of the changes
or none of them. They're also used in traditional databases
when you want to perform an expensive read operation (like a
multiway join) and want to make sure the results don't
get modified while you're reading.

Distributed lock acquisition is very hard to do. It's bug-prone
and often has very weird performance ramifications. It's
difficult to get working, difficult to tune, difficult to everything.

Here are a few ideas on what to do:
1) Suck it up and have the client acquire locks on multiple
HRegionServers simultaneously. All clients would have to
agree to acquire locks according to some global ordering to
avoid deadlock. HRegions would not be allowed to migrate
to a new server if locked.

If this is a rare circumstance, a better approach would be
to have a dedicated "lock acquirer" through which clients
make requests. It doesn't help the theoretical problem here,
but it would make debugging an awful lot easier.
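
To make idea 1 concrete, here is a rough sketch of deadlock-free
acquisition via a global ordering. HLockClient and its
lockRow/unlockRow methods are made-up names for illustration, not
an existing API:

interface HLockClient {
    void lockRow(String row);     // blocks until the lock is granted
    void unlockRow(String row);
}

class MultiRowLock {
    // Acquire locks in a single global order (sorted row keys), so
    // two clients can never hold locks in opposite orders, which is
    // the condition that produces deadlock.
    static void lockAll(HLockClient client, java.util.List<String> rowKeys) {
        java.util.List<String> sorted = new java.util.ArrayList<String>(rowKeys);
        java.util.Collections.sort(sorted);
        java.util.List<String> held = new java.util.ArrayList<String>();
        try {
            for (String row : sorted) {
                client.lockRow(row);
                held.add(row);
            }
        } catch (RuntimeException e) {
            // Roll back: release whatever was acquired, newest first.
            for (int i = held.size() - 1; i >= 0; i--) {
                client.unlockRow(held.get(i));
            }
            throw e;
        }
    }
}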

2) In the case of long-lasting read operations, we can
use versioning to guarantee consistency. If each row is
annotated with an edit timestamp, and we know that there
is sufficient version history available, the long-lasting job
can run over a specific version only.

Edits can continue to be made to the database while the
read-only job is ongoing. The operation is performed over
the database as of the time the task was submitted.
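
As a sketch of how idea 2 might look, here's a toy versioned cell
(VersionedCell is purely illustrative, not a proposed class):

import java.util.SortedMap;
import java.util.TreeMap;

class VersionedCell {
    // edit timestamp -> value, growing as edits arrive
    private final TreeMap<Long, String> versions =
        new TreeMap<Long, String>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Read "as of" the time the job was submitted: the newest value
    // whose timestamp is <= snapshotTime. Later edits land at later
    // timestamps, so they never disturb a reader pinned to a snapshot.
    String readAsOf(long snapshotTime) {
        SortedMap<Long, String> visible = versions.headMap(snapshotTime + 1);
        return visible.isEmpty() ? null : visible.get(visible.lastKey());
    }
}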

3) In the case of multiple row updates, we may be able to
use different edit semantics to avoid locking. For example,
consider that we want to add a single column/value pair to
multiple rows. We want this to happen atomically, so that
all of the rows get the value or none of them do.

If it's just an add, then we don't need to lock the rows at
all; the add will always succeed, even if other writes
intervene. Traditionally there's been no distinction
among data "updates", so they all require locking. If we
can get a client to adjust the update semantics slightly,
then the locking can be much more relaxed.

I'd say that "add" or "append" semantics are likely to be
at least as common as "edit" semantics.
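
Here's a toy, single-machine illustration of the add-vs-edit
difference (not real HBase code, just the semantics):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Row {
    private final Map<String, String> columns =
        new ConcurrentHashMap<String, String>();

    // "Add" semantics: blindly write a new column/value pair. The
    // outcome doesn't depend on prior state, so concurrent adds
    // commute and need no locking.
    void add(String column, String value) {
        columns.put(column, value);
    }

    // "Edit" semantics: the new value depends on the value read, so
    // two concurrent editors can silently lose an update. This is
    // the case that traditionally forces locking.
    void edit(String column, String suffix) {
        String old = columns.get(column);    // read ...
        columns.put(column, old + suffix);   // ... then write: a race
    }
}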

What family of edit semantics would you like
to see offered here?

Also, how useful do you think a general-purpose query language
would be for HBase? It would be fairly straightforward to implement,
for example, a poor man's version of SQL that has different locking
and update behavior (and which chucks out the more exotic elements).
This might be compiled into a piece of code that is executed
immediately, or it might be transformed into a long-lasting mapreduce
job.

I have a few ideas for such a language, but I'm worried it's getting
a little far afield from what we're interested in for Hadoop.

--Mike


  • Arkady Borkovsky at May 19, 2006 at 10:58 pm
    Michael,

    I like the initiative and most of the ideas.
    It looks like the design is very record-oriented.

    Some potential applications of HBase-like functionality are
    batch-oriented: select (a large) subset of records, process a subset of
    their columns (often a small subset), and either add new
    columns or add new versions of existing columns.
    Several applications like this may be running on the same "table"
    simultaneously.

    Think, for example, about query log processing, where the key is a
    search query, and the daily stats are represented as columns (several
    columns per day), plus columns for data aggregated by week, by month,
    etc.

    How well does the proposal address these kinds of uses?

    -- ab
  • Michael Cafarella at Jun 5, 2006 at 7:54 am
    Arkady,

    Sorry for the delay in responding to your message.

    I think the proposal would handle your log-reporting problem quite
    well. My current plan for iterators that process a lot of rows (instead
    of querying for them, or updating just a few) is to ask the user to
    implement a MapReduce-like class or method. This method is submitted
    to the system, and is invoked for every tuple that passes the query
    criteria.

    Imagine the user implements an interface like this:

    class UserIterator implements HDataIterator {
        public void process(String row, String col, String attr) { /* ... */ }
    }

    and submits it using a query interface like this:

    boolean submitIterator(HDataIterator it, HQuery query);

    where HQuery describes the rows for which HDataIterator.process()
    will be invoked. The user class can store state across invocations,
    making it quite easy to compute sums/averages for log summaries. I
    think this is a better way to do aggregates than a fancy query language,
    at least until the system and requirements are better understood.
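
    For example, a sum over a daily-count column might look like
    this (hand-waving the HQuery construction, since that interface
    isn't pinned down yet):

    class DailySumIterator implements HDataIterator {
        private long total = 0;   // state carried across invocations

        public void process(String row, String col, String attr) {
            // attr carries the cell value for the matched row/column
            total += Long.parseLong(attr);
        }

        public long getTotal() { return total; }
    }

    submitIterator() would then be handed a new DailySumIterator and
    a query; process() runs once per matching tuple, and the running
    sum is left in the iterator instance.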

    BTW, for a completely different look at how to do log processing,
    you might take a look at the following quite interesting paper:

    PADX: Querying Large-Scale Ad Hoc Data with XQuery.
    M. Fernandez, K. Fisher, Y. Mandelbaum. Submitted to PLAN-X, 2006.
    http://www.cs.princeton.edu/~yitzhakm/publications/padx_planx.pdf

    Best,
    --Mike

  • Stefan Groschupf at May 30, 2006 at 9:42 pm
    Hi Michael,
    > However, BigTable still leaves some things on the table. Items to
    > improve include a query language and multi-row locking, among
    > other things.

    In my project we do not need multi-row locking at all. I need
    to read and write many row/column pairs as fast as possible,
    but each individual operation must be atomic.
    I'm a little worried that the goals are too big for now, and
    that such difficult-to-implement functionality will slow down
    a general Hadoop BigTable-style implementation.
    I would really love to start small and maybe grow with each
    release.

    > Also, how useful do you think a general-purpose query language
    > would be for HBase? It would be fairly straightforward to implement,
    > for example, a poor man's version of SQL that has different locking
    > and update behavior (and which chucks out the more exotic elements).
    > This might be compiled into a piece of code that is executed
    > immediately, or it might be transformed into a long-lasting mapreduce
    > job.

    Personally, I don't like query languages at all - especially
    SQL. :-)
    From my point of view, they are only really useful when you're
    working in an admin GUI tool and want to inspect your data.
    For some years now I have preferred API-based query mechanisms
    such as those Hibernate or db4o provide.
    Query by example, for instance, is easy to use and can be
    powerful.
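
    Something in this direction (just a sketch, not db4o's actual
    API):

    // Fill in the fields you care about; null means "don't care".
    // The store returns every record that matches.
    class LogRecord {
        String query;   // row key: the search query
        String day;     // e.g. "2006-05-15"
        Long count;     // aggregate count for that day
    }

    interface HStore {
        java.util.List<LogRecord> queryByExample(LogRecord example);
    }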

    Since a query is always a string in your code, you have to
    maintain queries separately: you need to scan and update your
    code as soon as you change even one column name.
    But an API-based query mechanism lets you develop faster, in
    my experience. The keywords in my mind are test-driven
    development, refactoring, and refactoring tools like Eclipse.

    Just my 2 cents. :-)
    In any case, I would love to see a BigTable implementation.
    Cheers,
    Stefan
  • Michael Cafarella at Jun 5, 2006 at 8:01 am
    Hi Stefan,
    On 5/30/06, Stefan Groschupf wrote:

    > In my project we do not need multi-row locking at all. I need
    > to read and write many row/column pairs as fast as possible,
    > but each individual operation must be atomic.
    > I'm a little worried that the goals are too big for now, and
    > that such difficult-to-implement functionality will slow down
    > a general Hadoop BigTable-style implementation.
    > I would really love to start small and maybe grow with each
    > release.

    A good point. I've scaled down my ambitions for the thing.
    Locking, query languages, even compression - all that stuff is for
    version 2+. I might have gotten a little carried away, asking for
    query language examples before the row store works properly ;)

    > Personally, I don't like query languages at all - especially
    > SQL. :-)
    > From my point of view, they are only really useful when you're
    > working in an admin GUI tool and want to inspect your data.
    > For some years now I have preferred API-based query mechanisms
    > such as those Hibernate or db4o provide.
    > Query by example, for instance, is easy to use and can be
    > powerful.
    > Since a query is always a string in your code, you have to
    > maintain queries separately: you need to scan and update your
    > code as soon as you change even one column name.
    > But an API-based query mechanism lets you develop faster, in
    > my experience. The keywords in my mind are test-driven
    > development, refactoring, and refactoring tools like Eclipse.

    In theory, a declarative language like SQL should enable all
    sorts of cool things on top of BigTable. In practice, it might
    take 10 years before the query optimizer works right ;) So for
    the moment, no SQL for us.

    Though there is probably a great query language waiting to be written
    on top of a BigTable-like store.

    --Mike
