large data and hbase
I have a dataset which is several terabytes in size. I would like to query
this data using HBase (SQL). Would I need to set up MapReduce to use HBase?
Currently the data is stored in HDFS, and I am using `hdfs -cat` to get the
data and pipe it into stdin.


--
--- Get your facts first, then you can distort them as you please.--


  • Robert Evans at Jul 11, 2011 at 2:54 pm
    Rita,

    My understanding is that you do not need to set up MapReduce to use HBase, but I am not an expert on it. Contacting the HBase mailing list would probably be the best option to get your questions answered.

    user@hbase.apache.org

    Their setup page might be able to help you out too

    http://hbase.apache.org/book/notsoquick.html

    I don't believe that HBase supports SQL, though. You can use Hive (http://hive.apache.org/); it supports a lot of SQL, but it runs queries as batch jobs and requires MapReduce to be set up.
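
    If the files are already sitting in HDFS, Hive can query them in place
    through an external table, so nothing has to be re-loaded. A minimal
    sketch, assuming tab-delimited text under a hypothetical /data/events
    directory and made-up column names:

    -- Hypothetical external table over existing tab-delimited text in HDFS.
    CREATE EXTERNAL TABLE events (
      host    STRING,
      request STRING,
      bytes   BIGINT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/events';

    -- Queries like this are compiled into MapReduce jobs, so they run as
    -- batch work rather than interactive lookups.
    SELECT host, COUNT(*) AS hits
    FROM events
    GROUP BY host;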

    --Bobby Evans

    On 7/11/11 6:31 AM, "Rita" wrote:

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--
  • Bharath Mundlapudi at Jul 11, 2011 at 5:40 pm
    Another option to look at is Pig or Hive. These need MapReduce.


    -Bharath



    ________________________________
    From: Rita <rmorgan466@gmail.com>
    To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
    Sent: Monday, July 11, 2011 4:31 AM
    Subject: large data and hbase

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--
  • Hadoopman at Jul 11, 2011 at 5:54 pm
    So we're seeing the following error during some of our Hive loads:

    2011-07-05 12:26:52,927 Stage-2 map = 100%, reduce = 100%
    Ended Job = job_201106302113_3864
    Loading data to table default.merged_weblogs partition (day=null)
    Failed with exception Number of dynamic partitions created is 1013,
    which is more than 1000. To solve this try to set
    hive.exec.max.dynamic.partitions to at least 1013.
    FAILED: Execution Error, return code 1 from
    org.apache.hadoop.hive.ql.exec.MoveTask

    Here is a sample script we're running:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.max.dynamic.partitions.pernode=10000;
    SET hive.exec.max.dynamic.partitions=10000;
    SET hive.exec.max.created.files=150000;

    SET hive.exec.compress.intermediate=true;
    SET hive.intermediate.compression.codec=com.hadoop.compression.lzo.LzoCodec;
    SET hive.intermediate.compression.type=BLOCK;
    SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

    SET hive.exec.compress.output=true;
    SET mapred.output.compress=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    SET mapred.output.compression.type=BLOCK;

    FROM (
    SELECT hostname, name, ip, day
    FROM logsStaging
    UNION ALL
    SELECT hostname, name, ip, day
    FROM logs
    ) a

    INSERT OVERWRITE TABLE logs PARTITION(day)
    SELECT DISTINCT hostname, name, ip, day
    DISTRIBUTE BY day;

    QUIT;


    Has anyone run into this problem before? I've noticed that increasing
    the partition limits hasn't been working. I've been looking for a
    config.xml setting already marked 'final' in the properties, but no
    luck so far. I believe the default is 100 partitions, and the job
    (when running) does show 10000 partitions (from the above script).
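
    A minimal check, assuming the script is run through the Hive CLI: SET
    with no value prints the session's effective value, so putting these at
    the top of the script would show whether the overrides above actually
    take effect for the job:

    -- Print the effective limits in the same session that runs the INSERT;
    -- if a site configuration marks them final, the overrides may be ignored.
    SET hive.exec.max.dynamic.partitions;
    SET hive.exec.max.dynamic.partitions.pernode;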

    thoughts on what else to look at?

    Thanks!
  • Rita at Jul 12, 2011 at 10:01 am
    This is encouraging.

    "Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by
    running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure
    it started properly by testing the *put* and *get* of files into the Hadoop
    filesystem. HBase does not normally use the mapreduce daemons. These do not
    need to be started."

    On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi
    wrote:
    Another option to look at is Pig Or Hive. These need MapReduce.


    -Bharath



    ________________________________
    From: Rita <rmorgan466@gmail.com>
    To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
    Sent: Monday, July 11, 2011 4:31 AM
    Subject: large data and hbase

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    --- Get your facts first, then you can distort them as you please.--
  • Harsh J at Jul 12, 2011 at 1:02 pm
    For a query to work in a fully distributed manner, MapReduce may still
    be required (i.e., atop HBase). There is ongoing work on the HBase side
    to help with this as well, but you're guaranteed better responses on
    their mailing lists.
    On Tue, Jul 12, 2011 at 3:31 PM, Rita wrote:
    This is encouraging.

    "Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by
    running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure
    it started properly by testing the *put* and *get* of files into the Hadoop
    filesystem. HBase does not normally use the mapreduce daemons. These do not
    need to be started."

    On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi
    wrote:
    Another option to look at is Pig Or Hive. These need MapReduce.


    -Bharath



    ________________________________
    From: Rita <rmorgan466@gmail.com>
    To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
    Sent: Monday, July 11, 2011 4:31 AM
    Subject: large data and hbase

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    Harsh J
  • Rita at Jul 13, 2011 at 10:30 am
    Thanks.

    If you mean asking the MapReduce list, they will naturally recommend
    it :)

    I suppose I will look into it eventually, but we have invested a lot of
    time in Torque.


    On Tue, Jul 12, 2011 at 9:01 AM, Harsh J wrote:

    For a query to work in a fully distributed manner, MapReduce may still
    be required (atop HBase, i.e.). There's been work ongoing to assist
    the same at the HBase side as well, but you're guaranteed better
    responses on their mailing lists instead.
    On Tue, Jul 12, 2011 at 3:31 PM, Rita wrote:
    This is encouraging.

    "Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by
    running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure
    it started properly by testing the *put* and *get* of files into the Hadoop
    filesystem. HBase does not normally use the mapreduce daemons. These do not
    need to be started."

    On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi
    wrote:
    Another option to look at is Pig Or Hive. These need MapReduce.


    -Bharath



    ________________________________
    From: Rita <rmorgan466@gmail.com>
    To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
    Sent: Monday, July 11, 2011 4:31 AM
    Subject: large data and hbase

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    Harsh J


    --
    --- Get your facts first, then you can distort them as you please.--
  • Harsh J at Jul 13, 2011 at 12:27 pm
    I meant asking the user@hbase.apache.org list; it's pretty active, just
    as these are :)
    On Wed, Jul 13, 2011 at 3:59 PM, Rita wrote:
    Thanks.

    If you mean asking to ask the MapReduce list they will naturally recommend
    it :)

    I suppose I will look into it eventually but we invested a lot of time into
    Torque.


    On Tue, Jul 12, 2011 at 9:01 AM, Harsh J wrote:

    For a query to work in a fully distributed manner, MapReduce may still
    be required (atop HBase, i.e.). There's been work ongoing to assist
    the same at the HBase side as well, but you're guaranteed better
    responses on their mailing lists instead.
    On Tue, Jul 12, 2011 at 3:31 PM, Rita wrote:
    This is encouraging.

    "Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by
    running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure
    it started properly by testing the *put* and *get* of files into the Hadoop
    filesystem. HBase does not normally use the mapreduce daemons. These do not
    need to be started."

    On Mon, Jul 11, 2011 at 1:40 PM, Bharath Mundlapudi
    wrote:
    Another option to look at is Pig Or Hive. These need MapReduce.


    -Bharath



    ________________________________
    From: Rita <rmorgan466@gmail.com>
    To: "<common-user@hadoop.apache.org>" <common-user@hadoop.apache.org>
    Sent: Monday, July 11, 2011 4:31 AM
    Subject: large data and hbase

    I have a dataset which is several terabytes in size. I would like to query
    this data using hbase (sql). Would I need to setup mapreduce to use hbase?
    Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the
    data and pipe it into stdin.


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    Harsh J


    --
    --- Get your facts first, then you can distort them as you please.--


    --
    Harsh J
