HBase user mailing list (Grokbase archive), March 2011
Hi,

I am trying to crawl several thousand RSS feeds every 30 minutes.

I thought I could use Hadoop and HBase as my platform.

However, I am not familiar with the HBase architecture and was wondering if
I could insert crawled news articles directly into HBase without first
saving them to HDFS.
I am asking this dumb question because all the HBase examples I have seen in
reference books start by saving data to HDFS.

Also, if I have two computers, A for HDFS and B for HBase,
what happens when I insert data directly into HBase?
Is the data stored on B automatically, with a pointer made to A?
Or is the data stored on A, with a pointer made to itself?
I really have no idea how HBase operates :(


  • Michael Segel at Mar 1, 2011 at 12:36 pm
    Hi,

    The short answer is yes.

    I would guess that the reason you see examples of data going to HDFS first is that they use a MapReduce job to insert the data. You don't have to do this.
    You can make a connection from any "cloud aware" machine. (Actually, that just means loading the proper config data into your Java app, and even that doesn't have to be Java.)
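
    Being "cloud aware" amounts to the client knowing where the cluster is. A minimal sketch, assuming the HBase client jar is on the classpath: `HBaseConfiguration.create()` loads `hbase-default.xml` plus any `hbase-site.xml` found on the classpath, and the ZooKeeper quorum can also be set in code. The class name `ClientConfig` and the host names are hypothetical placeholders.

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientConfig {
        // Builds a client-side configuration. HBaseConfiguration.create() picks up
        // hbase-default.xml and any hbase-site.xml found on the classpath.
        public static Configuration create(String zkQuorum, int zkPort) {
            Configuration conf = HBaseConfiguration.create();
            // Placeholder values (assumption): point these at your own ZooKeeper
            // ensemble if no hbase-site.xml is available on the classpath.
            conf.set("hbase.zookeeper.quorum", zkQuorum);
            conf.setInt("hbase.zookeeper.property.clientPort", zkPort);
            return conf;
        }

        public static void main(String[] args) {
            Configuration conf = create("zk1.example.com,zk2.example.com", 2181);
            // Creating the Configuration does not contact the cluster yet.
            System.out.println(conf.get("hbase.zookeeper.quorum"));
        }
    }
    ```

    Nothing here talks to the cluster; the settings are only used once a connection is opened.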

    The simplest approach is to open a connection to HBase, instantiate an HTable object for the target table, and then create a Put object that you use to write to HBase.
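
    The steps above can be sketched as follows. This is written against the HBase 1.x/2.x client API, where the HTable of the 0.90 era is obtained as a `Table` from a `Connection`, and `Put.addColumn` replaces the older `Put.add`. The table name `articles`, the column family `content`, and the choice of the article URL as row key are hypothetical; the table must already exist (e.g. `create 'articles', 'content'` in the HBase shell), and `main` needs a reachable cluster.

    ```java
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ArticleWriter {
        // Hypothetical schema: table "articles" with one column family "content".
        static final TableName TABLE = TableName.valueOf("articles");
        static final byte[] FAMILY = Bytes.toBytes("content");

        // Builds a Put for one crawled article. Using the URL as the row key
        // is an assumption for illustration, not a schema recommendation.
        static Put toPut(String url, String title, String body) {
            Put put = new Put(Bytes.toBytes(url));
            put.addColumn(FAMILY, Bytes.toBytes("title"), Bytes.toBytes(title));
            put.addColumn(FAMILY, Bytes.toBytes("body"), Bytes.toBytes(body));
            return put;
        }

        public static void main(String[] args) throws IOException {
            // Reads hbase-site.xml from the classpath to locate the cluster.
            Configuration conf = HBaseConfiguration.create();
            // try-with-resources closes the table and then the connection.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TABLE)) {
                table.put(toPut("http://example.com/news/1", "Title", "Article text"));
            }
        }
    }
    ```

    The write goes straight from the client to HBase; no intermediate file in HDFS or MapReduce job is involved on the client side.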

    HTH

    -Mike

    Date: Tue, 1 Mar 2011 20:28:58 +0900
    Subject: Inserting data directly into HBase?
    From: mp2893@gmail.com
    To: common-user@hadoop.apache.org; user@hbase.apache.org

  • Edward choi at Mar 2, 2011 at 12:12 am
    Thanks Michael.
    That cleared things up for me. :)

    2011/3/1 Michael Segel <michael_segel@hotmail.com>

Discussion Overview
group: user @
categories: hbase, hadoop
posted: Mar 1, '11 at 11:29a
active: Mar 2, '11 at 12:12a
posts: 3
users: 2
website: hbase.apache.org

2 users in discussion

Edward choi: 2 posts
Michael Segel: 1 post
