Hello,

I have several questions about the physical storage of HBase:

1. Does HBase store each table in format A:

"com.cnn.www", t6, "<html>...",
"com.cnn.www", t5, "<html>...",
"com.cnn.www", t3, "<html>...",
"com.sohu.www", t8, "<html>..."
"com.sohu.www", t7, "<html>..."

or format B:

"com.cnn.www", t6, "<html>...",
t5, "<html>...",
t3, "<html>...",
"com.sohu.www", t8, "<html>...",
t7, "<html>..."

Format A treats the RowKey and TimeStamp together as the key, and
wastes space by repeating the RowKey "com.cnn.www" or "com.sohu.www"
several times.

Format B treats only the RowKey as the key, with TimeStamp and Column
as attributes, so rows do not all have the same layout.

2. Another question: we may get several labels in the same family at
the same time. For example, we crawl a web page at time t1, and the
page contains two anchors, one to a.com and the other to b.com.
How should this be stored in HBase?

"com.cnn.www", t1, "anchor:a.com", "aaa",
"com.cnn.www", t1, "anchor:b.com", "bbb",
"com.cnn.www", t2, "anchor:c.com", "ccc"

or

"com.cnn.www", t1, "anchor:a.com", "aaa",
"anchor:b.com", "bbb",
"com.cnn.www", t2, "anchor:c.com", "ccc"

Thanks!

Bin YANG

--
Bin YANG
Department of Computer Science and Engineering
Fudan University
Shanghai, P. R. China
EMail: yangbinisme82@gmail.com

  • Edward yoon at Oct 31, 2007 at 8:23 am
    Of course, it should be "B".
    I think it is probably easier to think of it as a Map<Row, Map<Column, Cell>> structure.

    ------------------------------
    B. Regards,
    Edward yoon @ NHN, corp.
    Home : http://www.udanax.org

  • Edward yoon at Oct 31, 2007 at 8:27 am
    Sorry for the broken text email... ;(

    Of course, it should be "B".
    I think it is probably easier to think of it as a Map(Row, Map(Column, Cell)) structure.
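
The nested-map view mentioned above can be sketched in a few lines of plain Python. This is only a logical model, not HBase code: the names `table`, `put` and `get_latest` are hypothetical, and treating a "Cell" as a map from timestamp to value is an assumption made for the illustration.

```python
# Logical view only: row -> column -> {timestamp: value}.
# Plain dictionaries; none of these names come from the HBase API.
table = {}


def put(row, column, timestamp, value):
    """Store one version of a cell under row / column / timestamp."""
    table.setdefault(row, {}).setdefault(column, {})[timestamp] = value


def get_latest(row, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


# Timestamps t3 and t6 from the question are modelled as the integers 3 and 6.
put("com.cnn.www", "contents:", 3, "<html>v3")
put("com.cnn.www", "contents:", 6, "<html>v6")
```

In this view the row key appears once and its versions hang underneath it, which is what makes it look like format B. It says nothing about how the bytes are laid out on disk.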


    ------------------------------
    B. Regards,
    Edward yoon @ NHN, corp.
    Home : http://www.udanax.org

  • Michael Stack at Oct 31, 2007 at 3:44 pm
    Regarding question 1: HBase storage is closer to the A format you
    describe.

    A table has column families. Each column family is written to an
    HStore. An HStore has HStoreFiles (~SSTables in Bigtable-speak).
    HStoreFiles are Hadoop MapFiles where the key is
    row/columnname/timestamp (see the HStoreKey class) and the value is
    "<html>...." as bytes.

    Regarding question 2: how will you be accessing the data afterwards?

    St.Ack
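
The row/columnname/timestamp key described above can be illustrated with a small Python sketch. It is a toy stand-in for a single sorted store file, not the HBase implementation: `entries`, `put` and `scan_row` are hypothetical names, and sorting newer timestamps first is an assumption of the sketch.

```python
import bisect

# Hypothetical stand-in for one store file: a list of (key, value) pairs
# kept sorted by key. The key is (row, column, -timestamp); negating the
# timestamp makes newer versions sort first (an assumption of this sketch).
entries = []


def put(row, column, timestamp, value):
    """Insert one entry while keeping the list sorted by its full key."""
    bisect.insort(entries, ((row, column, -timestamp), value))


def scan_row(row):
    """Return (column, timestamp, value) for every entry of one row, in key order."""
    return [(col, -neg_ts, val) for (r, col, neg_ts), val in entries if r == row]


# The two anchors crawled at t1 (modelled as the integer 1) become two
# separate entries, each carrying the full row key, as in format A.
put("com.cnn.www", "anchor:a.com", 1, "aaa")
put("com.cnn.www", "anchor:b.com", 1, "bbb")
put("com.cnn.www", "anchor:c.com", 2, "ccc")
```

Under this layout every entry repeats the row key, which is the space cost raised in question 1. In exchange, each entry is self-describing and can be found by key order alone.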


