FAQ
Hi all,

I need 2 nodes to store approx. 100 million rows of data using
HBase/Hadoop.

I know that by default Hadoop replicates to 3 nodes. Due to a money crunch, we are
thinking about 2 nodes for now.
Also, the support fee from Cloudera is per node.


For now, the input/output load on this machine is as follows:

It houses one single HBase table, which sits on Hadoop.
Incoming data from sensors - every 5 seconds, 50 rows are created 24/7 in
the HBase table.
Querying - for now there are 50 users who query this data from the HBase
table every 10 seconds using a website dashboard.

Can someone suggest a machine configuration for the above setup, please?
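
The sensor numbers above work out to a steady ingest rate; a quick sketch (assumes the 50-rows-per-5-seconds figure holds continuously, which the post implies but does not guarantee):

```python
# Ingest rate implied by the sensor load described above
# (50 rows every 5 seconds, 24/7).

ROWS_PER_BATCH = 50
BATCH_INTERVAL_S = 5
SECONDS_PER_DAY = 86_400

rows_per_day = ROWS_PER_BATCH / BATCH_INTERVAL_S * SECONDS_PER_DAY
days_to_100m = 100_000_000 / rows_per_day

print(int(rows_per_day))    # 864000 rows/day
print(round(days_to_100m))  # 116 -- days to reach 100 million rows
```

So at this rate the table hits the stated 100 million rows in roughly four months of continuous collection.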

Thanks

--


  • Mike at Nov 28, 2012 at 7:18 pm
    Could you please respond?

    Thanks

    On Tuesday, November 20, 2012 4:12:26 PM UTC-5, Mike wrote:

  • Kevin O'dell at Nov 28, 2012 at 7:25 pm
It houses one single HBase table which sits on Hadoop <-- What is the size
of that table, and how many CFs?
Incoming data from sensor - every 5 seconds - 50 rows are created 24/7 in
the HBase table. <-- What is the average size of a row?
Querying - For now there are 50 users who query this data from this HBase
table every 10 seconds using a website dashboard. <-- Are they doing Scans
or Gets? Is it sequential or random?

We need to know more about the use case before we can make these
recommendations.
    On Wed, Nov 28, 2012 at 2:18 PM, Mike wrote:

    --
    Kevin O'Dell
    Customer Operations Engineer, Cloudera

    --
  • Mike at Nov 28, 2012 at 9:09 pm
Thanks Kevin. Here are the details.
The table will hold around 200 million rows and will have 6 CFs.
There are 100 columns/cells per row (across all 6 CFs). Each cell is of type
float (size: 12 bytes max).
The dashboard uses "Scans" to query data from HBase, and the access pattern
is random.

Thanks
    On Wednesday, November 28, 2012 2:25:19 PM UTC-5, Kevin O'dell wrote:

    --
  • Kevin O'dell at Nov 28, 2012 at 9:09 pm
    Mike,

It sounds like the total table will be about 74GB? Does my math sound
right to you?
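
Kevin's 74GB figure is reproducible if each cell carries a 4-byte float payload; a quick back-of-the-envelope in Python (the row and cell counts come from the thread, but the 4-byte payload and the omission of HBase's per-KeyValue key overhead are assumptions of this sketch, which is why the working estimate later gets rounded up):

```python
# Back-of-the-envelope HBase table sizing using the numbers from this
# thread. Ignores per-KeyValue key overhead, HFile indexes, and compression.

ROWS = 200_000_000    # rows Mike expects the table to hold
CELLS_PER_ROW = 100   # columns/cells per row across all 6 CFs
GIB = 1024 ** 3

def table_size_gib(bytes_per_cell: int) -> float:
    """Raw payload size of the whole table, in GiB."""
    return ROWS * CELLS_PER_ROW * bytes_per_cell / GIB

# A bare 4-byte float payload lands right at the 74GB estimate:
print(round(table_size_gib(4), 1))   # 74.5

# Mike's "12 bytes max" per cell gives the pessimistic upper bound:
print(round(table_size_gib(12), 1))  # 223.5
```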
    On Wed, Nov 28, 2012 at 4:01 PM, Mike wrote:

    --
    Kevin O'Dell
    Customer Operations Engineer, Cloudera

    --
  • Mike at Nov 28, 2012 at 9:44 pm
Let us make it 150GB.

    Thanks
    On Wednesday, November 28, 2012 4:09:47 PM UTC-5, Kevin O'dell wrote:

    --
  • Mike at Nov 29, 2012 at 5:39 pm
    Could you please respond?
    On Wednesday, November 28, 2012 4:36:15 PM UTC-5, Mike wrote:

    --
  • Kevin O'dell at Nov 29, 2012 at 8:49 pm
    Mike,

Is it 4 nodes, with two dedicated for the NN and standby node, or 2 nodes total?
    On Thu, Nov 29, 2012 at 12:32 PM, Mike wrote:

    --
    Kevin O'Dell
    Customer Operations Engineer, Cloudera

    --
  • Mike at Nov 30, 2012 at 6:54 pm
    2 nodes total.
    On Thursday, November 29, 2012 3:49:41 PM UTC-5, Kevin O'dell wrote:

    --
  • Kevin O'dell at Nov 30, 2012 at 6:59 pm
Thanks Mike. There is not a definitive set of standards, but since you will
be growing this, you will want these boxes to be beefy: they will end up
being your NN and standby-node boxes in the long run. If you are going to do
MR as well, you should probably do something like:

6-12 1-2 TB drives

dual quad-core Intel CPUs, for the hyperthreading

32-64 GB of RAM, depending on how large you think these projects are going
to scale.

That is a pretty standard setup.
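
As a sanity check on that drive count, here is a rough usable-capacity estimate for the two-node cluster (a hypothetical sketch: the replication factor of 2 and the 25% non-HDFS reserve are assumptions, not numbers from the thread):

```python
# Rough usable HDFS capacity for the proposed 2-node cluster.
# Assumptions: dfs.replication lowered to 2 to match the two-node
# constraint, and ~25% of raw disk reserved for OS, logs, and MR spill.

NODES = 2
DRIVES_PER_NODE = 6      # low end of the 6-12 drive suggestion
TB_PER_DRIVE = 1.0       # low end of the 1-2 TB range
REPLICATION = 2
NON_HDFS_RESERVE = 0.25

raw_tb = NODES * DRIVES_PER_NODE * TB_PER_DRIVE
usable_tb = raw_tb * (1 - NON_HDFS_RESERVE) / REPLICATION
print(usable_tb)  # 4.5 -- TB, comfortably above the ~150GB table estimate
```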

    On Fri, Nov 30, 2012 at 1:54 PM, Mike wrote:

    --
    Kevin O'Dell
    Customer Operations Engineer, Cloudera

    --
  • Ricky Saltzer at Nov 28, 2012 at 7:28 pm
    Hi Mike -

An older blog post, but still relevant; it describes some basic hardware
recommendations:

http://blog.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/

It's good practice to have several disks in each node for this type of use
case; you're not doing much computation, since most of your workload seems
to be PUT/GET operations. Is your row key designed in such a way that you
can take advantage of SCAN operations?
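
To make the row-key question concrete: a common layout for time-series sensor data is a fixed-width sensor id followed by a reversed timestamp, so that a prefix scan over one sensor returns the newest rows first. A small Python sketch of the byte layout (illustrative only - this models the key bytes, not the HBase client API, and the 8-byte id width is an assumption):

```python
import struct

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, the usual reversal constant

def row_key(sensor_id: str, epoch_millis: int) -> bytes:
    """Fixed-width sensor id + (Long.MAX_VALUE - timestamp), big-endian.

    Big-endian packing makes byte order match numeric order, so HBase's
    lexicographic sort yields newest-first within each sensor prefix.
    """
    return sensor_id.encode("ascii").ljust(8, b"\x00") + struct.pack(
        ">q", LONG_MAX - epoch_millis
    )

k_old = row_key("sensor42", 1_353_000_000_000)
k_new = row_key("sensor42", 1_354_000_000_000)
# The newer reading sorts *before* the older one under byte comparison:
print(k_new < k_old)  # True
```

With keys shaped like this, a dashboard query for one sensor becomes a short range scan starting at the sensor's prefix, rather than a random scan over the whole table.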

    Ricky


    On Wed, Nov 28, 2012 at 2:18 PM, Mike wrote:

    --

Discussion Overview
group: cdh-user
categories: hadoop
posted: Nov 20, '12 at 9:12p
active: Nov 30, '12 at 6:59p
posts: 11
users: 3
website: cloudera.com
irc: #hadoop
