FAQ
From Steven (who apparently still has issues posting to this group):

"Thanks for the suggestions. The /metrics output looks good now, and the
SELECT COUNT(*) runs much faster than before.

But I still have the "Unknown disk id" error message. My CDH version is:

hadoop-client x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 18 k
hadoop-mapreduce x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
hadoop-yarn x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M"
On Tuesday, January 22, 2013 11:40:05 AM UTC-8, Steven Wong wrote:

Hi,

I followed
http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/ to set
up a cluster on EC2. After seeing disappointing performance numbers from a
SELECT COUNT(*), I am following
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance#ConfiguringImpalaforPerformance-TestingImpalaforHighPerformanceConfiguration to
check my cluster setup. Questions:

1. My cluster has 3 data nodes. Is the following
http://<hostname>:<port>/metrics output good?

statestore.backend.state.map:
{
127.0.0.1:23000 : OK
}
statestore.live.backends:3
statestore.live.backends.list:[127.0.0.1:22000]

2. My impalad logs contain "Unknown disk id. This will negatively
affect performance. Check your hdfs settings to enable block location
metadata." and my http://<hostname>:<port>/varz doesn't contain the string
"dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml sets
dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?

3. My impalad.out doesn't contain "Unable to load native-hadoop
library". This is good, I believe.

4. My impalad logs contain the following lines matching the word
"scheduler", but none contains "locality percentage". Why?

/tmp/impalad.INFO:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler
/tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler

Thanks.
Steven
--

  • Steven Wong at Jan 23, 2013 at 11:28 pm
    Thanks for the suggestions. The /metrics output looks good now, and the SELECT COUNT(*) runs much faster than before.

    But I still have the "Unknown disk id" error message. My CDH version is:

    hadoop-client x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 18 k
    hadoop-mapreduce x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
    hadoop-yarn x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M



    On Tuesday, January 22, 2013 5:37:30 PM UTC-8, Henry wrote:
    On 22 January 2013 11:40, Steven Wong wrote:
    Hi,

    I followed http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/ to set up a cluster on EC2. After seeing disappointing performance numbers from a SELECT COUNT(*), I am following https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance#ConfiguringImpalaforPerformance-TestingImpalaforHighPerformanceConfiguration to check my cluster setup. Questions:

    1. My cluster has 3 data nodes. Is the following http://<hostname>:<port>/metrics output good?

    statestore.backend.state.map:
    {
    127.0.0.1:23000 : OK
    }
    statestore.live.backends:3
    statestore.live.backends.list:[127.0.0.1:22000]


    Hi Steven -

    This looks like your problem. Your machines are registering themselves with 'localhost' as their hostname, and this means that they all look the same to the statestore.

    I looked at Matt's zero-to-impala link - it's awesome, but now a little out of date. You should modify where you run impalad to also have --ipaddress and --hostname correctly set for each node. Then check the statestore metrics; things should look a lot better and your performance should improve.
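
To make the symptom concrete, here is a minimal sketch (not part of Impala) that reads a saved copy of the /metrics text and flags the two warning signs described above: loopback addresses in the backend list, and fewer distinct backends than the reported count. The function names are hypothetical, and the comma separator for multi-entry lists is an assumption, since the thread only shows a single-entry list.

```python
import re

def parse_backends(metrics_text):
    """Return (reported_count, addresses) from a saved /metrics text dump."""
    count_match = re.search(r"statestore\.live\.backends:(\d+)", metrics_text)
    list_match = re.search(r"statestore\.live\.backends\.list:\[(.*?)\]", metrics_text)
    count = int(count_match.group(1)) if count_match else 0
    raw = list_match.group(1) if list_match else ""
    # Assumption: multiple entries are comma-separated.
    addresses = [a.strip() for a in raw.split(",") if a.strip()]
    return count, addresses

def diagnose(metrics_text):
    """Return a list of human-readable problems; an empty list means none found."""
    count, addresses = parse_backends(metrics_text)
    problems = []
    loopback = [a for a in addresses if a.split(":")[0] in ("127.0.0.1", "localhost")]
    if loopback:
        problems.append("backends registered with a loopback address: %s" % loopback)
    if len(set(addresses)) < count:
        problems.append(
            "%d live backends reported but only %d distinct address(es) listed"
            % (count, len(set(addresses)))
        )
    return problems

# The output quoted in the question above.
SAMPLE = """statestore.live.backends:3
statestore.live.backends.list:[127.0.0.1:22000]"""

for problem in diagnose(SAMPLE):
    print(problem)
```

On the output from the question, both checks fire, which matches the diagnosis: three backends are alive, but they are indistinguishable to the statestore.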


    2. My impalad logs contain "Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata." and my http://<hostname>:<port>/varz doesn't contain the string "dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml sets dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?

    What version of CDH are you using?


    3. My impalad.out doesn't contain "Unable to load native-hadoop library". This is good, I believe.

    4. My impalad logs contain the following lines matching the word "scheduler", but none contains "locality percentage". Why?


    The locality percentage is printed only for GLOG_v=1 - and I note that the setup-impala.sh script has a typo where it has GVLOG_v=1. If you fix this, you should see the locality percentage.
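
A typo like GVLOG_v is easy to miss by eye. As a rough sketch, a script's text can be scanned for variable assignments whose name is close to, but not exactly, GLOG_v. The sample script lines below are hypothetical (the thread does not quote setup-impala.sh), and the 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib
import re

EXPECTED = "GLOG_v"

def find_suspect_names(script_text, expected=EXPECTED, cutoff=0.8):
    """Return assigned variable names that look like misspellings of `expected`."""
    # Collect every NAME= assignment in the script text.
    names = re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)=", script_text)
    suspects = []
    for name in names:
        if name == expected:
            continue  # spelled correctly
        ratio = difflib.SequenceMatcher(None, name, expected).ratio()
        if ratio >= cutoff and name not in suspects:
            suspects.append(name)
    return suspects

# Hypothetical script contents, for illustration only.
SAMPLE_SCRIPT = """export GVLOG_v=1
export IMPALA_HOME=/opt/impala
"""

print(find_suspect_names(SAMPLE_SCRIPT))
```

An empty result does not prove the variable is set correctly; it only means no near-miss spelling was found.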

    Hope this helps - let us know if things improve.

    Henry


    /tmp/impalad.INFO:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler
    /tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler

    Thanks.
    Steven


    --





    --
    Henry Robinson
    Software Engineer
    Cloudera
    415-994-6679

    --
  • Steven Wong at Jan 23, 2013 at 11:28 pm
    Thanks for the pointers. Now my http://<hostname>:<port>/metrics output
    contains 3 IPs, and SELECT COUNT(*) runs much faster than before.

    But I still see the log message "Unknown disk id. This will negatively
    affect performance. Check your hdfs settings to enable block location
    metadata." My CDH version is:

    hive noarch 0.9.0+155-1.cdh4.1.2.p0.21.el5 cloudera-cdh4 26 M
    hadoop-client x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 18 k
    hadoop-mapreduce x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
    hadoop-yarn x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M
    hbase noarch 0.92.1+160-1.cdh4.1.2.p0.24.el5 cloudera-cdh4 34 M

    And I confirmed that my DNs were restarted with the changed hdfs-site.xml:

    -bash-4.1$ stat -c %y /etc/impala/conf/hdfs-site.xml /proc/$(pgrep -f DataNode)
    2013-01-23 02:06:37.000000000 +0000
    2013-01-23 02:06:43.170157571 +0000
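
The comparison behind that check can be spelled out as a small sketch. It assumes the first timestamp is the config file's modification time and the second (from /proc/<pid>) approximates when the DataNode process started; the helper name is hypothetical. `stat` prints nanoseconds, which Python's `%f` does not accept, so the fraction is truncated to microseconds.

```python
from datetime import datetime

def parse_stat_time(line):
    """Parse a `stat -c %y` timestamp such as '2013-01-23 02:06:43.170157571 +0000'."""
    date_part, time_part, tz_part = line.split()
    hms, _, frac = time_part.partition(".")
    frac = (frac + "000000")[:6]  # truncate nanoseconds to microseconds
    return datetime.strptime(
        "%s %s.%s %s" % (date_part, hms, frac, tz_part),
        "%Y-%m-%d %H:%M:%S.%f %z",
    )

# The two lines of output quoted above.
config_modified = parse_stat_time("2013-01-23 02:06:37.000000000 +0000")
process_started = parse_stat_time("2013-01-23 02:06:43.170157571 +0000")

gap_seconds = (process_started - config_modified).total_seconds()
restarted_after_change = process_started > config_modified
print(restarted_after_change, gap_seconds)
```

Here the process timestamp is about six seconds later than the file's, which is consistent with a restart after the edit. It says nothing about whether the running process actually read that particular file.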

    Thanks.
    --
  • Steven Wong at Jan 23, 2013 at 11:29 pm
    Hi,

    I followed http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/
    to set up a cluster on EC2. After seeing disappointing performance numbers
    from a SELECT COUNT(*), I am
    following https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance#ConfiguringImpalaforPerformance-TestingImpalaforHighPerformanceConfiguration
    to check my cluster setup. Questions:

    1. My impalad logs contain "Unknown disk id. This will negatively affect
    performance. Check your hdfs settings to enable block location metadata."
    and my http://<hostname>:<port>/varz doesn't contain the string
    "dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml
    sets dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?

    2. My impalad.out doesn't contain "Unable to load native-hadoop library".
    This is good, I believe.

    3. My impalad logs contain the following lines matching the word
    "scheduler", but none contains "locality percentage". Why?

    /tmp/impalad.INFO:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler
    /tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 00:19:09.137197 5121 simple-scheduler.cc:82] Starting simple scheduler

    4. My cluster has 3 data nodes. Is the
    following http://<hostname>:<port>/metrics output good?

    statestore.backend.state.map:
    {
    127.0.0.1:23000 : OK
    }
    statestore.live.backends:3
    statestore.live.backends.list:[127.0.0.1:22000]


    Thanks.
    Steven
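
For question 1 in the post above, one way to double-check what a given hdfs-site.xml really sets is to parse it instead of grepping. This is a minimal sketch assuming the standard Hadoop configuration layout (a <configuration> root holding <property> elements with <name> and <value> children); the function name and the inline sample are illustrative, not taken from the poster's cluster.

```python
import xml.etree.ElementTree as ET

def get_hadoop_property(xml_text, name):
    """Return the value of property `name` from Hadoop config XML, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

# Illustrative file contents; in practice, read the real hdfs-site.xml from disk.
SAMPLE_XML = """<configuration>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

value = get_hadoop_property(SAMPLE_XML, "dfs.datanode.hdfs-blocks-metadata.enabled")
print(value)
```

This only confirms what one file on disk contains. Whether the DataNode and impalad processes load that same file is a separate question, which is why the setting may still be missing from /varz.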

    --

2 users in discussion

Steven Wong: 4 posts Sriram Krishnan: 1 post
