Dear all,

   I encountered an unknown error 255 while inserting about 1 GB of one
year's data into a table partitioned by date.

   It works fine with one year of data, but when I increased it to 10 years
(3650 days) and replicated each day's records 10 times, it hit an error while
inserting the data for the 2557th day (the exact day varies between runs). It
seems like HDFS went down and could not accept any more writes.

   Is there any limit on the number of partitions (maybe a table cannot be
partitioned beyond a particular number)? The error log is below. Thanks.

I0905 11:24:14.917146 9966 status.cc:44] Failed to write row (length: 628)
to Hdfs file:
hdfs://hadoopcluster/user/hive/warehouse/impala_6052.db/related_4_10_partition/.-842472423170165391-8979331812909578124_811768275_dir/qdate=2557/-842472423170165391-8979331812909578124_1836752452_data.0
Error(255): Unknown error 255
     @ 0x83af7d (unknown)
     @ 0x96542d (unknown)
     @ 0x92a63f (unknown)
     @ 0x912df6 (unknown)
     @ 0x7fd2ba (unknown)
     @ 0x7fd72e (unknown)
     @ 0x69527d (unknown)
     @ 0x69bbbb (unknown)
     @ 0x9a36c4 (unknown)
     @ 0x3176c07851 (unknown)
     @ 0x31768e890d (unknown)
I0905 11:24:14.926785 19689 coordinator.cc:870] Cancel()
query_id=f44eefcb73e13571:7c9cfeb594d6f78b
I0905 11:24:14.926875 19689 coordinator.cc:922] sending CancelPlanFragment
rpc for instance_id=f44eefcb73e13571:7c9cfeb594d6f78d backend=DN3.lab:22000
I0905 11:24:14.929299 19689 coordinator.cc:922] sending CancelPlanFragment
rpc for instance_id=f44eefcb73e13571:7c9cfeb594d6f78e backend=DN1.lab:22000
I0905 11:24:14.929847 8751 client-cache.cc:98] CreateClient(): adding new
client for DN1.lab:22000
I0905 11:24:14.931853 19689 coordinator.cc:922] sending CancelPlanFragment
rpc for instance_id=f44eefcb73e13571:7c9cfeb594d6f78f backend=DN2.lab:22000
I0905 11:24:14.932543 9965 impala-server.cc:1284] CancelPlanFragment():
instance_id=f44eefcb73e13571:7c9cfeb594d6f78f
I0905 11:24:14.932713 9965 plan-fragment-executor.cc:419] Cancel():
instance_id=f44eefcb73e13571:7c9cfeb594d6f78f
I0905 11:24:14.932920 9965 data-stream-mgr.cc:302] cancelling all streams
for fragment=f44eefcb73e13571:7c9cfeb594d6f78f
I0905 11:24:14.933126 19689 coordinator.cc:922] sending CancelPlanFragment
rpc for instance_id=f44eefcb73e13571:7c9cfeb594d6f790 backend=DN3.lab:22000
I0905 11:24:14.934126 19689 coordinator.cc:922] sending CancelPlanFragment
rpc for instance_id=f44eefcb73e13571:7c9cfeb594d6f791 backend=DN1.lab:22000
I0905 11:24:14.936192 19689 coordinator.cc:1209] Final profile for
query_id=f44eefcb73e13571:7c9cfeb594d6f78b
Execution Profile f44eefcb73e13571:7c9cfeb594d6f78b:(Active: 230.8ms, %
non-child: 0.00%)
   Per Node Peak Memory Usage: DN2.lab:22000(1.90 GB) DN1.lab:22000(1.94 GB)
DN3.lab:22000(1.94 GB)



  • John Russell at Sep 6, 2013 at 4:55 pm
    Eric, that looks like you are hitting a known issue where the number of simultaneous file handles and concurrent threads goes past the HDFS limit for the node, sometimes putting HDFS into an error state. The max "transceiver" limit for HDFS is 4096; it needs to be higher for an insert operation involving so many partitions. The HDFS property to change is actually spelled "xciever", i.e. dfs.datanode.max.xcievers.
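
    A minimal sketch of that change in hdfs-site.xml on the DataNodes (the value 8192 is only an illustration, not a recommendation; DataNodes typically need a restart to pick it up):

    <property>
      <!-- intentionally the misspelled historical name -->
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>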

    Here's a blog post with some more detail, in an HBase context: http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/

    John
  • John Russell at Sep 6, 2013 at 7:32 pm
    3 more data points about this particular issue:

    - The issue that actually puts HDFS into the error state is resolved in Impala 1.1.1, which came out recently, so I'd suggest upgrading.

    - It is still possible to exceed the dfs.datanode.max.xcievers limit, so you might need to bump that setting and/or split your INSERT into multiple smaller ones (a sketch follows this list).

    - However, there is also an improvement in Impala 1.1.1 to the planning for such INSERTs into partitioned tables, to reduce the number of files written on each node.
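
    For example, one way to split the INSERT is to restrict each statement to a range of the partition key, so that fewer partitions (and therefore fewer open file handles) are written at once. A rough sketch using the qdate partition column from Eric's log; the source table and column names here are made up:

    insert into related_4_10_partition partition (qdate)
      select col1, col2, qdate from source_table where qdate between 1 and 1825;
    insert into related_4_10_partition partition (qdate)
      select col1, col2, qdate from source_table where qdate between 1826 and 3650;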

    So overall, when you upgrade to Impala 1.1.1, the issue might disappear entirely for you. Even if it doesn't, you will just see a query failure with no harmful aftereffects for the HDFS node.

    Hope that helps,
    John
  • Eric Huang at Sep 7, 2013 at 2:39 am
    Hi, John

        Thanks for the reply. I'm a bit confused about how Impala 1.1.1 can
    solve the entire problem, because the version I'm testing with is already
    1.1.1, and it still shows this error.

         Anyway, I'll try increasing the max "transceiver" limit for HDFS on
    Monday and try again, LOL.

    Eric

  • John Russell at Sep 7, 2013 at 6:29 am

    Yeah, I have not done a full runthrough with 1.1.1 yet to verify that the fix applies in every case. I was struck by the coincidence that I encountered the same error doing exactly the same test as you -- 10 years of data for a partitioned table. That must be a popular range to choose.

    John
  • John Russell at Sep 11, 2013 at 12:01 am
    Hi Eric,

    In my environment, I confirmed that I still get the error for my original query with 1.1.1, but now it's less serious -- the query fails, but there is no bad effect on the HDFS service on my hosts. However, I was able to construct a query that worked -- see below.

    My original query includes some CAST and CASE clauses which could complicate the planning a little bit and make the INSERT operation more resource-intensive than it really needs to be. When I did ALTER TABLE REPLACE COLUMNS on the source table to make the data types match up precisely (for example, to change the year / month / day columns from STRING to INT as in the destination table), I was able to get rid of my CAST/CASE expressions and now my INSERT works:

    [localhost:21000] > insert into parquet_ymd partition (year, month, day) select id, val, zfill, name, assertion, year, month, day from raw_data_ymd;
    Query: insert into parquet_ymd partition (year, month, day) select id, val, zfill, name, assertion, year, month, day from raw_data_ymd
    Inserted 489409929 rows in 1483.79s

    That's on a 4-node cluster with 48GB memory per node, everything else basically default settings as defined by Cloudera Manager.
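
    For reference, a rough sketch of the REPLACE COLUMNS step described above, based on the column list in that query (the column types other than year/month/day are placeholders, since the thread doesn't show the real schema):

    alter table raw_data_ymd replace columns
      (id bigint, val int, zfill string, name string, assertion string,
       year int, month int, day int);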

    Is there any conversion or testing logic in the query portion of your INSERT statement?

    I believe there could be some benefit to having column stats on the partition key columns in the source table. The instructions for that are at:

    http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_performance.html#perf_column_stats_unique_1

    I haven't yet verified that it helps in this case. (If you collect any column stats via ANALYZE TABLE in Hive, make sure to do it _after_ any ALTER TABLE … REPLACE COLUMNS so that all the source and destination data types match.)
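
    In Hive, collecting stats on those columns might look roughly like this (assuming Hive 0.10 or later; table name taken from the example above):

    analyze table raw_data_ymd compute statistics for columns year, month, day;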

    Hope that helps,
    John
