Hi DK,
Impala supports reading RC files, but does not yet support writing to RC
files.
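
A common workaround (a sketch only, reusing the table names from the query quoted below; statements not verified against this cluster) is to do the RCFile write through Hive and then make the new data visible to Impala:

-- In Hive: enable dynamic partitioning and run the INSERT there
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE web_returns_i PARTITION (wr_returned_date_sk)
SELECT ...   -- full column list as in the query quoted below
FROM web_returns;

-- In impala-shell: reload metadata so Impala sees the new files
-- (the exact statement depends on the Impala version; newer releases
-- use REFRESH web_returns_i or INVALIDATE METADATA)
REFRESH;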

Thanks,
Lenni
Software Engineer - Cloudera
On Thu, Mar 7, 2013 at 11:33 AM, DK wrote:

Based on the documentation I think RC_FILE is supported, but when I run the
query it complains that:
Backend 0:RC_FILE not implemented.


Here is my table definition and query:
create table web_returns_i
(
wr_returned_time_sk int,
wr_item_sk int,
wr_refunded_customer_sk int,
wr_refunded_cdemo_sk int,
wr_refunded_hdemo_sk int,
wr_refunded_addr_sk int,
wr_returning_customer_sk int,
wr_returning_cdemo_sk int,
wr_returning_hdemo_sk int,
wr_returning_addr_sk int,
wr_web_page_sk int,
wr_reason_sk int,
wr_order_number int,
wr_return_quantity int,
wr_return_amt float,
wr_return_tax float,
wr_return_amt_inc_tax float,
wr_fee float,
wr_return_ship_cost float,
wr_refunded_cash float,
wr_reversed_charge float,
wr_account_credit float,
wr_net_loss float

)
PARTITIONED BY (wr_returned_date_sk int)
stored as RCFILE
location '/hive/tpcds/web_returns_i';


And the query I am trying to run:
insert overwrite table web_returns_i
PARTITION (wr_returned_date_sk)
select
wr_returned_time_sk ,
wr_item_sk ,
wr_refunded_customer_sk ,
wr_refunded_cdemo_sk ,
wr_refunded_hdemo_sk ,
wr_refunded_addr_sk ,
wr_returning_customer_sk ,
wr_returning_cdemo_sk ,
wr_returning_hdemo_sk ,
wr_returning_addr_sk ,
wr_web_page_sk ,
wr_reason_sk ,
wr_order_number ,
wr_return_quantity ,
wr_return_amt ,
wr_return_tax ,
wr_return_amt_inc_tax ,
wr_fee ,
wr_return_ship_cost ,
wr_refunded_cash ,
wr_reversed_charge ,
wr_account_credit ,
wr_net_loss ,
wr_returned_date_sk
from web_returns;

Please advise on what could be wrong here.

Thanks,
DK

  • DK at Mar 7, 2013 at 9:14 pm
    Thanks for your response!
    I have tried to do this using Hive, which also fails, with an OutOfMemory
    exception, so I thought I would use Impala for this.
    I am using Cloudera Manager and have changed all the heap sizes to 4 GB,
    but I still have no idea which JVM process this is or how to change the
    heap setting for the process that hits the OOM error:

    2013-03-07 12:03:58,255 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16076.51 sec
    2013-03-07 12:03:59,265 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 16076.51 sec
    MapReduce Total cumulative CPU time: 0 days 4 hours 27 minutes 56 seconds 510 msec
    Ended Job = job_201303071139_0001
    Ended Job = 1850985048, job is filtered out (removed at runtime).
    Ended Job = -155979390, job is filtered out (removed at runtime).
    java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:215)
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    at java.net.URI.toString(URI.java:1908)
    at java.net.URI.<init>(URI.java:731)
    at org.apache.hadoop.fs.Path.initialize(Path.java:154)
    at org.apache.hadoop.fs.Path.<init>(Path.java:80)
    at org.apache.hadoop.fs.Path.<init>(Path.java:58)
    at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:209)
    at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:371)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:415)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:182)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:180)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.processPaths(CombineHiveInputFormat.java:419)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:390)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
    at org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execute(BlockMergeTask.java:204)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask
    MapReduce Jobs Launched:
    Job 0: Map: 153   Cumulative CPU: 16076.51 sec   HDFS Read: 42558399095   HDFS Write: 32176064044   SUCCESS
    Total MapReduce CPU Time Spent: 0 days 4 hours 27 minutes 56 seconds 510 msec
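
    The stack trace shows the failure is in Hive's post-insert small-file merge stage (BlockMergeTask), and the OOM happens while the client JVM lists the output files to plan that merge job. A hedged alternative to raising the client heap (the fix described in the next message) is to skip the merge stage for this insert; a sketch only, using the standard Hive merge properties, not settings verified on this cluster:

    -- Hive session settings: disable the post-insert file merge so no
    -- BlockMergeTask (and its client-side file listing) is launched
    SET hive.merge.mapfiles=false;
    SET hive.merge.mapredfiles=false;

    The trade-off is more, smaller files per partition (roughly one per map task, 153 here), which is why the merge is on by default.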

  • DK at Mar 7, 2013 at 11:08 pm
    I was able to resolve this by increasing the Hive client's default heap
    from 256 MB.
    I am not sure why CM sets this so low.

    Thanks
  • Henry Robinson at Mar 7, 2013 at 11:56 pm
    I'm not sure either! I suggest you ask that question on
    scm-users@cloudera.org (
    https://groups.google.com/a/cloudera.org/group/scm-users/topics).

    Henry

    --
    Henry Robinson
    Software Engineer
    Cloudera
    415-994-6679

Discussion Overview
group: impala-user
category: hadoop
posted: Mar 7, 2013 at 8:16 pm
active: Mar 7, 2013 at 11:56 pm
posts: 4
users: 3 (DK: 2 posts, Lenni Kuff: 1 post, Henry Robinson: 1 post)
website: cloudera.com
irc: #hadoop
