Grokbase Groups Hive user March 2011
Hi,

I have a Hive script [given below] that calls a Python script via
TRANSFORM. For large datasets [ > 100M rows ], the reducer is unable to
start the Python process and fails with "argument list too long". The
detailed error stack is given below.

The Python script takes only one static argument, "hbase". For small
datasets [ 1M rows ], the script works fine.

1. Is this problem related to the number of open file handles on the
reducer box?
2. How do I get the correct error message?
3. Is there a way to see the actual unix process, with its arguments, that
is being instantiated?

Thanks,
Irfan
script >>>>>>>>>>>>>>>>>>>>>>>>>>>
true && echo "
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

set hive.merge.mapfiles=false;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=20000;

set hive.map.aggr=true;
set mapred.reduce.tasks=200;

add file /home/irfan/scripts/user_id_output.py;

from (
select
user_id,
source_id,
load_id,
user_id_json,
pid
from
user_id_bucket
where
1 = 1
and length(user_id) = 40
distribute by pid
sort by user_id, load_id desc
) T1
insert overwrite table user_id_hbase_stg1 partition (pid)
SELECT
transform
(
T1.user_id
, T1.source_id
, T1.load_id
, T1.user_id_json
, T1.pid
)
using 'python2.6 user_id_output.py hbase'
as user_id, user_id_info1, user_id_info2, user_id_info3, user_id_info4,
user_id_info5, user_id_info3, pid
;

" > ${SQL_FILE};

true && ${HHIVE} -f ${SQL_FILE} 1>${TXT_FILE} 2>${LOG_FILE};
error log >>>>>>>>>>>>>>>>>>>>>>>>>
2011-03-01 14:46:13,705 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator:
Initializing Self 5 OP
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator:
Operator 5 OP initialized
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator:
Initializing children of 5 OP
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing child 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing Self 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
SELECT
struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string>
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Operator 6 SEL initialized
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initializing children of 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Initializing child 7 SCR
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Initializing Self 7 SCR
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Operator 7 SCR initialized
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Initializing children of 7 SCR
2011-03-01 14:46:13,728 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 8 FS
2011-03-01 14:46:13,728 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 8 FS
2011-03-01 14:46:13,730 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 8 FS initialized
2011-03-01 14:46:13,730 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 8 FS
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Initialization Done 7 SCR
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
Initialization Done 6 SEL
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator:
Initialization Done 5 OP
2011-03-01 14:46:13,733 INFO ExecReducer: ExecReducer: processing 1 rows:
used memory = 89690888
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator:
5 forwarding 1 rows
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
6 forwarding 1 rows
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
Executing [/usr/bin/python2.6, user_id_output.py, hbase]
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
tablename=null
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
partname=null
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator:
alias=null
2011-03-01 14:46:13,777 FATAL ExecReducer:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
(tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07':
'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot
initialize ScriptOperator
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
at
org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6":
java.io.IOException: error=7, Argument list too long
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
... 15 more
Caused by: java.io.IOException: java.io.IOException: error=7, Argument list
too long
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 16 more

2011-03-01 14:46:13,779 WARN org.apache.hadoop.mapred.Child: Error running
child
java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row
(tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07':
'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
Error while processing row
(tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07':
'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot
initialize ScriptOperator
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
at
org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6":
java.io.IOException: error=7, Argument list too long
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at
org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
... 15 more
Caused by: java.io.IOException: java.io.IOException: error=7, Argument list
too long
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 16 more
2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning cleanup
for the task

  • Ajo Fod at Mar 1, 2011 at 9:24 pm
    Instead of

    using 'python2.6 user_id_output.py hbase'

    try something like this:

    using 'user_id_output.py'
    ... and a #! line with the location of the python binary.

    I think you can include a parameter in the call too, like:
    using 'user_id_output.py hbase'

    Cheers,
    Ajo.
    On Tue, Mar 1, 2011 at 8:22 AM, Irfan Mohammed wrote:

  • Irfan Mohammed at Mar 1, 2011 at 9:39 pm
    Thanks Ajo.

    The Hive SQL and the Python script run fine with a dataset of ( < 10M )
    records; I run into this error only with a large dataset of ( > 100M )
    records. I have been trying to narrow down either the record count at
    which it breaks (somewhere between 10M and 100M) or the invalid data
    that causes the error.

    I am trying to get to the actual error from the UnixProcess instantiation.
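    For what it's worth, the errno in the stack trace is 7 (E2BIG) from
    execve(), and that limit covers the environment as well as the argument
    list, so a three-item command line can still fail if the child's
    environment is huge. A minimal sketch that reproduces the same error on
    a Linux box by inflating only the environment (the padding sizes are
    arbitrary):

```python
import os
import subprocess

# execve() enforces ARG_MAX against argv *plus* envp, so a tiny
# argument list still fails with E2BIG when the inherited
# environment is large enough.
huge_env = dict(os.environ)
for i in range(600):
    huge_env["PAD%d" % i] = "x" * 65536  # ~38 MB of environment total

try:
    subprocess.check_call(["/bin/true"], env=huge_env)
    print("process started")
except OSError as e:
    # errno 7 -> "Argument list too long", the same failure as in
    # the Hive reducer log above.
    print("exec failed: errno=%d (%s)" % (e.errno, os.strerror(e.errno)))
```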
    On Tue, Mar 1, 2011 at 1:23 PM, Ajo Fod wrote:

  • Steven Wong at Mar 1, 2011 at 10:54 pm
    Looks like this is the command line it was executing:

    2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase]
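    That Executing line shows only three exec arguments, which suggests the
    argument list itself is not what overflows. execve() counts the
    environment against the same ARG_MAX budget, and Hive's ScriptOperator
    also exports job configuration into the transform child's environment,
    so a very large job conf is a plausible culprit. A sketch for checking
    a live process's exec arguments and environment size on a Linux node
    (it inspects its own PID for illustration; substitute the reducer
    child's PID from ps on the failing tasktracker):

```python
import os

def proc_exec_info(pid):
    """Return (argv, environment size in bytes) for a PID, via /proc."""
    # /proc/<pid>/cmdline and /proc/<pid>/environ are NUL-separated.
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        argv = [a.decode() for a in f.read().split(b"\0") if a]
    with open("/proc/%d/environ" % pid, "rb") as f:
        env_bytes = len(f.read())
    return argv, env_bytes

if __name__ == "__main__":
    argv, env_bytes = proc_exec_info(os.getpid())
    print("argv:", argv)
    print("environ: %d bytes; compare with `getconf ARG_MAX`" % env_bytes)
```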


    From: Irfan Mohammed
    Sent: Tuesday, March 01, 2011 1:39 PM
    To: user@hive.apache.org
    Subject: Re: cannot start the transform script. reason : "argument list too long"

    Thanks Ajo.

    The hive sql and the python script run fine with a dataset of ( < 10M ) records. It is only when running with a large dataset of ( > 100M ) records, that I run into this error. I have been trying to get to the number of records (10-100M) or the invalid data which is causing the error.

    I am trying to get to the actual error from the UnixProcess instantiation.

    On Tue, Mar 1, 2011 at 1:23 PM, Ajo Fod wrote:
    Instead of:

    using 'python2.6 user_id_output.py hbase'

    try something like this:

    using 'user_id_output.py'

    ... with a #! line in the script pointing at the location of the python binary.

    I think you can include a parameter in the call too, like:

    using 'user_id_output.py hbase'

    Cheers,
    Ajo.
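
    For reference, a Hive TRANSFORM script simply reads tab-separated rows on stdin and writes tab-separated rows on stdout, one row per line. Below is a minimal sketch of such a script with the #! line Ajo describes; the column handling and the "mode" argument are illustrative placeholders, not Irfan's actual user_id_output.py logic:

    ```python
    #!/usr/bin/env python2.6
    # Minimal Hive TRANSFORM script sketch: input columns arrive
    # tab-separated on stdin, one row per line; output rows go to
    # stdout the same way. The mode argument ('hbase' in this thread)
    # is a static command-line parameter.
    import sys

    def transform_row(line, mode):
        # Input columns (per the query): user_id, source_id, load_id,
        # user_id_json, pid
        user_id, source_id, load_id, user_id_json, pid = \
            line.rstrip("\n").split("\t")
        # Emit whatever output columns the AS (...) clause expects;
        # here we just echo fields back as a placeholder.
        return "\t".join([user_id, mode, source_id, load_id,
                          user_id_json, pid])

    def main():
        mode = sys.argv[1] if len(sys.argv) > 1 else ""
        for line in sys.stdin:
            print(transform_row(line, mode))

    if __name__ == "__main__":
        main()
    ```

    With the #! line in place and the file marked executable, `using 'user_id_output.py hbase'` should work without naming the interpreter on the command line.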

  • Irfan Mohammed at Mar 1, 2011 at 10:57 pm
    Yes, that is the command it is executing, but what I do not understand is why I get "argument list too long" when running the same SQL with the same Python script, only with a larger dataset.

    Thanks.
  • Dave Brondsema at Mar 2, 2011 at 2:54 pm
    We've gotten this error a couple of times too. It is very misleading and not accurate at all. IIRC, I determined that the root cause is selecting too many input files (even though those do NOT get passed as arguments to the transform script). For example, this happened when we had a lot of dynamic partitions and were querying too many of them at once. Hope that helps!
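
    That squares with how exec reports errno 7 (E2BIG): the kernel counts the environment, not just argv, against its per-exec size limits, and Hive's ScriptOperator exports job configuration into the child script's environment. So a very long input-file list in the job conf can blow the limit even though the command line itself is three short tokens. A small sketch demonstrating that an oversized environment variable alone reproduces the same error (assumes a Unix box with /bin/true; the 8 MB figure is just a value comfortably over typical kernel limits):

    ```python
    # Show that "error=7, Argument list too long" (E2BIG) depends on the
    # size of argv *plus* the environment, not the visible command line.
    import errno
    import os
    import subprocess

    def exec_with_env_bytes(nbytes):
        """exec /bin/true with one large environment variable; return
        None on success, or the errno if the exec itself fails."""
        env = dict(os.environ)
        env["BIG"] = "x" * nbytes
        try:
            subprocess.check_call(["/bin/true"], env=env)
            return None
        except OSError as e:
            return e.errno

    # Small environment: exec succeeds.
    print(exec_with_env_bytes(1000))
    # Multi-megabyte environment: on Linux this fails with
    # errno.E2BIG (7), the same error=7 seen in the Hive logs.
    print(exec_with_env_bytes(8_000_000))
    ```

    If I recall correctly, later Hive releases added a hive.script.operator.truncate.env setting to cap the size of the environment variables passed to transform scripts; check your version's HiveConf before relying on it.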

    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)

    ... 16 more

    2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning
    cleanup for the task
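[Editor's note] The `error=7` in the trace is the kernel's `E2BIG`, raised by `execve` when the combined size of the child's argument list and environment exceeds the system limit (`getconf ARG_MAX`, plus a per-string cap on Linux). Since the transform script here takes only one short argument (`hbase`), the likely culprit is the environment: Hive's ScriptOperator exports job state into the child process's environment, and a large job configuration can push it over the limit. The following is a minimal sketch of the failure mode only (the `FAKE_JOB_CONF` variable and the sizes are illustrative assumptions, not what Hive actually exports):

```python
import errno
import os
import subprocess

def spawn_with_extra_env(payload_bytes):
    """Spawn a trivial child after inflating its environment.

    Returns True if the exec succeeded, False if the kernel rejected it
    with E2BIG (errno 7, "Argument list too long").
    """
    env = dict(os.environ)
    # One oversized variable stands in for a large exported job config.
    env["FAKE_JOB_CONF"] = "x" * payload_bytes
    try:
        subprocess.check_call(["/bin/true"], env=env)
        return True
    except OSError as e:
        if e.errno == errno.E2BIG:
            return False
        raise

# A few KB of extra environment is fine; tens of MB exceed the exec limits
# on typical Linux systems, reproducing the "Argument list too long" error.
print(spawn_with_extra_env(1024))
print(spawn_with_extra_env(32 * 1024 * 1024))
```

If this is the cause, the usual direction is to shrink what the reducer exports to the child (e.g. reduce oversized configuration values) rather than to raise the kernel limit; comparing `getconf ARG_MAX` on the reducer box against the size of the task's environment is a quick way to confirm.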


--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net
Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 1, '11 at 4:22p
active: Mar 2, '11 at 2:54p
posts: 6
users: 4
website: hive.apache.org