Yes. That is the command it is executing, but what I do not understand is why
I get "argument list too long" when I run the same SQL and the same Python
script, only against a larger dataset.

Thanks.
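
A note on what error=7 means here: on Linux, E2BIG ("Argument list too long")
is raised by execve when the combined size of the argument strings and the
environment exceeds the kernel limit, not only when argv itself is long. Hive's
ScriptOperator (at least in versions of this era) also copies much of the job
configuration into the transform script's environment, so one plausible
explanation is that the 100M-row job, with its much longer input and partition
lists in the job conf, inflates that environment past the limit even though
argv is still only [/usr/bin/python2.6, user_id_output.py, hbase]. The minimal
Python sketch below (names and sizes are made up) reproduces the failure by
handing a child process an oversized environment:

# Minimal sketch (not from the thread; names and sizes are made up): E2BIG
# counts the environment handed to the child as well as its argv, so the
# same short command line can fail once that environment grows too large.
import subprocess

# roughly 50 MB of environment variables, each one well under the per-string limit
huge_env = dict(("FAKE_HIVE_CONF_%d" % i, "x" * 100000) for i in range(500))

try:
    subprocess.check_call(["/bin/true"], env=huge_env)
except OSError as e:
    print e  # on a typical Linux box: [Errno 7] Argument list too long
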
On Tue, Mar 1, 2011 at 2:53 PM, Steven Wong wrote:

Looks like this is the command line it was executing:



2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase]





*From:* Irfan Mohammed
*Sent:* Tuesday, March 01, 2011 1:39 PM

*To:* user@hive.apache.org
*Subject:* Re: cannot start the transform script. reason : "argument list
too long"



Thanks Ajo.



The Hive SQL and the Python script run fine with a dataset of fewer than 10M
records. It is only when running against a large dataset of more than 100M
records that I run into this error. I have been trying to narrow down the
record count (somewhere between 10M and 100M) or the invalid data that is
causing the error.



I am trying to get to the actual error from the UnixProcess instantiation.



On Tue, Mar 1, 2011 at 1:23 PM, Ajo Fod wrote:

instead of

using 'python2.6 user_id_output.py hbase'

try something like this:

using 'user_id_output.py'

... and a #! line with the location of the python binary.

I think you can include a parameter in the call too, like:

using 'user_id_output.py hbase'

Cheers,
Ajo.
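
For concreteness, a hypothetical skeleton of what Ajo is describing: the script
carries its own shebang line, is marked executable before being added to the
job, and is then invoked by name in the TRANSFORM clause. The column handling
below is made up; the real user_id_output.py is not shown in the thread.

#!/usr/bin/python2.6
# Hypothetical skeleton of user_id_output.py (the real script is not in the
# thread). With the shebang above and "chmod +x user_id_output.py", the
# TRANSFORM clause can be just: using 'user_id_output.py hbase'
import sys

mode = sys.argv[1] if len(sys.argv) > 1 else "hbase"  # the single static argument

for line in sys.stdin:
    cols = line.rstrip("\n").split("\t")  # Hive streams rows as tab-separated columns
    # ... transform cols here; this placeholder just echoes them back ...
    print "\t".join(cols)
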



On Tue, Mar 1, 2011 at 8:22 AM, Irfan Mohammed wrote:

Hi,



I have a Hive script [given below] that calls a Python script using TRANSFORM.
For large datasets [ > 100M rows ], the reducer is not able to start the Python
process and fails with "argument list too long". The detailed error stack is
given below.



The Python script takes only one static argument, "hbase". For small datasets
[ 1M rows ], the script works fine.

1. Is this problem related to the number of open file handles on the reducer box?
2. How do I get the correct error message?
3. Is there a way to see the actual unix process, with its arguments, that is being instantiated? (A diagnostic sketch follows this list.)
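
On (1) and (3): error=7 is E2BIG, meaning the arguments plus environment passed
to exec are too large; a file-handle limit would instead surface as "Too many
open files". One low-tech way to probe this from inside a reduce task is to
temporarily swap in a tiny diagnostic transform script that reports how large
an environment the task hands to its children versus the kernel's ARG_MAX; its
stderr should show up in the task logs. A hypothetical sketch, not the real
user_id_output.py:

#!/usr/bin/python2.6
# Hypothetical diagnostic transform script (not from the thread): prints the
# size of the environment this task would pass to a child process versus
# ARG_MAX, then passes rows through unchanged.
import os
import sys

env_bytes = sum(len(k) + len(v) + 2 for k, v in os.environ.items())
sys.stderr.write("ARG_MAX=%d env_vars=%d env_bytes=%d\n"
                 % (os.sysconf("SC_ARG_MAX"), len(os.environ), env_bytes))

for line in sys.stdin:
    sys.stdout.write(line)
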

Thanks,

Irfan


script >>>>>>>>>>>>>>>>>>>>>>>>>>>


true && echo "
set hive.exec.compress.output=true;
set io.seqfile.compression.type=BLOCK;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

set hive.merge.mapfiles=false;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=20000;

set hive.map.aggr=true;
set mapred.reduce.tasks=200;

add file /home/irfan/scripts/user_id_output.py;

from (
    select
        user_id,
        source_id,
        load_id,
        user_id_json,
        pid
    from
        user_id_bucket
    where
        1 = 1
        and length(user_id) = 40
    distribute by pid
    sort by user_id, load_id desc
) T1
insert overwrite table user_id_hbase_stg1 partition (pid)
SELECT
    transform
    (
        T1.user_id
        , T1.source_id
        , T1.load_id
        , T1.user_id_json
        , T1.pid
    )
    using 'python2.6 user_id_output.py hbase'
    as user_id, user_id_info1, user_id_info2, user_id_info3, user_id_info4,
       user_id_info5, user_id_info3, pid
;
" > ${SQL_FILE};

true && ${HHIVE} -f ${SQL_FILE} 1>${TXT_FILE} 2>${LOG_FILE};


error log >>>>>>>>>>>>>>>>>>>>>>>>>


2011-03-01 14:46:13,705 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing Self 5 OP
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Operator 5 OP initialized
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initializing children of 5 OP
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string>
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 6 SEL initialized
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 6 SEL
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing child 7 SCR
2011-03-01 14:46:13,711 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing Self 7 SCR
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Operator 7 SCR initialized
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initializing children of 7 SCR
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 8 FS
2011-03-01 14:46:13,728 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 8 FS
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 8 FS initialized
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 8 FS
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Initialization Done 7 SCR
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 6 SEL
2011-03-01 14:46:13,730 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: Initialization Done 5 OP
2011-03-01 14:46:13,733 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 89690888
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ExtractOperator: 5 forwarding 1 rows
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 1 rows
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/python2.6, user_id_output.py, hbase]
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null
2011-03-01 14:46:13,733 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null
2011-03-01 14:46:13,777 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
    ... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 15 more
Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 16 more

2011-03-01 14:46:13,779 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:265)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:415)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"AA11223344","reducesinkkey1":"20110210_02"},"value":{"_col0":"xxxxx","_col1":"m1","_col2":"20110210_02","_col3":"{'m07': 'x12', 'm02': 'x34', 'm01': 'm45'}","_col4":"0A9"},"alias":0}
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:253)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
    ... 7 more
Caused by: java.io.IOException: Cannot run program "/usr/bin/python2.6": java.io.IOException: error=7, Argument list too long
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
    ... 15 more
Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 16 more

2011-03-01 14:46:13,784 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task




