can't package zip file with hadoop streaming -file argument
-----------------------------------------------------------

Key: HADOOP-3811
URL: https://issues.apache.org/jira/browse/HADOOP-3811
Project: Hadoop Core
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.17.0
Reporter: Karl Anderson


I'm unable to ship a file with a .zip suffix to the mapper using the -file argument for hadoop streaming. I am able to ship it if I change the suffix to .zipp. Is this a bug, or does it perhaps have something to do with the jar file format that is used to send files to the instance?

For example, with this hadoop invocation, and local files "/tmp/boto.zip" and "/tmp/boto.zipp" which are copies of each other:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar -mapper $KCLUSTER_SRC/testmapper.py -reducer $KCLUSTER_SRC/testreducer.py -input input/foo -output output -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp

I see this line in the invocation standard output:

packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp, /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null
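One way to narrow this down might be to check whether "boto.zip" is stored as an entry in the packaged job jar at all. The following is a minimal sketch, assuming the jar named in the packageJobJar line is still on disk after submission; the helper names jar_entries and missing_from_jar are hypothetical and not part of Hadoop. It relies only on the fact that a jar is a zip archive.

```python
import zipfile


def jar_entries(jar_path):
    """Return the entry names stored in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.namelist()


def missing_from_jar(jar_path, expected_names):
    """Return the expected base names that have no matching entry in the jar.

    Entries are compared by base name only, so a file stored under a
    subdirectory inside the jar still counts as present.
    """
    present = {name.rsplit("/", 1)[-1] for name in jar_entries(jar_path)}
    return [name for name in expected_names if name not in present]
```

For the invocation above, this could be called as missing_from_jar("/tmp/streamjob6900.jar", ["foo.txt", "boto.zip", "boto.zipp"]). An empty result would suggest the file is lost after packaging rather than during it.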

But in the current directory of the mapper process, "boto.zip" does not exist, while "boto.zipp" does.
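The check on the mapper side can be reproduced with a small helper along these lines. This is only a sketch: shipped_file_status and format_report are hypothetical names, and it assumes the shipped files are expected in the mapper's current working directory, as described above.

```python
import os


def shipped_file_status(expected_names, directory="."):
    """Map each expected file name to True if it exists in the directory."""
    return {
        name: os.path.exists(os.path.join(directory, name))
        for name in expected_names
    }


def format_report(status):
    """Render one 'name: present' or 'name: MISSING' line per file."""
    return "\n".join(
        "%s: %s" % (name, "present" if found else "MISSING")
        for name, found in sorted(status.items())
    )
```

Calling format_report(shipped_file_status(["foo.txt", "boto.zip", "boto.zipp"])) at the top of the mapper script and writing the result to standard error should make the outcome visible in the task logs without disturbing the mapper's normal output.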



Discussion Overview
group: common-dev @
categories: hadoop
posted: Jul 22, '08 at 10:17p
active: Jul 22, '08 at 10:17p
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop
