The documentation for Hadoop Streaming says that the "-input" option
can be specified multiple times for multiple input directories. The same
does not seem to work with Pipes.

Is there some way to specify multiple input directories for pipes jobs?
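Perhaps the paths can be passed as a single comma-separated list instead of
repeating -input. This is untested and assumes that the generic -D option
(listed in the usage below) applies to pipes jobs and that the job reads its
inputs from the standard mapred.input.dir property, which FileInputFormat
splits on commas:

bin/hadoop pipes -D mapred.input.dir=/in-dir-har/test.har,/in-dir \
    -conf pipes.xml -output /out-dir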


PS. With multiple input dirs this is what happens (i.e. there is no clear
error message of any sort):

$ bin/hadoop pipes -conf pipes.xml -input /in-dir-har/test.har -input
/in-dir -output /out-dir
bin/hadoop pipes
[-input <path>]         // Input directory
[-output <path>]        // Output directory
[-jar <jar file>]       // jar filename
[-inputformat <class>]  // InputFormat class
[-map <class>]          // Java Map class
[-partitioner <class>]  // Java Partitioner
[-reduce <class>]       // Java Reduce class
[-writer <class>]       // Java RecordWriter
[-program <executable>] // executable URI
[-reduces <num>]        // number of reduces

Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to
be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files
to include in the classpath.
-archives <comma separated list of archives> specify comma separated
archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
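A possible alternative to repeating -input (untested; assumes the input
format reads the standard mapred.input.dir property, which accepts
comma-separated paths) would be to set the paths directly in pipes.xml:

<property>
  <name>mapred.input.dir</name>
  <value>/in-dir-har/test.har,/in-dir</value>
</property>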

Posted to common-user by Roshan James on Jun 18, 2009 at 11:29p (1 post).


