I am running basic Hadoop examples on Amazon EMR and I am stuck at a very
simple place. I am apparently not passing the right class name for the
input format. From the Hadoop documentation, it seems that "TextInputFormat"
is a valid option for the input format.

I am running a simple sort example using mapreduce.

Here are the command variations I tried, all in vain:


$usr/local/hadoop/bin/hadoop jar /path to hadoop
examples/hadoop-0.18.0-examples.jar sort -inFormat TextInputFormat
-outFormat TextOutputFormat /path to datainput/datain/ /path to data
output/dataout

The sort example does not declare "TextInputFormat" in its import list.
Could that be the problem?
Could it be a version problem?


Any help is appreciated!
Shivani



--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao


  • Simon at Feb 28, 2011 at 3:37 am
    Firstly, I think your Hadoop version is a bit too old; maybe you can try a
    version number higher than 0.20.
    Then try to run the sort sample with the following command:
    bin/hadoop jar hadoop-*-examples.jar sort [-m <#maps>] [-r <#reduces>]
    <in-dir> <out-dir>

    HTH.
    Simon
  • Raoshivani at Mar 3, 2011 at 6:18 pm
    Hello Simon,

    I tried with the hadoop-0.20 examples and still get the input format error
    for the sort program. I took a second look at the sort.java code, and it
    looks like the default class is SequenceFileInputFormat:

    Class<? extends InputFormat> inputFormatClass = SequenceFileInputFormat.class;

    So if I do not specify a class, I am going to get an input format error.

    I am unable to specify the right input format class.

    Any ideas?

    Regards,
    Shivani

    --
    View this message in context: http://lucene.472066.n3.nabble.com/a-hadoop-input-format-question-tp2588087p2627190.html
    Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
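    One possible cause, offered as a guess rather than a confirmed diagnosis: if
    the sort example loads the -inFormat argument by reflection (for example with
    Class.forName), a bare name such as TextInputFormat cannot be resolved and the
    fully qualified name is needed. The sketch below only illustrates that lookup
    rule. FormatNameCheck and canResolve are hypothetical names, and JDK classes
    stand in for the Hadoop ones so it runs without Hadoop on the classpath.

```java
// Sketch: why a bare class name can fail when a tool resolves it by reflection.
// Assumption: the sort example loads the -inFormat argument with Class.forName.
class FormatNameCheck {

    // Returns true if the JVM can load a class with exactly this name.
    static boolean canResolve(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // No class with this exact (package-qualified) name is visible.
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare name is looked up in the unnamed package, so it is not found.
        System.out.println("ArrayList -> " + canResolve("ArrayList"));
        // The fully qualified name is found.
        System.out.println("java.util.ArrayList -> " + canResolve("java.util.ArrayList"));
    }
}
```

    If that assumption holds for the sort example, the thing to try would be
    org.apache.hadoop.mapred.TextInputFormat and
    org.apache.hadoop.mapred.TextOutputFormat in place of the bare names. With
    text input the key and value classes will probably also differ from the
    example's defaults, so its -outKey and -outValue options may be needed as well.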
  • Shivani Rao at Mar 3, 2011 at 4:55 am
    Problems running local installation of hadoop on single-node cluster

    I followed the instructions given in tutorials to run hadoop-0.21 on a single-node cluster.

    The first problem I encountered was HADOOP-6953. Thankfully, that has been fixed.

    The other problem I am facing is that the datanode does not start. I guess this because when I run stop-dfs.sh, I get the message
    "no datanode to stop"

    I am wondering if it is remotely related to the difference in the IP addresses on my computer:

    127.0.0.1 localhost
    127.0.1.1 my-laptop

    Although I am aware of this, I do not know how to fix it.

    I am unable to run even a simple pi estimate example on the Hadoop installation.

    This is the output I get:

    bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 10 10
    Number of Maps = 10
    Samples per Map = 10
    11/03/02 23:38:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

    And nothing else for a long, long time.

    I have not set dfs.name.dir and dfs.data.dir in my hdfs-site.xml. But after running bin/hadoop namenode -format, I see that the hadoop.tmp.dir location has dfs/name and dfs/data folders for the two directories.

    What am I doing wrong? Any help is appreciated.

    Here are my configuration files

    Regards,
    Shivani

    hdfs-site.xml

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
    </property>


    core-site.xml

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
    </property>

    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
    </property>



    mapred-site.xml

    <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
    </property>
  • Rahul patodi at Mar 3, 2011 at 5:04 am
    Hi,
    Please check the logs; some error might have occurred while starting the daemons.
    Please post the error.
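
    While collecting the logs, it may also be worth ruling out the hostname
    mapping mentioned in the question. A minimal sketch, assuming the daemons
    resolve names through the JVM in the usual way: it prints what the local host
    name resolves to and shows that both 127.0.0.1 and 127.0.1.1 are loopback
    addresses. HostCheck and isLoopback are hypothetical names, and this is a
    diagnostic aid, not a confirmed cause of the datanode problem.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: inspect how the JVM sees the local host name and loopback addresses.
class HostCheck {

    // Returns true if the given literal IP address is a loopback address.
    // Literal addresses are parsed directly, so no name lookup is involved.
    static boolean isLoopback(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // What the machine's own host name resolves to (depends on /etc/hosts).
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("host name:   " + local.getHostName());
        System.out.println("resolves to: " + local.getHostAddress());
        System.out.println("is loopback: " + local.isLoopbackAddress());

        // The two entries quoted from the hosts file in the question.
        System.out.println("127.0.0.1 loopback: " + isLoopback("127.0.0.1"));
        System.out.println("127.0.1.1 loopback: " + isLoopback("127.0.1.1"));
    }
}
```

    The output of main depends on the machine, so compare it with the host used
    in fs.default.name and mapred.job.tracker (localhost in the posted files).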


Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 27, '11 at 1:05p
active: Mar 3, '11 at 6:21p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
