FAQ
I am a newbie to Unix/Hadoop and have basic questions about CDH3 setup.


I installed CDH3 on an Ubuntu 11.0 box. I want to set up a pseudo-distributed
cluster where I can run my Pig jobs in mapreduce mode.
How do I achieve that?

1. I could not find the core-site.xml, hdfs-site.xml, and mapred-site.xml
files with all the default parameters set. Where are these located?
(I see files under the example-conf dir, but I guess those are just example
files.)
2. I see several config files under /usr/lib/hadoop/conf, but all of them
are empty, with comments saying they can be used to override the
configuration; they are also read-only. What is the intention of making
these files read-only?


Many Thanks,
Prashant

  • Mingxi Wu at Apr 16, 2012 at 1:55 am
    Hi,

    I use hadoop cloudera 0.20.2-cdh3u0.

    I have a program which uploads local files to HDFS every hour.

    Basically, I open a gzip input stream with in = new GZIPInputStream(fin); and write it to an HDFS file. After less than two days the program hangs, at FSDataOutputStream.close(86).
    Here is the stack:

    State: WAITING Running 16660 ms (user 13770 ms) blocked 11276 times for <> ms waiting 11209 times for <> ms
    LockName: java.util.LinkedList@f1ca0de LockOwnerId: -1
    java.lang.Object.wait(-2)
    java.lang.Object.wait(485)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(3468)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(3457)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(3549)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(3488)
    org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(61)
    org.apache.hadoop.fs.FSDataOutputStream.close(86)
    org.apache.hadoop.io.IOUtils.copyBytes(59)
    org.apache.hadoop.io.IOUtils.copyBytes(74)

    Any suggestions for avoiding this issue? It seems to be a bug in Hadoop. I found the issue is less severe when my upload server does one upload at a time instead of multiple concurrent uploads.
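
    For reference, the write path is essentially the standard copy idiom below (a minimal sketch with illustrative names, not the actual program); the close() that hangs is the one triggered by the final argument of copyBytes:

      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.util.zip.GZIPInputStream;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IOUtils;

      public class HdfsUploader {
          // Decompress a local .gz file and copy it into HDFS.
          public static void upload(String localGzFile, String hdfsPath) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);

              InputStream in = new GZIPInputStream(new FileInputStream(localGzFile));
              FSDataOutputStream out = fs.create(new Path(hdfsPath));

              // 'true' tells copyBytes to close both streams; the reported hang
              // happens inside this close, in DFSOutputStream.waitForAckedSeqno().
              IOUtils.copyBytes(in, out, conf, true);
          }
      }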

    Thanks,

    Mingxi
  • Uma Maheswara Rao G at Apr 16, 2012 at 4:29 am
    Hi Mingxi,

    In your thread dump, did you check the DataStreamer thread? Is it running?

    If the DataStreamer thread is not running, then this issue is most likely the same as HDFS-2850.

    Did you find any OOME in your clients?
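
    One quick way to check is to take a fresh thread dump of the hung upload process and look for the streamer thread (a sketch; <pid> is the client JVM's process id, and the exact thread name may vary by version):

      jstack <pid> | grep -A 15 "DataStreamer"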

    Regards,
    Uma
    ________________________________________
    From: Mingxi Wu [Mingxi.Wu@turn.com]
    Sent: Monday, April 16, 2012 7:25 AM
    To: common-user@hadoop.apache.org
    Subject: upload hang at DFSClient$DFSOutputStream.close(3488)

  • Manish Bhoge at Apr 16, 2012 at 2:38 am
    Prashant,
    Post your questions to cdh-user@cloudera.org.

    Follow the CDH3 installation guide. After installing the package and the individual components, you need to set up the configuration files such as core-site.xml, hdfs-site.xml, etc.
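
    For example, a minimal pseudo-distributed setup (a sketch; the ports below are common CDH3 defaults and may differ on your machine) points both the NameNode and the JobTracker at localhost. On CDH3 the hadoop-0.20-conf-pseudo package installs a ready-made configuration like this, and Pig will use it when started with pig -x mapreduce.

      <!-- core-site.xml -->
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:8020</value>
        </property>
      </configuration>

      <!-- mapred-site.xml -->
      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:8021</value>
        </property>
      </configuration>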

    Thanks
    Manish
    Sent from my BlackBerry, pls excuse typo

    -----Original Message-----
    From: shan s <mysub987@gmail.com>
    Date: Mon, 16 Apr 2012 02:49:51
    To: <common-user@hadoop.apache.org>
    Reply-To: common-user@hadoop.apache.org
    Subject: Basic setup questions on Ubuntu

