FAQ
Hello!

I have designed a UI that supports transferring and fetching data from a
remote host to HDFS. Now, suppose I need to do some kind of search: there
are N files present in HDFS, and I need to check whether a particular file
or folder is present or not. I would simply use the Hadoop FileSystem API
and write a few lines of code with its methods, right? Something like
filesystem.exists(path of the file)?
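
To make it concrete, I mean something roughly like this (the NameNode URI
and the path below are only placeholders, not my actual setup):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExistsCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // Placeholder NameNode address; substitute your own.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // exists() works for both files and directories.
        Path target = new Path("/user/sugandha/input/sample.txt"); // hypothetical path
        System.out.println(target + (fs.exists(target) ? " is present" : " is not present"));
    }
}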

When would we be using Map-Reduce?

What uses and applications would you suggest for it? I have gone through
Google's documentation, but I want more precise detail.

--
Regards!
Sugandha

  • Sugandha Naolekar at Jul 9, 2009 at 5:49 am
    Hello!

    I have a 7-node Hadoop cluster!

    As of now, I am able to transfer (dump) data into HDFS from a remote
    node (not a part of the Hadoop cluster), and through the web UI I am able
    to download it again.

    -> But if I need to restrict that web UI to a few users only, what am I
    supposed to do?

    -> Also, if I need to do some kind of search, i.e., whether a particular
    file or folder is available in HDFS or not: will I be able to do it simply
    by writing code against the Hadoop FileSystem API? Will it be fast and
    efficient when the data grows very large?

    -> Also, after the above tasks, I want to add compression. The data that
    is placed in HDFS should be stored in a compressed format. Will I have to
    use the Hadoop APIs only, or some MapReduce techniques? In this whole
    workflow, is MapReduce necessary? If yes, where?


    --
    Regards!
    Sugandha
  • Alex Loddengaard at Jul 9, 2009 at 5:46 pm
    Answers in-line. Let me know if any questions follow.

    Alex

    On Wed, Jul 8, 2009 at 10:49 PM, Sugandha Naolekar wrote:
    Hello!

    I have a 7-node Hadoop cluster!

    As of now, I am able to transfer (dump) data into HDFS from a remote
    node (not a part of the Hadoop cluster), and through the web UI I am able
    to download it again.

    -> But if I need to restrict that web UI to a few users only, what am I
    supposed to do?
    Hadoop doesn't have any mechanism for authentication, so you'll have to do
    this with Linux tools. It's also dangerous to restrict access to the web
    ports, because those same ports are used by the Hadoop daemons themselves.
    You could use iptables to create an IP whitelist, and include your users'
    IPs, as well as your nodes' IPs. There may be a way to massage Jetty to
    restrict access, but I don't know enough about Jetty to be able to say for
    sure.
    -> Also, if I need to do some kind of search, i.e., whether a particular
    file or folder is available in HDFS or not: will I be able to do it simply
    by writing code against the Hadoop FileSystem API? Will it be fast and
    efficient when the data grows very large?
    The API should be sufficient here. Another possibility, if you'd rather not
    use Java, is to get the Fuse contrib project working and mount HDFS onto a
    Linux box. Then you could use Python, bash, or whatever to do your file
    traversals. Note that fuse isn't widely used, so it may be hard to get
    going (I've never done it).
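
    To give a rough, untested sketch of the pure-Java route (the directory and
    file names below are made up), a simple recursive search could look like:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsSearch {
        // Recursively look for an entry named 'name' under 'dir'.
        static boolean find(FileSystem fs, Path dir, String name) throws IOException {
            for (FileStatus status : fs.listStatus(dir)) {
                if (status.getPath().getName().equals(name)) {
                    return true;
                }
                if (status.isDir() && find(fs, status.getPath(), name)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) throws Exception {
            // Uses fs.default.name from the loaded configuration.
            FileSystem fs = FileSystem.get(new Configuration());
            System.out.println(find(fs, new Path("/user/sugandha"), "report.txt"));
        }
    }

    Keep in mind that every listStatus call is a NameNode RPC, so a full
    recursive walk over a huge namespace is not free; a direct exists() on a
    known path stays cheap regardless of data size.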

    -> Also, after the above tasks, I want to add compression. The data that
    is placed in HDFS should be stored in a compressed format. Will I have to
    use the Hadoop APIs only, or some MapReduce techniques? In this whole
    workflow, is MapReduce necessary? If yes, where?
    There are a few different ways to do this; probably the easiest is the
    following. First, put your data in HDFS in its original format. Then, use
    IdentityMapper and IdentityReducer to read your (presumably plain text)
    data via TextInputFormat, and configure your job to use
    SequenceFileOutputFormat (to learn about the different compression
    options, see <
    http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html>).
    After this MapReduce job is done, you will have your original data, and
    your data in SequenceFiles. Make sense?
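
    As a sketch only (old "mapred" API, with gzip block compression and the
    input/output paths invented for illustration), the job setup could look
    roughly like this:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class CompressToSequenceFile {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CompressToSequenceFile.class);
            conf.setJobName("compress-to-seqfile");

            // Plain-text input; the identity mapper/reducer pass records through unchanged.
            conf.setInputFormat(TextInputFormat.class);
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);

            // TextInputFormat produces <LongWritable offset, Text line> pairs.
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            // Write block-compressed SequenceFiles; the gzip codec is just one choice.
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setOutputCompressionType(conf, SequenceFile.CompressionType.BLOCK);
            FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

            // Paths are placeholders for your own data.
            FileInputFormat.setInputPaths(conf, new Path("/data/raw"));
            FileOutputFormat.setOutputPath(conf, new Path("/data/compressed"));

            JobClient.runJob(conf);
        }
    }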


    --
    Regards!
    Sugandha
