FAQ
Hi!

Yes, I'm aware that it's not a good idea to build an "ordinary" filesystem on
top of Hadoop. Let's say that I'm trying to build a system for my users with
500 GB of space for every user. It seems that Hadoop can write/store 500 GB
fine, but reading and altering the data later isn't easy (at least not altering).

How do the big boys do this? E.g. the Google File System: Gmail runs on top of
it (and latency still seems fine for the remote end user)? How about Amazon S3?
Do the big players implement some caching layers on top of a Hadoop-like system?

My dream is to have a system where it is easy to add more space when needed,
with all those automatic features: balancing, recovery of data (keeping it
really there no matter what happens), etc. I guess I'm not alone there.

BR,
--
-- MJo


  • Andreas Kostyrka at Apr 8, 2008 at 1:47 pm
    HDFS has slightly different design goals. It's not meant as a
    general-purpose filesystem; it's meant as fast sequential input/output
    storage for Hadoop's map/reduce.

    Andreas
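
To make the "sequential, not general purpose" point concrete, here is a minimal toy sketch. It assumes a store in which files are written once from start to finish and read back as a stream. The class `WriteOnceStore` and the helper `alter` are hypothetical names used only for illustration; this is not the HDFS API, just a model of why altering data is the awkward part.

```python
class WriteOnceStore:
    """Toy in-memory model of a write-once, read-sequentially file store.

    Hypothetical illustration only; this is not the HDFS API.
    """

    def __init__(self):
        self._files = {}

    def create(self, name, chunks):
        """Write a new file from start to finish; existing files cannot be overwritten."""
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = b"".join(chunks)

    def read(self, name, chunk_size=4):
        """Stream the file back sequentially in fixed-size chunks."""
        data = self._files[name]
        for offset in range(0, len(data), chunk_size):
            yield data[offset:offset + chunk_size]

    def delete(self, name):
        """Remove a file so that its name can be reused."""
        del self._files[name]


def alter(store, name, offset, new_bytes):
    """'Altering' a file means reading all of it, patching a copy, and rewriting it."""
    data = bytearray(b"".join(store.read(name)))
    data[offset:offset + len(new_bytes)] = new_bytes
    store.delete(name)
    store.create(name, [bytes(data)])


store = WriteOnceStore()
store.create("user1/report.txt", [b"hello ", b"world"])
alter(store, "user1/report.txt", 0, b"HELLO")
result = b"".join(store.read("user1/report.txt"))
print(result)
```

In this model, changing five bytes costs a full read and a full rewrite of the file, which is cheap for a toy string but not for files approaching a 500 GB per-user quota.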

  • Mika Joukainen at Apr 22, 2008 at 7:24 pm

    All right, I have to rephrase: I'd like to have a storage system for files
    which are inserted by the users. Users are going to use normal
    human-operable software entities ;) The system is going to have fault
    tolerance, parallelism, etc. == HDFS, isn't it?

    Therefore, could these help to achieve the goal:
    https://issues.apache.org/jira/browse/HADOOP-3246 "FTP client over HDFS"
    https://issues.apache.org/jira/browse/HADOOP-496 "Expose HDFS as a WebDAV
    store"
    and maybe https://issues.apache.org/jira/browse/HADOOP-3199 "Need an FTP
    Server implementation over HDFS"

    Especially https://issues.apache.org/jira/browse/HADOOP-3246 "FTP client
    over HDFS"?

    BR,
    -- MJo
  • Allen Wittenauer at Apr 22, 2008 at 7:43 pm

    On 4/22/08 12:23 PM, "Mika Joukainen" wrote:

    All right, I have to rephrase: I'd like to have a storage system for files
    which are inserted by the users. Users are going to use normal
    human-operable software entities ;) The system is going to have fault
    tolerance, parallelism, etc. == HDFS, isn't it?

    No, it isn't. You're looking for Lustre and similar file systems.

Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 8, '08 at 1:35p
active: Apr 22, '08 at 7:43p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
