HDFS vs. CIFS
I would like someone to compare and contrast CIFS and HDFS. Or, if that
is not a valid comparison, please explain to me why it's not a valid
comparison.

Thanks,
Trevor

. This message and any attachments contain information from Union Pacific which may be confidential and/or privileged.
If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited by law. If you receive this message in error, please contact the sender immediately and delete the message and any attachments.


  • Ted Dunning at Oct 15, 2007 at 9:02 pm
    CIFS is a file system that doesn't scale particularly well, nor does it
    support parallelisation of programs very well.

    HDFS isn't quite a file system. At least not in the sense of something that
    does all the things that you expect a file system to do (CRUD operations,
    access control, meta-data). On the other hand, it does support one
    particularly parallel execution model very nicely.

  • Joydeep Sen Sarma at Oct 15, 2007 at 10:20 pm
    Not a valid comparison. CIFS is a remote file access protocol only. HDFS
    is a file system (that comes bundled with a remote file access
    protocol).

    It may be possible to build a CIFS gateway for HDFS.

    One interesting point of comparison at the protocol level is the level
    of parallelism. Compared to the HDFS protocol, CIFS exposes less
    parallelism. DFS/CIFS has the concept of junction points, which allows
    directories from different storage servers to be stitched into one
    namespace. There are commercial products that make this easy. However,
    this allows parallelism only at the directory level, whereas the HDFS
    protocol allows a single file to be distributed across different servers.

    (And as was pointed out - CIFS supports many other file system
    operations - ACLs, oplocks and what not that HDFS doesn't).
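    The distinction can be made concrete with a toy sketch. Nothing below is
    HDFS's or DFS's actual placement logic; the block size, hashing, and
    round-robin assignment are illustrative assumptions only. The point is the
    placement unit: a junction point maps a whole directory to one server,
    while HDFS-style block placement lets each fixed-size block of a single
    file land on a different server.

    ```python
    # Toy model: directory-level placement (DFS/CIFS junction points)
    # versus block-level placement (HDFS-style). Illustrative only; real
    # HDFS uses rack-aware replica placement, not round-robin.

    BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, HDFS's classic default

    def junction_placement(path, servers):
        """A junction point maps a whole directory to one server, so
        every byte of every file under that directory lives there."""
        top_dir = path.split("/")[1]
        return servers[hash(top_dir) % len(servers)]

    def block_placement(file_size, servers):
        """HDFS-style: split one file into fixed-size blocks and spread
        the blocks across servers (round-robin here for simplicity)."""
        n_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
        return [servers[i % len(servers)] for i in range(n_blocks)]

    servers = ["node1", "node2", "node3"]

    # Directory-level: one server holds the whole file, however large.
    print(junction_placement("/logs/2007-10-15.dat", servers))

    # Block-level: a 300 MB file becomes 5 blocks on 3 different servers,
    # so readers can pull different parts of it in parallel.
    print(block_placement(300 * 1024 * 1024, servers))
    ```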

  • Trevorstewart at Oct 16, 2007 at 3:53 pm
    Hmmm...OK...

    Let me explain my requirements here and see if you all can tell me if
    Hadoop provides the functionality I need.

    I'm building a highly performant, highly available (no less than 4 9's), raw
    storage subsystem. It will be write-once for the initial dataset (binary
    data) but will have the ability to maintain metadata associated with the
    binary data. The metadata will be queryable and therefore indexed
    (I want to use Lucene for this purpose). It must have the ability to store
    petrabytes of data. We will use either NetApps or DMX3 storage media.

    Please discuss...
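    The requirement above, write-once binary data with separately indexed,
    queryable metadata, can be sketched with a toy in-memory store. The
    dict-based "index" is only a stand-in for Lucene, and the class and
    method names are hypothetical, not from any product mentioned in the
    thread:

    ```python
    # Toy write-once blob store with queryable, mutable metadata. A real
    # system would keep payloads in the storage subsystem and answer
    # queries from an inverted index (e.g. Lucene), not a linear scan.

    class BlobStore:
        def __init__(self):
            self._blobs = {}     # id -> immutable binary payload
            self._metadata = {}  # id -> mutable metadata dict

        def put(self, blob_id, data, **metadata):
            # Enforce write-once semantics for the binary payload.
            if blob_id in self._blobs:
                raise ValueError("write-once: %r already stored" % blob_id)
            self._blobs[blob_id] = bytes(data)
            self._metadata[blob_id] = dict(metadata)

        def update_metadata(self, blob_id, **metadata):
            # The payload is frozen, but metadata may evolve.
            self._metadata[blob_id].update(metadata)

        def query(self, **criteria):
            """Return ids whose metadata matches all criteria."""
            return [bid for bid, md in self._metadata.items()
                    if all(md.get(k) == v for k, v in criteria.items())]

    store = BlobStore()
    store.put("img-1", b"\x89PNG...", source="camera-a", year=2007)
    store.put("img-2", b"\x89PNG...", source="camera-b", year=2007)
    store.update_metadata("img-1", reviewed=True)
    print(store.query(year=2007))           # both ids
    print(store.query(source="camera-b"))   # ["img-2"]
    ```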






  • Ted Dunning at Oct 16, 2007 at 5:32 pm
    First, it is PETAbytes, not petRabytes.

    Secondly, if you are committed to using NetApps or DMX3, then you really
    don't need (or want) HDFS.

    Thirdly, if you are committed to using a distributed file store like HDFS
    (or MogileFS or KFS), then you don't need NetApps. Distributed file systems
    were designed exactly to eliminate the need for highly engineered storage
    systems by allowing the use of entire redundant computers rather than
    cleverly interconnected disks.

    So you really have two classes of designs:

    A) traditional big iron

    B) trendy, but not entirely ready for prime time distributed file stores
    like HDFS

    The first option will probably work and will cost about 2x more (based on my
    experience; your mileage will vary). The second option will require more
    hand-holding and won't come with a support contract, but you would be able
    to do some things with it that are impossible with a traditional system.


    My guess is that if you are still asking basic questions like this that are
    answered in the FAQ, then you will be better off paying NetApp for
    engineering time than building this system on your own.

  • Trevorstewart at Oct 16, 2007 at 5:47 pm
    Well then...color me humbled Mr. Dunning.

    I apologize for monopolizing your quite obviously precious time.

    BTW...I don't believe these questions are answered in the FAQ.

    Thank you for making the open source experience SO enjoyable.





  • Ted Dunning at Oct 16, 2007 at 5:53 pm
    Apologies off-list. That wasn't intended to be rude.

  • Garth Patil at Oct 16, 2007 at 6:04 pm
    http://lucene.apache.org/hadoop/hdfs_design.html
    It's not the FAQ, but this document should provide a pretty good
    description of the (intended) features of HDFS. It does not provide a
    complete contrast against other distributed file systems, but it gives
    you an idea of what it was designed for, and how and why the interface
    differs from normal POSIX filesystems. Based on the nature of your
    application, this should help you make a decision on whether HDFS is
    right for you.
    http://labs.google.com/papers/gfs.html
    The Google GFS paper also provides a good idea of how this type of
    filesystem differs from what you are used to.
    Best,
    Garth
  • Trevorstewart at Oct 17, 2007 at 1:19 pm
    Fair enough...

    I appreciate your reply and apologize for my misinterpretation of your
    intent. Someone else pointed me to the architecture documentation, which I
    shall peruse in order to gain a better understanding of this product.

    The primary feature that attracted me to Hadoop was the ability to maintain
    a single namespace across resources. This will become increasingly
    important as we add logical volumes to our storage array, whether they be
    NetApps, DMX3, or commodity hardware (servers). Up until I came across
    Hadoop, I had been focusing primarily on CIFS, and I want to further
    investigate other distributed file systems in order to either rule them out
    or to better understand their capabilities and how they may apply to the
    problem at hand.

    Thank you all for your replies.

    Trevor Stewart
    Union Pacific Railroad





    Ted Dunning
    <tdunning@veoh.co
    m> To
    <hadoop-user@lucene.apache.org>
    10/16/2007 12:53 cc
    PM
    Subject
    Re: HDFS vs. CIFS
    Please respond to
    hadoop-user@lucen
    e.apache.org









    Apologies off-list. That wasn't intended to be rude.

    On 10/16/07 10:46 AM, "TREVORSTEWART@UP.COM" wrote:

    Well then...color me humbled Mr. Dunning.

    I apologize for monopolizing your quite obviously precious time.

    BTW...I don't believe these questions are answered in the FAQ.

    Thank you for making the open source experience SO enjoyable.





    Ted Dunning
    <tdunning@veoh.co
    m> To
    <hadoop-user@lucene.apache.org>
    10/16/2007 12:32 cc
    PM Subject
    Re: HDFS vs. CIFS
    Please respond to
    hadoop-user@lucen
    e.apache.org








    First, it is PETAbytes, not petRabytes.

    Secondly, if you are committed to using NetApps or DMX3, then you really
    don't need (or want HDFS).

    Thirdly, if you are committed to using a distributed file store like HDFS
    (or MogileFS or KFS), then you don't need NetApps. Distributed file
    systems
    were designed exactly to eliminate the need for highly engineered storage
    systems by allowing the use of entire redundant computers rather than
    cleverly interconnected disks.

    So you really have two classes of designs:

    A) traditional big iron

    B) trendy, but not entirely ready for prime time distributed file stores
    like HDFS

    The first option will probably work and will cost about 2x more (based on
    my
    experience, your mileage will vary). The second option will require more
    hand-holding and won't come with a support contract, but you would be able
    to do some things with it that are impossible in a traditional sense.


    My guess is that if you are still asking basic questions like this that are
    answered in the FAQ, then you will be better off paying NetApp for
    engineering time than building this system on your own.

    On 10/16/07 8:52 AM, "TREVORSTEWART@UP.COM" wrote:

    Hmmm...OK...

    Let me explain my requirements here and see if you all can tell me if
    Hadoop provides the functionality I need.

    I'm building a highly perfomant, highly available (no less than 4 9's), raw
    storage subsystem. It will be write once for the initial dataset
    (binary
    data) but will have the ability to maintain metadata associated to the
    binary data. The metadata will be "queryiable" and therefore indexed
    (want to use Lucene for this purpose). It must have the ability to
    store
    petrabytes of data. We will use either NetApps or DMX3 storage media.

    Please discuss...






    "Joydeep Sen
    Sarma"
    <jssarma@facebook To
    .com> <hadoop-user@lucene.apache.org> cc
    10/15/2007 05:20
    PM Subject
    RE: HDFS vs. CIFS

    Please respond to
    hadoop-user@lucen
    e.apache.org






    Not a valid comparison. CIFS is a remote file access protocol only. HDFS
    is a file system (that comes bundled with a remote file access
    protocol).

    It may be possible to build a CIFS gateway for HDFS.

    One interesting point of comparison at the protocol level is the level
    of parallelism. Compared to HDFS protocol - CIFS exposes less
    parallelism. DFS/CIFS has the concept of junction points that allows
    directories from different storage servers to be stitched into one
    namespace. There are commercial products that make this easy. However -
    this allows parallelism at directory level only - whereas HDFS protocol
    allows a single file to be distributed across different servers.

    (And as was pointed out - CIFS supports many other file system
    operations - ACLs, oplocks and what not that HDFS doesn't).

    -----Original Message-----
    From: TREVORSTEWART@UP.COM
    Sent: Monday, October 15, 2007 12:24 PM
    To: hadoop-user@lucene.apache.org
    Subject: HDFS vs. CIFS


    I would like someone to compare and contrast CIFS and HDFS? Or...if
    that
    is not a valid comparison...please explain to me why it's not a valid
    comparison.

    Thanks,
    Trevor

    .
    This message and any attachments contain information from Union Pacific
    which may be confidential and/or privileged.
    If you are not the intended recipient, be aware that any disclosure,
    copying, distribution or use of the contents of this message is strictly
    prohibited by law. If you receive this message in error, please contact
    the sender immediately and delete the message and any attachments.




  • Ted Dunning at Oct 17, 2007 at 4:33 pm
    I recommend you look at MogileFS as well.

    The intended usage is very different from hadoop. Hadoop is intended to
    store fewer (<20 million) larger (preferably >10MB) files. It provides
    strong capabilities for splitting large files across machines and hooks for
    disk local computation.
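    A rough back-of-the-envelope sketch of why file count matters. The
    block size is the classic HDFS default, and the object-counting model
    is an assumed simplification of namenode memory accounting, not an
    exact figure:

    ```python
    # Why HDFS favors fewer, larger files: the namenode tracks every file
    # and every block in memory, so many tiny files exhaust the namespace
    # long before the disks fill.

    BLOCK_SIZE = 64 * 1024 * 1024   # classic HDFS default block size

    def namenode_objects(n_files, avg_file_size):
        """Count namespace objects (one per file plus one per block)
        the namenode must track."""
        blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceil
        return n_files * (1 + blocks_per_file)

    # The same 10 TB of data, stored two ways:
    ten_tb = 10 * 1024**4
    few_large = namenode_objects(ten_tb // 1024**3, 1024**3)         # 10,240 x 1 GB
    many_small = namenode_objects(ten_tb // (64 * 1024), 64 * 1024)  # ~168M x 64 KB
    print(few_large, many_small)   # -> 174080 335544320
    ```

    Three orders of magnitude more namenode objects for the same bytes is
    the arithmetic behind the "fewer, larger files" guidance.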

    Mogile is intended for large numbers of smaller files, each stored in
    toto in a conventional file system on a large collection of hosts. It
    provides no file splitting and no local computing hooks. There is a
    framework for running scripts on large collections of files.

    Neither of these exposes a normal file system, but there is a FUSE
    implementation for MogileFS and an early WebDAV interface for HDFS. Both
    of these will mature over time.

    Whether either is useful for you is up to you.

    For that matter, you might look at KFS as well. I know nothing about it,
    but the authors seem to like it. :-)

    See http://blog.kosmix.com/2007/09/kosmos_filesystem_release.html
    On 10/17/07 6:19 AM, "TREVORSTEWART@UP.COM" wrote:

    Fair enough...

    I appreciate your reply and apologize for my misinterpretation of your
    intent. Someone else pointed me to the architecture documentation, which I
    shall peruse in order to gain a better understanding of this product.

    The primary feature that attracted me to Hadoop was the ability to maintain
    a single namespace across resources. This will become increasingly
    important as we add logical volumes to our storage array, whether they be
    NetApps, DMX3, or commodity hardware (servers). I have, up until I came
    across Hadoop, been focusing primarily on CIFS and want to further
    investigate other distributed file systems in order to either rule them out
    or to further realize their capabilities and how they may apply to the
    problem at hand.

    Thank you all for your replies.

    Trevor Stewart
    Union Pacific Railroad





    From: Ted Dunning <tdunning@veoh.com>
    To: <hadoop-user@lucene.apache.org>
    Date: 10/16/2007 12:53 PM
    Subject: Re: HDFS vs. CIFS
    Reply-To: hadoop-user@lucene.apache.org









    Apologies off-list. That wasn't intended to be rude.

    On 10/16/07 10:46 AM, "TREVORSTEWART@UP.COM" wrote:

    Well then...color me humbled Mr. Dunning.

    I apologize for monopolizing your quite obviously precious time.

    BTW...I don't believe these questions are answered in the FAQ.

    Thank you for making the open source experience SO enjoyable.





    From: Ted Dunning <tdunning@veoh.com>
    To: <hadoop-user@lucene.apache.org>
    Date: 10/16/2007 12:32 PM
    Subject: Re: HDFS vs. CIFS
    Reply-To: hadoop-user@lucene.apache.org








    First, it is PETAbytes, not petRabytes.

    Secondly, if you are committed to using NetApps or DMX3, then you really
    don't need (or want) HDFS.

    Thirdly, if you are committed to using a distributed file store like HDFS
    (or MogileFS or KFS), then you don't need NetApps. Distributed file
    systems
    were designed exactly to eliminate the need for highly engineered storage
    systems by allowing the use of entire redundant computers rather than
    cleverly interconnected disks.

    So you really have two classes of designs:

    A) traditional big iron

    B) trendy, but not entirely ready-for-prime-time, distributed file stores
    like HDFS

    The first option will probably work and will cost about 2x more (based on
    my
    experience, your mileage will vary). The second option will require more
    hand-holding and won't come with a support contract, but you would be able
    to do some things with it that are impossible in a traditional sense.


    My guess is that if you are still asking basic questions like this that are
    answered in the FAQ, then you will be better off paying NetApp for
    engineering time than building this system on your own.

    On 10/16/07 8:52 AM, "TREVORSTEWART@UP.COM" wrote:

    [quoted messages trimmed - duplicated in full earlier in the thread]
  • Jonathan Hendler at Oct 16, 2007 at 7:25 pm
    Open source is enjoyable - it's a strange world where people share what
    they think - openly. It's free speech, and you won't always get that from
    closed source. It's often foreign to engineers who are working alone, in
    well-funded startups, or in established, proven, but traditional
    companies.

    Email doesn't convey tone very well - I think the question was addressed,
    and you were corrected where you were possibly mistaken on a couple of
    issues. That's GOOD here.

    I'm new to Hadoop, but not mailing lists - and this one's pretty good at
    responding to technical questions, and even helping provide solutions.




    TREVORSTEWART@UP.COM wrote:
    Well then...color me humbled Mr. Dunning.

    I apologize for monopolizing your quite obviously precious time.

    BTW...I don't believe these questions are answered in the FAQ.

    Thank you for making the open source experience SO enjoyable.


Discussion Overview
group: common-user
categories: hadoop
posted: Oct 15, '07 at 7:24p
active: Oct 17, '07 at 4:33p
posts: 11
users: 5
website: hadoop.apache.org...
irc: #hadoop
