FAQ
Hi.

Is it possible to use Hadoop for a real-time app in the video processing field?

Regards.


  • Edward J. Yoon at Sep 25, 2008 at 3:00 am
What kind of real-time app?
    --
    Best regards, Edward J. Yoon
    edwardyoon@apache.org
    http://blog.udanax.org
  • Stas Oskin at Oct 14, 2008 at 8:30 pm
    Hi.

    Video storage, processing and streaming.

    Regards.

  • Steve Gao at Oct 14, 2008 at 10:41 pm
Does anybody know if there are books about Hadoop or Pig? The wiki and manual are kind of ad hoc and hard to comprehend — for example, "I want to know how to apply patches to my Hadoop, but can't find how to do it," that kind of thing.

    Would anybody help? Thanks.
  • Amit k. Saha at Oct 15, 2008 at 4:36 am

    http://oreilly.com/catalog/9780596521998/

    HTH,
    Amit
  • Paul at Oct 15, 2008 at 6:25 am
    As long as the new node is in the slaves file on the master, just do a
    start-all.sh and it will attempt to start everything. Nodes that are
    already running will keep running and new nodes will be started.

    Consider doing a rebalance after adding a new node for better
    distribution.



    -paul
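Paul's steps can be sketched as the following commands, run on the master. This is a hedged sketch, not a definitive recipe: the host name is hypothetical, a 0.18-era Hadoop layout is assumed, and paths may differ in your install.

```shell
# Register the new node in the slaves file (hypothetical host name).
echo "newnode.example.com" >> conf/slaves

# start-all.sh leaves already-running daemons alone and starts new ones.
bin/start-all.sh

# Optionally rebalance existing blocks onto the new node.
bin/start-balancer.sh
```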
    On Oct 15, 2008, at 1:55 AM, "Amit k. Saha" wrote:
On Wed, Oct 15, 2008 at 9:09 AM, David Wei wrote:
It seems that we need to restart the whole Hadoop system in order to add new
nodes inside the cluster. Is there any solution that doesn't require
rebooting?
From what I know so far, you have to start the HDFS daemon (which
reads the 'slaves' file) to 'let it know' which the data nodes are. So
every time you add a new DataNode, I believe you will have to restart
the daemon, which is like re-initiating the NameNode.

Hope I am not very wrong :-)

    Best,
    Amit

    --
    Amit Kumar Saha
    http://blogs.sun.com/amitsaha/
    http://amitsaha.in.googlepages.com/
    Skype: amitkumarsaha
  • Steve Loughran at Oct 15, 2008 at 10:02 am

Amit k. Saha wrote:
every time you add a new DataNode, I believe you will have to restart
the daemon, which is like re-initiating the NameNode.
You don't need a slaves file; you can connect to a namenode without it.
So: no need to restart daemons. What you should do, when taking
datanodes away deliberately, is decommission them, to shut them down
cleanly and make sure all data is copied off them first. If you just
kill a datanode, the namenode will notice, but some data may be
under-replicated.

    -steve
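Steve's clean-shutdown route can be sketched as follows. This is a sketch under assumptions: the exclude-file path is hypothetical, and the commands need a live cluster to do anything.

```shell
# hadoop-site.xml on the NameNode should name an exclude file, e.g.:
#   <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/etc/hadoop/excludes</value>
#   </property>

# List the datanode to retire (hypothetical host name).
echo "oldnode.example.com" >> /etc/hadoop/excludes

# Ask the NameNode to re-read the list and start draining blocks off it.
bin/hadoop dfsadmin -refreshNodes
```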
  • Ted Dunning at Oct 20, 2008 at 1:20 pm
Hadoop may not be quite what you want for this.

You could definitely use Hadoop for storage and streaming. You can also do
various kinds of processing on Hadoop.

But because Hadoop is primarily intended for batch-style operations, there
is a bit of an assumption that some administrative tasks will take down the
cluster. That may be a problem (video serving tends to have a web audience
that isn't very tolerant of downtime).

At Veoh, we used a simpler, more limited system for serving videos that was
originally based on Mogile. The basic idea is that there is a database that
contains name-to-URL mappings. The URLs point to storage boxes that have a
bunch of disks that are served out to the net via lighttpd. A management
machine runs occasionally to make sure that files are replicated according
to policy. The database is made redundant via conventional mechanisms.
Requests for files can be proxied by a farm of front-end machines that query
the database for locations, or you can use redirects directly to the
content. How you do it depends on network topology and your sensitivity
about divulging internal details. Redirects can give higher peak read speed
since you are going direct. Proxying avoids a network round trip for the
redirect.

At Veoh, this system fed the content delivery networks as a caching layer,
which meant that the traffic was essentially uniform random access. This
system handled a huge number of files (10^9 or so) very easily and has
essentially never had customer-visible downtime. Extension with new file
systems is trivial (just tell the manager box and it starts using them).

    This arrangement lacks most of the things that make Hadoop really good for
    what it does. But, in return, it is incredibly simple. It isn't very
    suitable for map-reduce or other high bandwidth processing tasks. It
    doesn't allow computation to go to the data. It doesn't allow large files
    to be read in parallel from many machines. On the other hand, it handles
    way more files than Hadoop does and it handles gobs of tiny files pretty
    well.

Video is also kind of a write-once medium in many cases, and video files
aren't really splittable for map-reduce purposes. That might mean that you
could get away with a Mogile-ish system.
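The name-to-URL lookup Ted describes can be sketched in a few lines. All names here are hypothetical stand-ins; a real system would back the catalog with a redundant database and answer over HTTP with either a redirect or a proxied read.

```python
import random

# Stand-in for the redundant name -> replica-URL database.
catalog = {
    "clip123.flv": [
        "http://store01.example.com/vol3/clip123.flv",
        "http://store07.example.com/vol1/clip123.flv",
    ],
}

def locate(name):
    """Pick one replica URL for `name`, or None if the file is unknown."""
    replicas = catalog.get(name)
    return random.choice(replicas) if replicas else None

# A redirect front end would answer HTTP 302 with locate(name);
# a proxying front end would fetch that URL itself and stream it back.
```

A management process in this scheme would periodically scan the catalog and re-copy any file that has fewer replicas than policy requires.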


    --
    ted
  • Stas Oskin at Oct 20, 2008 at 7:04 pm
    Hi Ted.

Thanks for sharing some of the inner workings of Veoh, which, by the way,
I'm a frequent user of (or at least when time permits :) ).

I indeed recall reading somewhere that Veoh used a heavily modified version
of MogileFS, but has since switched away, as it wasn't mature enough for
Veoh's needs.

If not Hadoop, are there any other available solutions that can assist in
distributing the processing of real-time video data? Or is the old way of
separate application servers the only way to go?

    Regards.




Discussion Overview
group: common-user
categories: hadoop
posted: Sep 23, '08 at 7:51p
active: Oct 20, '08 at 7:04p
posts: 9
users: 7
website: hadoop.apache.org...
irc: #hadoop
