FAQ
hi,

I'm a newbie to Hadoop. I want to employ Hadoop to analyze 10 million pieces
of data (10G in total). The data may reside in a relational DB, or in a series
of XML files (thousands of pieces per file). I have some questions.

1. How can I guarantee mappers' exclusive access to the DB?
2. How should I split XML files? By overriding MultiFileInputFormat?
3. How can I transfer a bunch of resources (10M) to the slaves?
4. Reduce is not necessary; is Hadoop still suitable?

I can't find a similar case in the built-in examples of the Hadoop release.
Sorry to interrupt.

Chao Cai


  • 蔡超 at Nov 3, 2010 at 7:16 am
    hi,

    I'm not sure whether anybody hears me. Is there any example?
    On Tue, Nov 2, 2010 at 10:29 AM, 蔡超 wrote:

    > hi,
    >
    > I'm a newbie to Hadoop. I want to employ Hadoop to process 10 million
    > pieces of data (10G in total). Each piece of data will be handled
    > separately. The data may reside in a relational DB, or in a series of
    > XML files (thousands of pieces per file). I have some questions.
    >
    > 1. How can I guarantee mappers' exclusive access to the DB?
    > 2. How should I split XML files? By overriding MultiFileInputFormat?
    > 3. How can I transfer a bunch of resources (10M) to the slaves?
    > 4. Reduce is not necessary; is Hadoop still suitable?
    >
    > I can't find a similar case in the built-in examples of the Hadoop
    > release. Sorry to interrupt.
    >
    > Chao Cai



  • Harsh J at Nov 3, 2010 at 4:38 pm
    Hello!
    On Tue, Nov 2, 2010 at 7:59 AM, 蔡超 wrote:
    > hi,
    >
    > I'm a newbie to Hadoop. I want to employ Hadoop to analyze 10 million
    > pieces of data (10G in total). The data may reside in a relational DB,
    > or in a series of XML files (thousands of pieces per file). I have some
    > questions.
    >
    > 1. How can I guarantee mappers' exclusive access to the DB?
    I'd suggest moving the data into HDFS with projects such as HIHO or
    Sqoop. Or use DBInputFormat, as detailed in this Cloudera weblog post:
    http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/
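    A minimal sketch of the DBInputFormat route (the old 0.20 mapred API, as
    in the Cloudera post). The driver class, JDBC URL, credentials, table and
    column names below are made-up placeholders, not anything from the thread:

```java
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DbJobSetup {
  // One row of the hypothetical "records" table.
  static class RecordRow implements Writable, DBWritable {
    long id;
    String payload;

    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getLong(1);
      payload = rs.getString(2);
    }
    public void write(PreparedStatement ps) throws SQLException {
      ps.setLong(1, id);
      ps.setString(2, payload);
    }
    public void readFields(java.io.DataInput in) throws IOException {
      id = in.readLong();
      payload = in.readUTF();
    }
    public void write(java.io.DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeUTF(payload);
    }
  }

  static void configure(JobConf conf) {
    conf.setInputFormat(DBInputFormat.class);
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/mydb", "user", "password");
    // Hadoop issues "SELECT id, payload FROM records ORDER BY id" and
    // splits the row range across the mappers, so each mapper reads a
    // disjoint slice rather than contending for the same rows.
    DBInputFormat.setInput(conf, RecordRow.class,
        "records", null /* conditions */, "id" /* orderBy */,
        new String[] { "id", "payload" });
  }
}
```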
    > 2. How should I split XML files? By overriding MultiFileInputFormat?
    Several people have already discussed how to process XML documents with
    Hadoop. A simple web search will yield several results on the topic, as
    will the archives of the mapreduce-user mailing list.
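    Whichever input format you end up with, the per-record work inside the
    mapper amounts to pulling individual elements out of an XML stream. A
    self-contained sketch using the standard StAX parser, with a hypothetical
    <piece> element name standing in for whatever your files actually use:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlPieces {
  // Collect the text content of every <piece> element in the document.
  // A mapper would run the same loop over the chunk of XML it receives.
  static List<String> pieces(String xml) {
    List<String> out = new ArrayList<String>();
    try {
      XMLStreamReader r = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      while (r.hasNext()) {
        if (r.next() == XMLStreamConstants.START_ELEMENT
            && r.getLocalName().equals("piece")) {
          out.add(r.getElementText());
        }
      }
    } catch (XMLStreamException e) {
      throw new RuntimeException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(pieces("<file><piece>a</piece><piece>b</piece></file>"));
  }
}
```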
    > 3. How can I transfer a bunch of resources (10M) to the slaves?
    Use DistributedCache:
    http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html
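    A hedged sketch of the DistributedCache wiring (0.20 API); the HDFS path
    below is a made-up placeholder:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  // In the job driver: register an HDFS file so the framework copies it
  // to every slave's local disk before the tasks start.
  static void configure(JobConf conf) throws Exception {
    DistributedCache.addCacheFile(
        new URI("/user/chao/resources.dat"), conf); // hypothetical path
  }

  // In the mapper's configure(JobConf): read the local copies.
  static Path[] localCopies(Configuration conf) throws Exception {
    return DistributedCache.getLocalCacheFiles(conf);
  }
}
```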
    > 4. Reduce is not necessary; is Hadoop still suitable?
    Yes, you can run plain map-only jobs; there is no problem with that. You
    just need to set the number of reducers to 0 in each job.
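    The relevant driver line, as a small sketch against the 0.20 mapred API
    (class name is a placeholder):

```java
import org.apache.hadoop.mapred.JobConf;

public class MapOnlySetup {
  static JobConf configure() {
    JobConf conf = new JobConf(MapOnlySetup.class);
    // Zero reducers: each mapper's output goes straight to the output
    // path, skipping the sort/shuffle and reduce phases entirely.
    conf.setNumReduceTasks(0);
    return conf;
  }
}
```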
    > I can't find a similar case in the built-in examples of the Hadoop
    > release.
    Look around the web for projects that _use_ Hadoop, like Mahout for
    example; they have many more examples.
    > Sorry to interrupt.
    >
    > Chao Cai
    HTH

    --
    Harsh J
    www.harshj.com


Discussion Overview
group: common-user
categories: hadoop
posted: Nov 2, '10 at 2:29a
active: Nov 3, '10 at 4:38p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

蔡超: 2 posts Harsh J: 1 post
