Hi Folks,

I receive hundreds of small XML files every hour, ranging in size from
5 MB to 15 MB. Since Hadoop does not work well with small files, I want to
merge them. What is the best way to merge these XML files?



--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>


  • Madhu phatak at Feb 3, 2011 at 10:45 am
    Hi,
    You can write an InputFormat that creates input splits from multiple files.
    That will solve your problem.
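
The grouping idea behind such an InputFormat can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop code: the class `SplitPacker`, the type `SmallFile`, and the 32 MB limit are all hypothetical names and values chosen for the example. It shows how many small files might be packed, in arrival order, into a few combined splits so that each map task handles several files. Hadoop ships an abstract `CombineFileInputFormat` built on a similar idea; whether it suits these XML files is not covered in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: packs small files into combined splits by size. Not a Hadoop API. */
public class SplitPacker {

    /** Hypothetical stand-in for a file in HDFS: a name and a length in bytes. */
    public static final class SmallFile {
        public final String name;
        public final long length;

        public SmallFile(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /**
     * Greedily packs files, in the given order, into groups whose total size
     * does not exceed maxSplitBytes. A single file larger than the limit
     * ends up in a group of its own.
     */
    public static List<List<SmallFile>> pack(List<SmallFile> files, long maxSplitBytes) {
        List<List<SmallFile>> splits = new ArrayList<>();
        List<SmallFile> current = new ArrayList<>();
        long currentBytes = 0;
        for (SmallFile f : files) {
            // Close the current group if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + f.length > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(f);
            currentBytes += f.length;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizesMb = {5, 15, 10, 12, 8, 15, 6}; // made-up sizes in the 5-15 MB range
        List<SmallFile> files = new ArrayList<>();
        for (int i = 0; i < sizesMb.length; i++) {
            files.add(new SmallFile("file-" + i + ".xml", sizesMb[i] * mb));
        }
        // 32 MB is an arbitrary limit for the example.
        List<List<SmallFile>> splits = pack(files, 32 * mb);
        for (int i = 0; i < splits.size(); i++) {
            StringBuilder names = new StringBuilder();
            for (SmallFile f : splits.get(i)) {
                names.append(f.name).append(' ');
            }
            System.out.println("split " + i + ": " + names.toString().trim());
        }
    }
}
```

In a real InputFormat, each group would become one input split, and the record reader would still have to parse each XML file in the group separately, because the files are grouped rather than concatenated.
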
  • Kai Voigt at Feb 3, 2011 at 10:47 am
    Did you look into Hadoop Archives?

    http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html

    Kai

    --
    Kai Voigt
    k@123.org

Discussion Overview
group: common-user @
categories: hadoop
posted: Feb 2, '11 at 10:35a
active: Feb 3, '11 at 10:47a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
