  • Lai Will at Jun 3, 2011 at 9:06 am
    Hi,

    In my Pig script I'm currently loading multiple input files:

    input_1.xml
    input_2.xml
    input_3.xml
    input_4.xml
    ...

    I've heard that compressing and packaging this data will improve performance.
    So I would gzip each file individually to get

    input_1.xml.gz
    input_2.xml.gz
    input_3.xml.gz
    input_4.xml.gz

    and then package them using cpio.

    Finally, instead of loading the multiple .xml files, I would load the single .cpio file.

    Can someone explain why that should improve performance?

    Best,
    Will
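
The preparation steps described in the question (gzip each file individually, then package the results) could be sketched roughly as follows. This is only an illustration under assumptions, not the poster's actual script: the helper name `gzip_each` is hypothetical, and the cpio step is only indicated in a comment because Python's standard library has no cpio writer.

```python
import gzip
import shutil
from pathlib import Path


def gzip_each(paths):
    """Compress each file to <name>.gz next to the original.

    Hypothetical helper for illustration; returns the list of new paths.
    The original files are left in place.
    """
    compressed = []
    for src in map(Path, paths):
        dst = src.with_name(src.name + ".gz")
        # Stream the bytes through gzip so large inputs are not read into memory.
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        compressed.append(dst)
    return compressed


if __name__ == "__main__":
    # Assumes the input_N.xml files sit in the current directory.
    for path in gzip_each(sorted(Path(".").glob("input_*.xml"))):
        # Printing one name per line lets the output be piped to the cpio
        # tool, which reads file names from stdin in copy-out mode:
        #   python this_script.py | cpio -o > inputs.cpio
        print(path)
```

Whether Pig can then read the resulting .cpio file directly is a separate question that this sketch does not address.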

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 3, '11 at 8:59a
active: Jun 3, '11 at 9:06a
posts: 2
users: 1
website: pig.apache.org