Hi,

In my Pig script I'm currently loading multiple input files:

input_1.xml
input_2.xml
input_3.xml
input_4.xml
...

I've heard that compressing and packaging this data will improve performance.
So I would gzip each file individually to

input_1.xml.gz
input_2.xml.gz
input_3.xml.gz
input_4.xml.gz

and then package them using cpio.
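
To make the gzip step concrete, here is a rough sketch of what I have in mind, written in Python. The function name `gzip_inputs`, the directory argument and the `input_*.xml` pattern are just assumptions for illustration, and the cpio packaging step is not shown:

```python
import gzip
import shutil
from pathlib import Path


def gzip_inputs(input_dir, pattern="input_*.xml"):
    """Compress each matching file to <name>.gz next to the original.

    Hypothetical helper: the directory and pattern are placeholders,
    not taken from my actual setup.
    """
    compressed = []
    for src in sorted(Path(input_dir).glob(pattern)):
        dst = src.with_name(src.name + ".gz")
        # Stream the file through gzip so large inputs are not read into memory.
        with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        compressed.append(dst)
    return compressed
```

Calling `gzip_inputs("data")` would leave the original .xml files in place and write one .gz file per input beside them.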

Finally, instead of loading the multiple .xml files, I would load the single .cpio file.

Can someone explain why that should improve performance?

Best,
Will

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 3, '11 at 8:59a
active: Jun 3, '11 at 9:06a
posts: 2
users: 1
website: pig.apache.org

1 user in discussion

Lai Will: 2 posts
