Jun 3, 2011 at 9:06 am
In my Pig script, I'm currently loading multiple input files:
I've heard that zipping and packaging this data will improve performance.
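One part of the claim can be checked quickly: XML is highly repetitive, so gzip usually shrinks it a lot, which means fewer bytes to read from disk or HDFS. The sketch below measures the gzip ratio on a hypothetical sample (the `sample` data and the `gzip_ratio` helper are made up for illustration, not taken from my real input). It says nothing about whether Pig/Hadoop overall gets faster.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return compressed size / original size using gzip's default level."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample: XML with many repeated tags, similar in shape to record files.
sample = b"<records>" + b"".join(
    b"<record><id>%d</id><name>item</name></record>" % i for i in range(1000)
) + b"</records>"

if __name__ == "__main__":
    print(f"original={len(sample)} bytes, gzip ratio={gzip_ratio(sample):.2f}")
    # Round trip: decompressing restores the exact original bytes.
    assert gzip.decompress(gzip.compress(sample)) == sample
```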
So I would gzip each individual file to
and then package them using cpio.
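To make the proposed layout concrete, here is a minimal sketch of what "gzip each file, then package with cpio" produces. It assumes the SVR4 "newc" cpio format (roughly what `cpio -o -H newc` writes). `gzip_and_pack` and the file names are hypothetical. The sketch does not cover how Pig would then read the archive.

```python
import gzip
import io


def _pad4(n: int) -> int:
    """Number of NUL bytes needed to reach the next multiple of 4."""
    return (4 - n % 4) % 4


def _newc_entry(name: str, data: bytes, ino: int, mode: int) -> bytes:
    """Build one entry in the SVR4 'newc' cpio format (magic 070701)."""
    name_bytes = name.encode() + b"\0"  # namesize includes the trailing NUL
    # ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [ino, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * _pad4(len(out))  # header + name padded to 4 bytes
    out += data + b"\0" * _pad4(len(data))  # file data padded to 4 bytes
    return out


def gzip_and_pack(files: dict[str, bytes]) -> bytes:
    """Gzip each file on its own, then pack all .gz blobs into one cpio archive."""
    buf = io.BytesIO()
    for ino, (name, content) in enumerate(sorted(files.items()), start=1):
        buf.write(_newc_entry(name + ".gz", gzip.compress(content), ino, 0o100644))
    buf.write(_newc_entry("TRAILER!!!", b"", 0, 0))  # end-of-archive marker
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical input files standing in for the real .xml inputs.
    archive = gzip_and_pack({"a.xml": b"<a/>", "b.xml": b"<b/>"})
    print(len(archive), "bytes in one archive")
```

The many small inputs become a single file, but each member is still a separate gzip stream inside it.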
And finally, instead of loading the multiple .xml files, I would load my single .cpio file.
Can someone explain why that should improve performance?