I am running a Pig query on about 500 GB of input data.
The HDFS block size is 128 MB and the split size is the default 128 MB.
I have specified 16 reducers, and around 3800 mappers are running.
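For reference, the setup described above would look roughly like this in the script. This is only a sketch: it assumes the reducer count was set script-wide with default_parallel (rather than with a PARALLEL clause on a single operator), and the split lines simply spell out Pig's split-combination properties at the 128 MB value mentioned, which I have not actually overridden.

```pig
-- Assumed: 16 reducers set for the whole script. A "PARALLEL 16" clause
-- on the GROUP/JOIN statement would do the same for that one operator.
SET default_parallel 16;

-- Pig combines small input files into splits of up to this many bytes.
-- 134217728 bytes = 128 MB, matching the block size.
SET pig.splitCombination true;
SET pig.maxCombinedSplitSize 134217728;
```

At 128 MB per split, 500 GB works out to roughly 4000 splits, which is consistent with the ~3800 mappers I see.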
I observe that the shuffle phase takes a long time to complete,
approximately 25 minutes per job.
Can anyone suggest how I can bring down the shuffle time? Is there any
property that I can tune to improve performance?
Thanks & Regards,