You have a couple of options:
(1) If you disable multiquery support, you can take advantage of the
full Hadoop globbing capabilities, which are likely to be sufficient.
(2) If you need to use multiquery, only single-pattern globs are
supported, so you would not be able to specify multiple unrelated
directories. If that is not sufficient, you will need to use UNION, but
it might not significantly impact your performance. I would try that
first before trying a custom solution.
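
As a rough sketch of both options (the paths, the tab delimiter, and the
alias names below are made up for illustration, and I have not run this
against your data): with multiquery turned off, for example by starting
Pig with the -M (-no_multiquery) switch, a single LOAD can take a Hadoop
glob that uses {a,b} alternation. The UNION form is the fallback.

```pig
-- Option 1: multiquery disabled (e.g. pig -M myscript.pig).
-- Hypothetical paths; {x,y} is Hadoop glob alternation.

-- Specific files from one directory:
some_files = LOAD '/data/dirA/{file1,file3}.txt' USING PigStorage('\t');

-- Files from different directories under a common parent:
many_dirs = LOAD '/data/{dirA,dirB}/part-*' USING PigStorage('\t');

-- Option 2: multiquery enabled, inputs not expressible as one pattern.
-- Load each input separately and combine the relations with UNION.
a1 = LOAD '/data/dirA/file1.txt' USING PigStorage('\t');
a2 = LOAD '/other/dirB/file3.txt' USING PigStorage('\t');
combined = UNION a1, a2;
```

All inputs should share the same layout, since the result is treated as
a single relation either way.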
From: Pankil Doshi
Sent: Wednesday, August 26, 2009 10:22 AM
Subject: Question Regarding Multiple Loads
I am trying to write Pig scripts for my project. The problem I am facing
is that I want to load different files into the same variable. Is it
possible to do this without modifying the loader? I read about Hadoop
globbing. Does it offer a solution to this?
I know I can load all files of a given directory into a single variable.
But is it possible to load specific files from that directory? Or
files from different directories into the same load variable?
I also know about the UNION strategy, but that adds one map-reduce job,
and I want to avoid that.
Any suggestions are welcome.