Pankil,

You have a couple of options:

(1) If you disable multiquery support, you can take advantage of the
full Hadoop globbing capabilities, which are likely to be sufficient.
(2) If you need to use multiquery, only single-pattern globs are
supported, so you would not be able to specify multiple unrelated
directories. If that is not sufficient, you will need to use union, but
it might not significantly impact your performance. I would try that
first before trying a custom solution.
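
As a rough illustration of the two options, a minimal sketch follows. The paths, field names, and relation names are hypothetical and not taken from this thread, and whether a given pattern needs multiquery turned off may depend on your Pig version. Multiquery can be disabled on the command line with -M (or -no_multiquery).

```pig
-- Hypothetical paths and schema, for illustration only.

-- Option 1: a single LOAD with a Hadoop glob. Braces list alternatives
-- and * matches any run of characters, so this picks up the part files
-- under both dated directories. If the pattern is rejected, try running
-- the script with multiquery disabled:  pig -M myscript.pig
logs = LOAD '/data/{2009-08-25,2009-08-26}/part-*'
       USING PigStorage('\t') AS (user:chararray, bytes:long);

-- Option 2: separate LOADs combined with UNION. This works for
-- unrelated directories and does not depend on glob support.
a = LOAD '/data/dir1/file1.txt'
    USING PigStorage('\t') AS (user:chararray, bytes:long);
b = LOAD '/other/dir2/file2.txt'
    USING PigStorage('\t') AS (user:chararray, bytes:long);
both = UNION a, b;
```

Running EXPLAIN on the final relation should show how many map-reduce jobs each variant produces, which is one way to check whether the UNION actually costs anything in your case.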

Olga

-----Original Message-----
From: Pankil Doshi
Sent: Wednesday, August 26, 2009 10:22 AM
To: pig-user@hadoop.apache.org
Subject: Question Regarding Multiple Loads

Hello Everyone,

I am trying to write Pig scripts for my project. The problem I am facing
is that I want to load different files into the same variable. Is it
possible to do this without modifying the Loader? I read about Hadoop
globbing. Does anyone have a solution to this?

I know I can load all files of a given directory into a single variable.
But is it possible to load specific files from that directory? Or
specific files from different directories into the same load variable?

I also know about the UNION strategy, but that adds one map-reduce job,
and I want to avoid that.

Any suggestions are welcome.

Pankil
