You are correct that we currently allow only a single glob in the LOAD
statement. It would not be hard to extend it to multiple globs. I have
created a JIRA for it: https://issues.apache.org/jira/browse/PIG-252.
Maybe somebody will be interested in contributing a patch.
Olga
-----Original Message-----
From: Tom White
Sent: Wednesday, June 04, 2008 7:56 AM
To: pig-user@incubator.apache.org
Subject: Specifying multiple input paths
I'm having a problem loading data from multiple paths in Pig.
What I'm trying to do is to load data from a range of dates,
so I would like to specify an input of two globbed paths:
x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}'
Pig doesn't seem to like this though as it's trying to
interpret it as a single path. The best I can do is to use UNION:
x1 = LOAD '2008/05/{26,27,28,29,30,31}';
x2 = LOAD '2008/06/{1,2}';
x = UNION x1, x2;
The downside to this is that I want to parameterize my paths,
and having a separate script for each number of paths in the
input is cumbersome.
Is there a better way of doing this? Are there any plans to
support multiple paths, and/or PathFilters?
Thanks,
Tom
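Until multiple globs are supported in LOAD, one possible way to avoid
keeping a separate script per number of paths is to generate the UNION
workaround outside Pig. The following is only a sketch of that idea in
Python, not something Pig provides: the function name build_union_script
and the x1, x2, ... alias naming are made up for illustration, and it
assumes the paths contain no single quotes.

```python
def build_union_script(globs, alias="x"):
    """Return Pig Latin that LOADs each glob separately and UNIONs the results.

    Hypothetical helper: it only builds text and does not talk to Pig.
    """
    if not globs:
        raise ValueError("at least one input glob is required")
    if len(globs) == 1:
        # A single glob needs no UNION at all.
        return f"{alias} = LOAD '{globs[0]}';"

    lines = []
    parts = []
    for i, glob in enumerate(globs, start=1):
        part = f"{alias}{i}"  # x1, x2, ... as in the workaround above
        lines.append(f"{part} = LOAD '{glob}';")
        parts.append(part)
    # UNION accepts two or more aliases.
    lines.append(f"{alias} = UNION {', '.join(parts)};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_union_script(["2008/05/{26,27,28,29,30,31}", "2008/06/{1,2}"]))
```

The generated statements can be prepended to the rest of the script, so
the number of input paths becomes a run-time choice rather than something
fixed in the script.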