John
On Thu, Oct 28, 2010 at 11:49 AM, Alan Gates wrote:
On Oct 28, 2010, at 8:36 AM, John Hui wrote:
I looked into returning a data bag as an option. The problem is that the Loader
interface requires me to return a Tuple object:
public Tuple getNext() throws IOException {
but the DataBag interface is not a derived class of Tuple, so this means I
would need to change Pig's internal code for my loader to return a bag
of tuples. Right?
No. If at the end of your getNext() you have a List<Tuple> tuples, then
return:
return TupleFactory.getInstance().newTuple(BagFactory.getInstance().newDefaultBag(tuples));
This will give you a tuple, which has a single field, which is a bag.
Within that bag will be all your tuples. If your next Pig Latin statement is
B = foreach A generate flatten($0);
then B will contain each of your records as individual records.
Alan.
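To make the shape of the data concrete, here is a rough plain-Java model of the approach described above. It deliberately uses no Pig classes: a List<Object> stands in for a Tuple and a List<List<Object>> stands in for a DataBag. The class name, the method names, and the input format (records separated by ";" and fields by ",") are all hypothetical, chosen only for illustration. In a real loader the wrapping step would be the TupleFactory/BagFactory call quoted above, and the flattening would be done by the Pig Latin statement, not by Java code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model only; these are stand-ins, not Pig's Tuple/DataBag types.
class BagFlattenModel {

    // Models getNext(): one input line yields ONE outer tuple whose single
    // field is a bag holding every record parsed from that line.
    // Assumed (hypothetical) format: records split on ';', fields on ','.
    public static List<Object> getNextModel(String line) {
        List<List<Object>> bag = new ArrayList<>();
        for (String record : line.split(";")) {
            bag.add(new ArrayList<Object>(Arrays.asList(record.split(","))));
        }
        List<Object> outer = new ArrayList<>();
        outer.add(bag); // the outer tuple has exactly one field: the bag
        return outer;
    }

    // Models "B = foreach A generate flatten($0);": each tuple inside the
    // bag stored in field 0 becomes its own top-level record.
    @SuppressWarnings("unchecked")
    public static List<List<Object>> flattenFirstField(List<List<Object>> relation) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> outer : relation) {
            out.addAll((List<List<Object>>) outer.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> a = new ArrayList<>();
        a.add(getNextModel("1,x;2,y")); // one line, two records
        a.add(getNextModel("3,z"));     // one line, one record
        System.out.println(flattenFirstField(a)); // prints [[1, x], [2, y], [3, z]]
    }
}
```

The point of the model is that the loader still returns exactly one tuple per call, so the getNext() signature does not need to change. The one-to-many expansion happens afterwards, in the flatten step.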
John
On Wed, Oct 27, 2010 at 6:00 PM, John Hui wrote:
Hi Pig Users,
I am currently writing a UDF loader. In one of my use cases, one line in
the input stream results in multiple tuples. Has anyone encountered or
solved this issue on their end?
Currently the getNext method only returns a single tuple, but I
want it to return a List<Tuple>. Let me know if there are use cases out
there like mine. I am coding it up to return List<Tuple>, which is more
flexible than returning only one tuple.
Thanks,
John