[
https://issues.apache.org/jira/browse/PIG-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899526#action_12899526 ]
Thejas M Nair commented on PIG-1544:
------------------------------------
bq. We should not be using these bags for the cases like UDF for exactly the reason you are mentioning
The case I had in mind was not one where the UDF creates proactive-spill bags, but one where the UDF's input contains bags that happen to be of a proactive-spilling type, and the UDF retains bags from previous rows.
Anyway, I have come up with a more realistic(?) use case where it is difficult to determine the number of proactive-spill bags that will be present at run time:
{code}
L = load 'f1' as ( c1 : int, b1 : bag{ } );
F1 = foreach L { d = distinct b1; generate c1, d; } -- InternalDistinctBag will be created here
G = group F1 by c1 using 'merge'; -- This group-by could [1] accumulate several of these InternalDistinctBag objects
F2 = foreach G generate ...
-- [1] This does not happen today, because the query plan has a PORelationToExpressionProject
--     after the result from PODistinct, which copies the bag. But it looks like we could
--     optimize the plan and get rid of that copy in this case.
{code}
proactive-spill bags should share the memory allotted for it
------------------------------------------------------------
Key: PIG-1544
URL: https://issues.apache.org/jira/browse/PIG-1544
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Initially, proactive-spill bags were designed for use in (co)group (InternalCacheBag). They knew the total number of proactive bags present and shared the memory limit specified by the property pig.cachedbag.memusage.
But the two proactive bag implementations that were added later, InternalDistinctBag and InternalSortedBag, are not aware of the actual number of bags in use; their users always assume total-numbags = 3.
This needs to be fixed so that all proactive-spill bags share the memory limit.
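
To make the intended behaviour concrete, here is a minimal Java sketch of one way a shared limit could be divided among however many proactive-spill bags are actually alive. SharedSpillBudget and its methods are hypothetical names for illustration and are not existing Pig classes; the fraction argument merely stands in for the value of pig.cachedbag.memusage, and how Pig reads that property or measures the heap is not shown.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Pig): divides one memory budget
 * evenly across the proactive-spill bags that are currently registered.
 */
final class SharedSpillBudget {
    // Total bytes that all proactive-spill bags may hold in memory together.
    private final long totalBytes;
    // Number of bags currently alive; replaces a hard-coded assumption such as 3.
    private final AtomicInteger liveBags = new AtomicInteger();

    SharedSpillBudget(long maxHeapBytes, double memUsageFraction) {
        if (maxHeapBytes <= 0 || memUsageFraction <= 0.0 || memUsageFraction > 1.0) {
            throw new IllegalArgumentException("invalid heap size or fraction");
        }
        this.totalBytes = (long) (maxHeapBytes * memUsageFraction);
    }

    /** Called when a bag is created. */
    void register() {
        liveBags.incrementAndGet();
    }

    /** Called when a bag is no longer used; never drops below zero. */
    void unregister() {
        liveBags.updateAndGet(n -> Math.max(0, n - 1));
    }

    /** In-memory limit for each bag, recomputed from the live bag count. */
    long perBagLimitBytes() {
        int n = Math.max(1, liveBags.get());
        return totalBytes / n;
    }
}
```

In this sketch every bag type would call register() on creation and check perBagLimitBytes() before deciding to spill, so the per-bag share shrinks as more bags appear instead of being fixed at one third of the budget.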