Pradeep Kamath (JIRA)
Nov 6, 2008 at 5:49 pm
[ https://issues.apache.org/jira/browse/PIG-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-484:
-------------------------------
Assignee: Pradeep Kamath
Status: Patch Available (was: Open)
Patch details:
- The idea is to stream the values for a key in small bags to the "Initial" version of the algebraic function in the combiner. Previously, when the map produced an explosion of values, the original bag would have been big and could have caused costly spills. This is now avoided, since only small chunks are sent to the aggregate function's "Initial" method.
The code checks for a combine plan and, if one is present, replaces the POPackage and POForEach in the combine plan with POJoinPackage, a combination of the two customized for streaming small bags between the package and the foreach.
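The chunked streaming described above can be illustrated with a minimal Python sketch. It is not the code in PIG-484.patch and does not model POJoinPackage itself; the names chunks, streamed_aggregate, initial, intermediate and chunk_size are hypothetical and chosen only for illustration. It assumes the aggregate is algebraic, so that partial results computed on small bags can be merged later.

```python
from itertools import islice


def chunks(values, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streamed_aggregate(values, initial, intermediate, chunk_size=1000):
    """Feed small bags to `initial` and merge the partial results.

    `initial` and `intermediate` are hypothetical stand-ins for the
    "Initial" and intermediate stages of an algebraic function.
    Only one small bag is held in memory at any time.
    """
    partial = None
    for chunk in chunks(values, chunk_size):
        part = initial(chunk)
        # Merge the new partial result with the running one.
        partial = part if partial is None else intermediate([partial, part])
    return partial


# MIN is algebraic: the minimum of the per-chunk minima is the overall minimum.
smallest = streamed_aggregate(iter([5, 3, 9, 1, 7]), min, min, chunk_size=2)
```

The chunk size here is an arbitrary illustrative value; the patch may size its small bags differently.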
PERFORMANCE: streaming data to aggregate functions
--------------------------------------------------
Key: PIG-484
URL: https://issues.apache.org/jira/browse/PIG-484
Project: Pig
Issue Type: Improvement
Affects Versions: types_branch
Reporter: Olga Natkovich
Assignee: Pradeep Kamath
Fix For: types_branch
Attachments: PIG-484.patch
Currently, for queries like
A = load 'data';
B = group A by $0;
C = foreach B generate group, MIN(A.$1), MAX(A.$1);
The data for each group is put into a bag before being passed to the aggregate functions. This is unnecessary and inefficient: in this case, the data can simply be streamed to the functions.
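As a rough illustration of what streaming means for the query above, the following Python sketch computes MIN and MAX per group in a single pass without building a bag. It is only an analogy, not Pig's implementation: the function name stream_min_max and the (key, value) record layout are assumptions, and the input is assumed to arrive already sorted by key, as it would after a shuffle.

```python
from itertools import groupby
from operator import itemgetter


def stream_min_max(sorted_records):
    """Yield (key, min, max) for (key, value) records sorted by key.

    Each value is inspected once and then dropped, so no per-group
    bag is ever materialized.
    """
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        lo = hi = None
        for _, value in group:
            if lo is None or value < lo:
                lo = value
            if hi is None or value > hi:
                hi = value
        yield key, lo, hi


records = [("a", 3), ("a", 1), ("b", 7), ("b", 9), ("b", 8)]
result = list(stream_min_max(records))
```

Memory use stays constant per group regardless of how many values the group contains, which is the benefit the issue is asking for.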