Arun C Murthy (JIRA)
at Jul 24, 2008 at 8:46 am
[
https://issues.apache.org/jira/browse/HADOOP-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-3131:
----------------------------------
Status: Open (was: Patch Available)
Matei, sorry I missed this piece the first time around:
{noformat}
+    for (Segment<K, V> s : segmentsToMerge) {
+      totalBytesProcessed += s.getPosition(); // Count initial bytes read
+    }
+    if (totalBytes != 0) {
+      mergeProgress.set(totalBytesProcessed * progPerByte);
+    } else {
+      mergeProgress.set(1.0f);
+    }
{noformat}
At best this reports progress slightly early (i.e. before the final merge begins), and at worst it reports a completely wrong progress value during the merge of intermediate map-outputs, since the output for all reduces sits in a single file. Hence {{s.getPosition}} is hopelessly off as a measure of merge progress... I vote we just do away with that block.
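To illustrate the concern: a sketch (not Hadoop code; the segment offsets and sizes below are invented for the example) of why crediting each segment's file position as "bytes already processed" misbehaves when segments share one intermediate file. A segment's position is its absolute offset in that file, not bytes of its own data already consumed, so the sum can reach {{totalBytes}} before any merging happens.

```java
// Hypothetical illustration of the progress-seeding bug discussed above.
// Assume three 100-byte segments stored back-to-back in ONE intermediate
// file, at offsets 0, 100, and 200 (all values invented for this sketch).
public class MergeProgressSketch {
  public static void main(String[] args) {
    long[] segmentStartOffsets = {0, 100, 200}; // getPosition() per segment
    long totalBytes = 300;                      // bytes actually left to merge

    // The quoted block credits each segment's file position as bytes read:
    long totalBytesProcessed = 0;
    for (long pos : segmentStartOffsets) {
      totalBytesProcessed += pos;               // 0 + 100 + 200 = 300
    }
    float progPerByte = 1.0f / totalBytes;
    float mergeProgress = totalBytesProcessed * progPerByte;

    // Progress reads 100% before a single byte has been merged.
    System.out.println(mergeProgress);
  }
}
```

With more segments in the same file the sum of offsets can exceed the file length, which matches the over-100% counters reported in this issue.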
enabling BLOCK compression for map outputs breaks the reduce progress counters
------------------------------------------------------------------------------
Key: HADOOP-3131
URL: https://issues.apache.org/jira/browse/HADOOP-3131
Project: Hadoop Core
Issue Type: Bug
Affects Versions: 0.17.1, 0.17.0, 0.17.2, 0.18.0, 0.19.0
Reporter: Colin Evans
Assignee: Matei Zaharia
Fix For: 0.19.0
Attachments: HADOOP-3131-v2.patch, HADOOP-3131-v3.patch, HADOOP-3131-v4.patch, HADOOP-3131-v5.patch, merge-progress-trunk.patch, merge-progress.patch, Picture 1.png
Enabling map output compression and setting the compression type to BLOCK causes the progress counters during the reduce to go crazy and report progress counts over 100%.
This is problematic for speculative execution because the framework thinks the tasks are doing fine.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.