Create fewer copies of buffer data during sort/spill
----------------------------------------------------

Key: HADOOP-2919
URL: https://issues.apache.org/jira/browse/HADOOP-2919
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Chris Douglas
Attachments: 2919-0.patch

Currently, the sort/spill works as follows:

Let r be the number of partitions
For each call to collect(K,V) from map:
* If buffers do not exist, allocate a new DataOutputBuffer to collect K,V bytes, allocate r buffers for collecting K,V offsets
* Write K,V into buffer, noting offsets
* Register offsets with the associated partition buffer, allocating/copying accounting buffers if necessary
* Calculate the total memory usage for the buffer and all partition collectors by iterating over the collectors
* If total mem usage is greater than half of io.sort.mb, then start a new thread to spill, blocking if another spill is in progress

For each spill (assuming no combiner):
* Save references to our K,V byte buffer and accounting data, setting the former to null (will be recreated on the next call to collect(K,V))
* Open a SequenceFile.Writer for this partition
* Sort each partition separately (the current sort reuses indices, but still requires wrapping them in IntWritable objects)
* Build a RawKeyValueIterator of sorted data for the partition
* Deserialize each key and value and call SequenceFile::append(K,V) on the writer for this partition

There are a number of opportunities to reduce the copies, allocations, and operations we perform in this stage, particularly since growing many of the buffers involved requires copying the existing data into the newly sized allocation.
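
To make that copy cost concrete, here is a minimal sketch of the growth pattern described above, assuming an illustrative GrowableBuffer class rather than Hadoop's actual DataOutputBuffer internals: every time the buffer fills, all previously collected bytes are copied into the larger allocation.

{noformat}
// Illustrative only: a growable byte buffer in the style of
// DataOutputBuffer. Each grow copies every byte collected so far,
// which is the overhead this issue aims to avoid.
class GrowableBuffer {
  private byte[] buf = new byte[1024];
  private int len = 0;

  void write(byte[] b, int off, int n) {
    if (len + n > buf.length) {
      // Double until the write fits, copying the existing data.
      int newSize = Math.max(buf.length * 2, len + n);
      byte[] grown = new byte[newSize];
      System.arraycopy(buf, 0, grown, 0, len);
      buf = grown;
    }
    System.arraycopy(b, off, buf, len, n);
    len += n;
  }
}
{noformat}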

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Chris Douglas (JIRA) at Mar 1, 2008 at 2:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-0.patch

    This patch effects the following changes to improve our efficiency in this area. Instead of gradually growing our buffers, we use properties to determine the size of the K,V byte buffer and accounting data and allocate it up front. We maintain accounting information for the task as two arrays of ints (rather than separate arrays for each partition), mimicking the existing BufferSorter interface. The first stores offsets into the second, which maintains the k/v offsets and partition information for the keys. This permits us to swap offsets to effect the sort, as is presently implemented in BufferSorter, but without requiring us to wrap them in IntWritables.

    {noformat}
     kvoffset buffer            kvindices buffer
     _______________           _________________
    | offset k1,v1  |         | partition k1,v1 |
    | offset k2,v2  |         | k1 offset       |
    |      ...      |         | v1 offset       |
    | offset kn,vn  |         | partition k2,v2 |
     ---------------          | k2 offset       |
                              | v2 offset       |
                              |      ...        |
                              | partition kn,vn |
                              | kn offset       |
                              | vn offset       |
                               -----------------
    {noformat}
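
    A minimal sketch of this accounting scheme in Java; the field and constant names below are assumptions for illustration, not necessarily those in the patch:

    {noformat}
    // kvoffsets holds one int per record pointing at that record's
    // triple in kvindices; kvindices holds (partition, key offset,
    // value offset) for each record.
    class KVAccounting {
      static final int PARTITION = 0;  // partition of the record
      static final int KEYSTART  = 1;  // key offset in the k,v byte buffer
      static final int VALSTART  = 2;  // value offset in the k,v byte buffer
      static final int ACCTSIZE  = 3;  // ints of accounting data per record

      int[] kvoffsets;  // record i -> base of its triple in kvindices
      int[] kvindices;  // {partition, keystart, valstart} per record

      // The sort permutes kvoffsets only: no IntWritable wrappers, and
      // the serialized key/value bytes never move.
      void swap(int i, int j) {
        int tmp = kvoffsets[i];
        kvoffsets[i] = kvoffsets[j];
        kvoffsets[j] = tmp;
      }
    }
    {noformat}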

    By default, the total size of the accounting space is 5% of io.sort.mb. We build on the work done in HADOOP-1965, but rather than using 50% of io.sort.mb before a spill, we set a "soft" limit that defaults to 80% of the number of records or 80% of the K,V buffer before starting a spill thread. Note that this limit does not require us to query each partition collector for its memory usage, but can be effected by examining our indices. Rather than permitting the spill thread to "own" references to the buffers, we maintain a set of indices into the offset and k,v byte buffers defining the area of each in which the spill thread is permitted to work. According to the Java VM spec, we can assume that reading/writing array elements does not require a lock on the array.
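
    A sketch of how the buffer sizes and soft limits could be derived; the property names io.sort.record.percent and io.sort.spill.percent are assumptions here, with defaults matching the 5% and 80% figures above:

    {noformat}
    // Hypothetical derivation of the collection-buffer geometry. Each
    // record costs 3 accounting ints plus 1 offset int (16 bytes).
    class SpillLimits {
      public static void main(String[] args) {
        int sortmb     = 100;    // io.sort.mb
        float recper   = 0.05f;  // io.sort.record.percent (assumed name)
        float spillper = 0.80f;  // io.sort.spill.percent (assumed name)

        int totalBytes  = sortmb << 20;
        int acctBytes   = (int) (totalBytes * recper);   // accounting space
        int kvBytes     = totalBytes - acctBytes;        // k,v byte buffer
        int maxRecords  = acctBytes / (4 * 4);           // records that fit
        int recordLimit = (int) (maxRecords * spillper); // soft record limit
        int bufferLimit = (int) (kvBytes * spillper);    // soft byte limit
        System.out.println("spill at " + recordLimit + " records or "
            + bufferLimit + " bytes");
      }
    }
    {noformat}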

    We maintain three indices for both the accounting and k,v buffers: start, end, and index. The area between start and end is available to the spill, while the area between end and index (in truth, a marker noting the end of the last record written) contains "spillable" data yet to be written to disk. If the soft limit is reached, or if one attempts a write into the buffer that is too large to accommodate without a spill, then the task thread sets the end index to the last record marker and triggers a spill. While the spill is running, the area between the start and end indices is unavailable for writing from collect(K,V), and the task thread will block until the spill has completed if the index marker hits the start marker.

    {noformat}
    Buffer indices during a spill:
     ___________       ___________       ___________
    |___________|     |___________|     |___________|
     ^  ^  ^  ^        ^  ^  ^  ^        ^  ^  ^  ^
     s  e  i  v        i  s  e  v        e  i  s  v
    {noformat}

    It is worth mentioning that each key must be contiguous to be used with a RawComparator, but values can wrap around the end of the buffer. This requires us to note the "voided" space in the buffer that contains no data. When the spill completes, it sets the start marker to the end marker, making that space available for writing. Note that it must also reset the void marker to the buffer size if the spill wraps around the end of the buffer (the rightmost case in the preceding figure). The "voided" marker is owned by whichever thread needs to manipulate it, so we require no special locking for it.
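
    A simplified sketch of the contiguous-key handling; the field names (kvbuffer, bufindex, bufvoid) are assumptions, and the real collector must additionally block if the front of the buffer is still owned by a spill:

    {noformat}
    // Keys must be contiguous for RawComparator; values may wrap.
    class CircularKVBuffer {
      byte[] kvbuffer = new byte[1 << 20];  // the k,v byte buffer
      int bufindex = 0;                     // next write position
      int bufvoid  = kvbuffer.length;       // end of valid data when a
                                            // key cannot wrap past the end

      void writeKey(byte[] key) {
        if (bufindex + key.length > kvbuffer.length) {
          bufvoid = bufindex;  // tail of the buffer holds no record data
          bufindex = 0;        // restart the key at the front
        }
        System.arraycopy(key, 0, kvbuffer, bufindex, key.length);
        bufindex += key.length;
      }
    }
    {noformat}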

    When we sort, we sort all spill data by partition instead of creating a separate collector for each partition. Further, we can use appendRaw (as was suggested in HADOOP-1609) to write our serialized data directly from the k,v buffer to our spill file writer instead of deserializing each prior to the write. Note that for record-compressed data (when not using a combiner), this permits us to store compressed values in our k,v buffer.
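
    Sorting everything in one pass works because the comparison orders records first by partition and then by raw key bytes. A sketch using the accounting arrays from above (field names assumed; the key length calculation assumes each value immediately follows its key):

    {noformat}
    import org.apache.hadoop.io.RawComparator;

    class SpillComparator {
      static final int PARTITION = 0, KEYSTART = 1, VALSTART = 2;
      byte[] kvbuffer;              // serialized keys and values
      int[] kvoffsets;              // record -> base of triple in kvindices
      int[] kvindices;              // {partition, keystart, valstart}
      RawComparator<?> comparator;  // the job's raw key comparator

      public int compare(int i, int j) {
        final int ii = kvoffsets[i];
        final int jj = kvoffsets[j];
        final int pi = kvindices[ii + PARTITION];
        final int pj = kvindices[jj + PARTITION];
        if (pi != pj) {
          return pi - pj;  // group records by partition first
        }
        return comparator.compare(  // then order by serialized key bytes
            kvbuffer, kvindices[ii + KEYSTART],
            kvindices[ii + VALSTART] - kvindices[ii + KEYSTART],
            kvbuffer, kvindices[jj + KEYSTART],
            kvindices[jj + VALSTART] - kvindices[jj + KEYSTART]);
      }
    }
    {noformat}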

    The attached patch is a work in progress, and is known to suffer from the following deficiencies:
    * Very large keys and values (with a comparably small io.sort.mb) present a difficult problem for a statically allocated collection buffer. If a series of writes to an empty collection exceeds the space allocated to the k,v byte buffer (e.g. a 100MB k,v byte buffer and a Writable that attempts two 51MB write(byte[],int,int) calls), the current patch will loop forever. The same happens when the writes come separately. The current patch only spills when the soft limit is reached.
    * Handling of compression is inelegantly implemented. Again, this is a work in progress and will be cleaned up.
    * The spill thread is created each time it is invoked, but it need not be.
    * The code managing the contiguous key property is not as efficient as it could be.
    * The implementation of QuickSort could be improved (re: Sedgewick) to handle the case where keys are equal to the pivot, probably a fairly common case (see the sketch following this list).
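
    For the last point, one common formulation of the equal-keys optimization is a three-way partition (Dijkstra's, as presented by Sedgewick); the Bentley-McIlroy variant instead swaps pivot-equal keys to the ends of the array and moves them into place after the pass, but the effect is the same: equal keys drop out of further passes. Illustrative code, not from the patch:

    {noformat}
    // Three-way quicksort on an int array: elements equal to the pivot
    // end up between lt and gt and are never examined again.
    class ThreeWaySort {
      static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        final int v = a[lo];  // pivot
        while (i <= gt) {
          if (a[i] < v)      swap(a, lt++, i++);
          else if (a[i] > v) swap(a, i, gt--);
          else               i++;
        }
        sort(a, lo, lt - 1);  // elements < pivot
        sort(a, gt + 1, hi);  // elements > pivot; equals already placed
      }

      static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
      }
    }
    {noformat}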
  • Chris Douglas (JIRA) at Mar 1, 2008 at 3:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas reassigned HADOOP-2919:
    -------------------------------------

    Assignee: Chris Douglas
  • Chris Douglas (JIRA) at Mar 1, 2008 at 3:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574060#action_12574060 ]

    Chris Douglas commented on HADOOP-2919:
    ---------------------------------------

    sort benchmark on 100 nodes:
    || trunk (r632182) || trunk + patch ||
    | 31:09 | 25:46 |
  • Chris Douglas (JIRA) at Mar 3, 2008 at 12:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574060#action_12574060 ]

    chris.douglas edited comment on HADOOP-2919 at 3/2/08 4:55 PM:
    ---------------------------------------------------------------

    sort benchmarks:
    || nodes || trunk (r632182) || trunk + patch ||
    | 100 | 31:09 | 25:46 |
    | 20 | 23:10 | 20:03 |

    was (Author: chris.douglas):
    sort benchmark on 100 nodes:
    || trunk (r632182) || trunk + patch ||
    | 31:09 | 25:46 |
  • Chris Douglas (JIRA) at Mar 7, 2008 at 1:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-1.patch

    Updated patch (depends on HADOOP-2943). It should support large records now, but it remains a work in progress.
  • Chris Douglas (JIRA) at Mar 10, 2008 at 7:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-2.patch

    This patch makes some minor performance improvements, adds documentation, and correctly effects record compression in-place.

    The following should probably be implemented as separate JIRAs:
    * QuickSort would benefit from the optimization whereby keys equal to the pivot are swapped into place at the end of a pass.
    * Instead of recreating the spill thread, a persistent thread should accept spill events (see the sketch following this list). This will permit one to set the spill threshold to less than 50% and avoid the overhead of creating a thread (assumed to be slight relative to the cost of a spill, but worth eliminating).
    * Recreating collectors is expensive. Pooling resources, particularly the collection buffers, between jobs (once JVM reuse is in place) should make a significant difference for jobs with short-running maps.
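
    A sketch of the persistent spill thread suggested in the second item; this is an assumption about the shape of the fix, not code from the patch:

    {noformat}
    // One long-lived thread parks on a lock and runs spills on demand,
    // rather than being constructed anew for every spill.
    class SpillThread extends Thread {
      private final Object lock = new Object();
      private boolean spillPending = false;

      void requestSpill() {
        synchronized (lock) {
          spillPending = true;
          lock.notify();  // wake the spill thread
        }
      }

      @Override public void run() {
        while (true) {
          synchronized (lock) {
            while (!spillPending) {
              try {
                lock.wait();  // park until a spill is requested
              } catch (InterruptedException e) {
                return;       // task is done; exit the thread
              }
            }
            spillPending = false;
          }
          // sortAndSpill() would run here, outside the lock.
        }
      }
    }
    {noformat}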
  • Chris Douglas (JIRA) at Mar 10, 2008 at 7:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Fix Version/s: 0.17.0
    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Mar 10, 2008 at 10:21 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576930#action_12576930 ]

    Hadoop QA commented on HADOOP-2919:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377508/2919-2.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac -1. The applied patch generated 590 javac compiler warnings (more than the trunk's current 589 warnings).

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs -1. The patch appears to introduce 5 new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1927/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1927/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1927/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1927/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at Mar 10, 2008 at 6:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 10, 2008 at 6:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 10, 2008 at 6:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-3.patch

    Fixed findbugs warnings, suppressed spurious serialization warning for private IOE subclass
  • Hadoop QA (JIRA) at Mar 11, 2008 at 12:08 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577257#action_12577257 ]

    Hadoop QA commented on HADOOP-2919:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377540/2919-3.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1930/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1930/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1930/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1930/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at Mar 11, 2008 at 12:32 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577260#action_12577260 ]

    Chris Douglas commented on HADOOP-2919:
    ---------------------------------------

    I haven't been able to reproduce this failure in Linux or on MacOS. Looking at the console output, the timeout looks related to HADOOP-2971. I'm seeing a handful of the following errors from Hudson:

    {noformat}
    [junit] 2008-03-10 23:22:51,803 INFO dfs.DataNode (DataNode.java:run(1985)) - PacketResponder blk_1646669170773132170 1 Exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for /127.0.0.1:34190 (local: /127.0.0.1:34496) to be ready for read
    [junit] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
    [junit] at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
    [junit] at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
    [junit] at java.io.DataInputStream.readFully(DataInputStream.java:176)
    [junit] at java.io.DataInputStream.readLong(DataInputStream.java:380)
    [junit] at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1957)
    [junit] at java.lang.Thread.run(Thread.java:595)
    {noformat}

    Since the failure is coming from TestMiniMRDFSSort (code this patch certainly affects), this result is not auspicious, but I suspect the issue is not related to this patch.
  • Devaraj Das (JIRA) at Mar 13, 2008 at 9:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2919:
    --------------------------------

    Status: Patch Available (was: Open)
  • Devaraj Das (JIRA) at Mar 13, 2008 at 9:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-2919:
    --------------------------------

    Status: Open (was: Patch Available)

    Cancelling for resubmission
  • Chris Douglas (JIRA) at Mar 13, 2008 at 10:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 13, 2008 at 10:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 13, 2008 at 10:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-4.patch

    Merged patch with latest trunk
  • Hadoop QA (JIRA) at Mar 14, 2008 at 1:45 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578564#action_12578564 ]

    Hadoop QA commented on HADOOP-2919:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377841/2919-4.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1958/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1958/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1958/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1958/console

    This message is automatically generated.
  • Owen O'Malley (JIRA) at Mar 18, 2008 at 7:47 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580027#action_12580027 ]

    Owen O'Malley commented on HADOOP-2919:
    ---------------------------------------

    I just started going through this, but I've seen some nits:
    1. IndexedSorter is importing Progressable, but not using it.
2. util should not depend on mapred, so IndexedSorter and IndexedSortable should move out to util (a sketch of these interfaces follows this list).
    3. QuickSort should use empty braces rather than just a semicolon, so replace:
    while (++i < r && s.compare(i, x) < 0);
    while (--j > x && s.compare(x, j) < 0);
    with
    while (++i < r && s.compare(i, x) < 0) { } // NOTHING
    while (--j > x && s.compare(x, j) < 0) { } // NOTHING
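For reference, a minimal sketch of the two interfaces in question, with shapes inferred from this discussion (the committed signatures may differ): the sorter reorders record indices through compare/swap, so the serialized K,V bytes themselves never move.

{noformat}
// Sketch only: shapes inferred from this thread, not the committed API.
interface IndexedSortable {
  int compare(int i, int j);  // compare the records at indices i and j
  void swap(int i, int j);    // swap the accounting entries for i and j
}

interface IndexedSorter {
  void sort(IndexedSortable s, int l, int r);  // sort records in [l, r)
}
{noformat}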

  • Owen O'Malley (JIRA) at Mar 19, 2008 at 12:06 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580182#action_12580182 ]

    Owen O'Malley commented on HADOOP-2919:
    ---------------------------------------

    A few more suggestions:

1. reduce from WARN to DEBUG for the single-value spill message
2. think about the case when a key precisely fills the buffer, and whether we need to reset in that case
3. fix the sizes passed to the raw comparator
4. add a test case where the key compare uses the length (BytesWritable?)
5. rename mark to avoid mark/reset confusion
6. use a byte[] inside reset instead of a DataOutputBuffer
7. remove the extra semicolon in MapTask.getVBytesForOffset
8. don't ignore InterruptedException
9. wrap remotely caught exceptions in a new exception, setting its cause to the remotely caught exception (see the sketch after this list)
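On item 9, a hedged sketch of the intent; the field and method names below are illustrative, not the patch's actual code. The spill thread records any exception it catches, and the collecting thread rethrows it wrapped in a new exception whose cause preserves the original stack trace:

{noformat}
import java.io.IOException;

// Sketch only: illustrative names, not the patch's code.
class SpillExceptionSketch {
  private volatile Throwable sortSpillException;  // set by the spill thread

  void spillThreadBody() {
    try {
      sortAndSpill();              // sort a buffer range, write a spill file
    } catch (Throwable t) {
      sortSpillException = t;      // record it for the collecting thread
    }
  }

  void checkSpillException() throws IOException {
    if (sortSpillException != null) {
      IOException wrapped = new IOException("Spill failed");
      wrapped.initCause(sortSpillException);  // item 9: set the cause
      throw wrapped;
    }
  }

  private void sortAndSpill() { /* elided */ }
}
{noformat}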

  • Chris Douglas (JIRA) at Mar 19, 2008 at 3:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-5.patch

    Addresses most of Owen's feedback, excluding the following:

    bq. 2. think about case when key precisely fills buffer and whether we need to reset in that case

Storing key start/end indices is useful enough that I'm loath to make a corner case of this for now. Re-copying the key is unnecessary but, I'm guessing, not very costly relative to adding an additional branch to the compare (since it should be a very infrequent case).
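To make the corner case concrete, here is a hedged sketch, assuming a circular collection buffer (all names are illustrative): a key that wraps past the end of the buffer is re-copied so raw comparators always see one contiguous range, and a key ending exactly at the buffer boundary is the case where that copy is strictly unnecessary.

{noformat}
// Sketch only: a key spanning the wrap point of a circular buffer is
// re-copied into one contiguous range before comparison. A key ending
// exactly at buffer.length is already contiguous; copying it anyway keeps
// the start/end bookkeeping uniform at the cost of a rare extra copy.
class KeyWrapSketch {
  static byte[] contiguousKey(byte[] buffer, int keyStart, int keyEnd) {
    if (keyEnd >= keyStart) {
      return buffer;                          // key did not wrap
    }
    int tail = buffer.length - keyStart;      // bytes before the wrap
    byte[] copy = new byte[tail + keyEnd];    // plus bytes after the wrap
    System.arraycopy(buffer, keyStart, copy, 0, tail);
    System.arraycopy(buffer, 0, copy, tail, keyEnd);
    return copy;                              // key readable as one range
  }
}
{noformat}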

    bq. 4. add test case where key compare uses length (bytes writable?)

    I haven't been able to find a Writable with this property (HADOOP-3046), but I'll add a test case presently.
  • Chris Douglas (JIRA) at Mar 19, 2008 at 7:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 19, 2008 at 7:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Owen O'Malley (JIRA) at Mar 19, 2008 at 9:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580559#action_12580559 ]

    Owen O'Malley commented on HADOOP-2919:
    ---------------------------------------

    Some more comments:
1. rewrite the soft-limit computations, moving the ?: operators down to just calculate the number of entries and bytes
2. include the size of the record and the size of the buffer in the "map output buffer too small" exception
3. rename inMemUncompressedBytes, since it can contain compressed bytes too
4. make combineAndSpill call close on the combiner in a finally block
5. remove the IllegalArgumentException from the inMemUncompressedBytes declaration
6. make static finals for the index offsets (0, 1, 2, and 3) (see the sketch after this list)
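On item 6, a minimal sketch following the kvoffsets/kvindices layout in the issue description; the constant names are assumptions, not necessarily the patch's:

{noformat}
// Sketch only: named offsets into the per-record accounting data, replacing
// the bare literals 0..3. Each kvindices record is three ints; kvoffsets
// holds the offset of each record's entry in kvindices.
class AccountingSketch {
  static final int PARTITION = 0;  // partition of the K,V pair
  static final int KEYSTART  = 1;  // key offset in the K,V byte buffer
  static final int VALSTART  = 2;  // value offset in the K,V byte buffer
  static final int ACCTSIZE  = 3;  // ints of accounting data per record

  static void writeAccounting(int[] kvoffsets, int[] kvindices, int kvindex,
                              int partition, int keystart, int valstart) {
    final int base = kvindex * ACCTSIZE;
    kvoffsets[kvindex] = base;             // offset into kvindices
    kvindices[base + PARTITION] = partition;
    kvindices[base + KEYSTART] = keystart;
    kvindices[base + VALSTART] = valstart;
  }
}
{noformat}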
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-6.patch

    Integrated Owen's feedback
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-6.patch

    (whoops; caught unrelated changes)
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: (was: 2919-6.patch)
  • Chris Douglas (JIRA) at Mar 20, 2008 at 1:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Hadoop QA (JIRA) at Mar 20, 2008 at 12:59 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580740#action_12580740 ]

    Hadoop QA commented on HADOOP-2919:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378286/2919-6.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at Mar 26, 2008 at 10:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Attachment: 2919-7.patch

This patch is identical to 2919-6, but the output buffer is released prior to the merge.
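A hedged illustration of the change; the field names below follow the issue description and are not necessarily the patch's. Dropping the collector's buffer references before the merge lets the JVM reclaim roughly io.sort.mb of heap while the spill files are merged:

{noformat}
// Sketch only: release the in-memory collection buffers before merging.
class FlushSketch {
  private byte[] kvbuffer;   // serialized K,V bytes
  private int[] kvoffsets;   // offsets into kvindices
  private int[] kvindices;   // partition / key / value accounting

  void flush() {
    // ... final spill of any buffered records would happen here ...
    kvbuffer = null;         // let the GC reclaim ~io.sort.mb of heap
    kvoffsets = null;
    kvindices = null;
    mergeParts();            // merge on-disk spills into the final output
  }

  private void mergeParts() { /* elided */ }
}
{noformat}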
  • Chris Douglas (JIRA) at Mar 27, 2008 at 4:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 27, 2008 at 4:55 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2919:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Hadoop QA (JIRA) at Mar 27, 2008 at 6:35 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582526#action_12582526 ]

    Hadoop QA commented on HADOOP-2919:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378667/2919-7.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests -1. The patch failed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2074/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2074/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2074/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2074/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at Mar 27, 2008 at 8:51 am
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582547#action_12582547 ]

    Chris Douglas commented on HADOOP-2919:
    ---------------------------------------

sort on 100 nodes:
|| || trunk (r641466) || trunk + patch ||
| uncompressed bytes | 38:17 | 34:38 |
| compressed text | 34:18 | 31:37 |
  • Owen O'Malley (JIRA) at Mar 28, 2008 at 8:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583192#action_12583192 ]

    Owen O'Malley commented on HADOOP-2919:
    ---------------------------------------

    +1
  • Owen O'Malley (JIRA) at Mar 31, 2008 at 10:34 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583911#action_12583911 ]

    Owen O'Malley commented on HADOOP-2919:
    ---------------------------------------

    I was concerned that TestMiniMRDFSSort was failing on Hudson with this patch, even though it was working on "real" machines. It looks like it was primarily resource starvation on the zones machines. I filed HADOOP-3142 and HADOOP-3143 to address the problems.
  • Owen O'Malley (JIRA) at Mar 31, 2008 at 10:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Owen O'Malley updated HADOOP-2919:
    ----------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Chris!
