Hi all,

This is the formalization of a previous thread
<https://groups.google.com/forum/#!topic/golang-nuts/434c3YInH_M> in which
I'll try to make a case for including an additional copy primitive in the
standard libs. The reason I propose its inclusion is that it is non-trivial
to write, and apparently - based on the previous discussion - people find it
really hard to see why simpler solutions do not work as intended.

*Problem to solve:*

Go makes it very easy to connect readers with writers and copy/pipe data
from one endpoint to the other. In the average case, where the reader
produces data with throughput and stability similar to the writer's, this
solution works as intended.

However, if the reader produces data in bursts, or the writer consumes data
in bursts, the existing solutions break down, because the whole pipeline is
synchronous: if the writer cannot accept data for a while, the reader won't
fetch any, even if some is available. Similarly, if the reader produces in
periodic bursts but the writer cannot consume immediately, then the reader
stalls, even though it could already be preparing the next burst, which may
take a long time to produce.

Two concrete examples where this scenario appeared:

    - *Streaming a file from one remote location to another.*
    The reader/downloader can be considered fairly stable, producing data at
    a constant rate. The writer/uploader however works in bursts, as it first
    collects a larger chunk of data and only uploads afterwards (e.g. gsutil
    collects ~100MB worth of data). The issue is that after the uploader's
    buffer fills up, it blocks further writes until it finishes processing. But
    if the writer is blocked, the reader will block too (nowhere to put the
    data), and the download stalls (a toy simulation of this scenario follows
    after these examples).

    - *Generating a data block and uploading it to a remote location.*
    If we have a non-streaming data generator (e.g. compressing a file in a
    non-streaming way), then it will most probably generate large chunks of
    data periodically. Compared to this, the uploader may be considered stable.
    However, while the uploader is stably but slowly pushing the data, the
    generator's potentially long-running tasks are stalled, since the writer is
    not accepting new data.
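
To make the first scenario concrete, here is a self-contained toy simulation
(the reader/writer types below are made-up stand-ins for the downloader and
uploader, not the real tools): while the bursty writer "flushes", io.Copy
leaves the steady reader completely idle, so the total time ends up being
roughly the sum of the read time and the flush time instead of their maximum.

package main

import (
    "fmt"
    "io"
    "time"
)

// burstyWriter stands in for the uploader: it accumulates a batch and then
// blocks while "uploading" it.
type burstyWriter struct {
    pending int
    batch   int
}

func (w *burstyWriter) Write(p []byte) (int, error) {
    w.pending += len(p)
    if w.pending >= w.batch {
        time.Sleep(500 * time.Millisecond) // pretend to flush the whole batch
        w.pending = 0
    }
    return len(p), nil
}

// steadyReader stands in for the downloader: it trickles data at a constant rate.
type steadyReader struct{ left int }

func (r *steadyReader) Read(p []byte) (int, error) {
    if r.left == 0 {
        return 0, io.EOF
    }
    time.Sleep(10 * time.Millisecond)
    n := 4 * 1024
    if n > r.left {
        n = r.left
    }
    if n > len(p) {
        n = len(p)
    }
    r.left -= n
    return n, nil
}

func main() {
    start := time.Now()
    io.Copy(&burstyWriter{batch: 256 * 1024}, &steadyReader{left: 1024 * 1024})
    // The reader sat idle during every simulated flush, so the pipeline takes
    // about read-time + flush-time rather than the maximum of the two.
    fmt.Println("io.Copy took", time.Since(start))
}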

*Why existing constructs don't work:*

Per the previous thread, people readily jump to buffered readers/writers
and pipes, thinking that they must work, but they don't realize why they
*cannot* work:

    - *Buffered readers are synchronous and blocking.*
    The goal of a buffered reader is to provide peeking capabilities for an
    input stream, and to store some data that has arrived but hasn't been
    consumed yet. The issue is that if there is already something in the
    buffer, newly arriving data won't be pulled in until the existing data is
    consumed (even if the buffer is mostly empty). This means that a buffered
    reader will still stall the read side, even though it has spare capacity.

    - *Buffered writers are synchronous and blocking.*
    The goal of a buffered writer is to save potentially expensive write
    operations by accumulating arriving data and only forwarding it when
    enough has been collected. However, the moment the buffer is full and/or a
    flush is executed, the writer is completely blocked until the data can be
    transferred in its entirety. This means that anything streaming data into
    such a writer immediately stalls too.

    - *Pipes are non-buffered.*
    Piping an input stream into an output stream will not work either, as
    neither io.Pipe nor os.Pipe provides meaningful buffering (yes, there are
    64K OS buffers involved, but that is not nearly enough to cover these
    issues, and they are not modifiable), so a pipe doesn't really do anything
    more than a simple copy.

All in all, in order to handle bursty readers and/or writers, a buffer needs
to be placed *in between* the differing read and write operations, not before
or after.

*My proposal:*

My proposed solution to these problems is the introduction of a specialized
copy operation into the bufio package, one that would run on two separate
threads: one consuming data from the reader and feeding it into an internal
buffer of configurable size; the other consuming data from the internal
buffer and feeding it to the writer endpoint. The goal of this mechanism is
to completely isolate the reader and writer threads, allowing each to make
progress even if the other one is temporarily blocked.

The signature of the copy would be analogous to io.Copy's, just with the
configurable buffer size added:

bufio.Copy(dst io.Writer, src io.Reader, buffer int) (written int64, err error)
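
For illustration, a rough usage sketch of what I'm after (assuming the
go-gettable repo's root package imports as bufioprop; the URL and file name
below are of course just placeholders):

package main

import (
    "log"
    "net/http"
    "os"

    "github.com/karalabe/bufioprop" // stand-in for the proposed bufio.Copy
)

func main() {
    // Placeholder source; imagine a steady download feeding a bursty sink.
    resp, err := http.Get("http://example.com/some-large-file")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    dst, err := os.Create("some-large-file")
    if err != nil {
        log.Fatal(err)
    }
    defer dst.Close()

    // A 32MB buffer sits between the read and write sides, so neither
    // endpoint stalls the other while there is room to work with.
    n, err := bufioprop.Copy(dst, resp.Body, 32*1024*1024)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("copied %d bytes", n)
}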

Internally the operation would be based on a single circular buffer, with
both the reader and writer threads using atomic operations for data handling
and only resorting to channels when the buffer is full or empty (since then
one thread must obviously block). The solution would also not require any
memory allocations beyond the initial buffer setup, making arbitrarily
long-running copy operations GC friendly/free.
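
To give a feel for the mechanism without dumping the whole implementation
here, below is a deliberately simplified sketch (the package name bufcopy is
made up for the example): it synchronizes with a mutex and condition variable
instead of the atomics + signalling channels described above, and it does not
propagate a write-side failure back to the reader goroutine, so treat it as
an illustration of the data flow only, not as the proposed code.

package bufcopy

import (
    "io"
    "sync"
)

// Copy streams src into dst through a ring of `buffer` bytes (buffer > 0):
// a reader goroutine fills the ring while the caller drains it into dst.
func Copy(dst io.Writer, src io.Reader, buffer int) (written int64, err error) {
    ring := make([]byte, buffer)
    var (
        mu      sync.Mutex
        cond    = sync.NewCond(&mu)
        head    int   // next position the read side will fill
        avail   int   // bytes currently sitting in the ring
        readErr error // terminal result of the read side (io.EOF on success)
    )

    go func() {
        chunkSize := 32 * 1024
        if buffer < chunkSize {
            chunkSize = buffer
        }
        chunk := make([]byte, chunkSize)
        for {
            n, rerr := src.Read(chunk)

            mu.Lock()
            for avail+n > buffer { // wait until the write side frees enough space
                cond.Wait()
            }
            for i := 0; i < n; { // copy into the ring in at most two pieces
                i += copy(ring[(head+i)%buffer:], chunk[i:n])
            }
            head = (head + n) % buffer
            avail += n
            if rerr != nil {
                readErr = rerr
            }
            cond.Broadcast()
            mu.Unlock()

            if rerr != nil {
                return
            }
        }
    }()

    tail := 0
    for {
        mu.Lock()
        for avail == 0 && readErr == nil {
            cond.Wait()
        }
        n, rerr := avail, readErr
        mu.Unlock()

        for left := n; left > 0; { // drain in at most two contiguous writes
            end := tail + left
            if end > buffer {
                end = buffer
            }
            if _, err = dst.Write(ring[tail:end]); err != nil {
                return written, err // note: the reader goroutine is left behind
            }
            written += int64(end - tail)
            left -= end - tail
            tail = end % buffer
        }

        mu.Lock()
        avail -= n
        cond.Broadcast()
        mu.Unlock()

        if rerr == io.EOF {
            if n == 0 {
                return written, nil
            }
            continue // drain whatever is still buffered, then exit above
        }
        if rerr != nil {
            return written, rerr
        }
    }
}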

*Proposed implementation:*

I've written up an implementation of the above mentioned Copy operation. In
both algorithmic construction and naming conventions (internal variables) it
follows the io.Copy implementation; however, consider it a starting point for
further refinements.

The implementation and some fairly trivial tests have been included here:

$ go get github.com/karalabe/bufioprop

Furthermore, to prove my point that existing constructs and even other
simple-looking solutions don't work as intended, I've written a small
*shootout* simulating three copy scenarios:

    - Classical streaming copy where the source and sink have a similar
    throughput and production/consumption style.
    - Stable source producing data at a constant rate, but a bursty sink
    accepting big batches periodically. The overall throughput of the two
    endpoints is the same, only the production/consumption cycles and data
    chunks differ.
    - Bursty source producing big data chunks on rare occasions, and a
    stable sink consuming data at a constant rate. Again, the overall
    throughput of the two endpoints is the same, only the
    production/consumption cycles and data chunks differ.

You can run these tests via:

$ go get github.com/karalabe/bufioprop/shootout
$ shootout

Stable input, stable output:
         io.Copy: 3.38504052s 10.666667 mbps.
  [!] bufio.Copy: 3.37012021s 10.666667 mbps.
 rogerpeppe.Copy: 3.414476536s 10.666667 mbps.
 mattharden.Copy: 6.368713887s 5.333333 mbps.

Stable input, bursty output:
         io.Copy: 6.251177787s 5.333333 mbps.
  [!] bufio.Copy: 3.387935437s 10.666667 mbps.
 rogerpeppe.Copy: 5.98428305s 6.400000 mbps.
 mattharden.Copy: 6.250739081s 5.333333 mbps.

Bursty input, stable output:
         io.Copy: 6.25889809s 5.333333 mbps.
  [!] bufio.Copy: 3.347354357s 10.666667 mbps.
 rogerpeppe.Copy: 5.999921216s 6.400000 mbps.
 mattharden.Copy: 3.473998412s 10.666667 mbps.

To add your own challenger code, simply create a new package inside the
shootout folder, write a Copy with the above proposed signature and insert it
into the contenders variable in shootout.go
<https://github.com/karalabe/bufioprop/blob/master/shootout/shootout.go#L22>.

*Final notes:*

A further solution was proposed by Jan Mercl
<https://groups.google.com/d/msg/golang-nuts/434c3YInH_M/5gyVBUOL69IJ>,
which, if I understood correctly, entails reading chunks of data on one
thread and passing those chunks through a channel to a writer thread.
Although this indeed works, the disadvantages compared to my proposal are:

    - If data chunks are placed into the channel immediately after being
    read, then it's hard to control the total buffer size, as the reads may be
    of arbitrary length (and the channel limits the chunk count, not the
    combined size).
    - If data chunks are first accumulated into larger pieces and only
    queued for writing afterwards, then there will be a delay between incoming
    and outgoing data, even though the data is already available. The issue is
    even bigger if the larger piece is not yet full but no more data arrives
    for a while, essentially stalling the writer for nothing.
    - All read/write operations need syncing through a channel, potentially
    causing a minor performance hit (debatable whether it's a big enough
    problem).
    - There are hidden costs in memory allocations or buffer reuse, depending
    on the internal implementation.

    These issues could probably be solved one way or another (a rough sketch
    of my reading of the approach follows below), but my point still stands:
    a proper solution is non-trivial.
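
For reference, this is roughly how I read the chunk-channel idea (my own
sketch, not Jan's actual code); it illustrates the first and last points
above: the channel caps the number of in-flight chunks rather than their
combined size, and every chunk needs its own allocation unless a separate
recycling scheme is added.

package chunkcopy

import "io"

// Copy pipes src into dst via a buffered channel of freshly read chunks.
// `chunks` bounds how many chunks may be in flight, not how many bytes.
func Copy(dst io.Writer, src io.Reader, chunks int) (written int64, err error) {
    type piece struct {
        data []byte
        err  error
    }
    feed := make(chan piece, chunks)

    go func() {
        for {
            buf := make([]byte, 32*1024) // fresh buffer per chunk
            n, rerr := src.Read(buf)
            feed <- piece{data: buf[:n], err: rerr}
            if rerr != nil {
                return
            }
        }
    }()

    for p := range feed {
        if len(p.data) > 0 {
            n, werr := dst.Write(p.data)
            written += int64(n)
            if werr != nil {
                return written, werr // note: the reader goroutine may stay parked on feed
            }
        }
        if p.err == io.EOF {
            return written, nil
        }
        if p.err != nil {
            return written, p.err
        }
    }
    return written, nil
}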

I would welcome constructive feedback, and - aiming this at the core
developers - please give it some deeper thought as to whether this would be
worthwhile to get into the libs.

Thanks,
   Peter

PS: I invite anyone to propose other solutions (maybe simpler ones, maybe I
missed something in the libs that would make this trivial). *However*, the
reason I've spent so much time on preparing a go-gettable implementation and
an associated simulator/shootout is that the underlying issue is not trivial.
Please verify that your solution indeed passes the shootout before dismissing
the need for this proposal.


  • Jan Mercl at Jan 29, 2015 at 11:37 am
    On Thu Jan 29 2015 at 12:01:42 Péter Szilágyi wrote:

    Let me please vote against the proposal of including this generalization
    into the stdlib. The reasons were discussed before, to reiterate the most
    important one: proper error handling is not possible for the general case.

    Wrt the proposed code at [0]: That implementation is IMO not acceptable,
    as it's guaranteed to leak goroutines as soon as either the src or dst
    concrete implementation is a network connection lacking a timeout and the
    remote party stops communicating - which is virtually guaranteed to happen
    in real life, sooner or later. (Note the connection with the claim at the
    end of the preceding paragraph.)

    [0]: http://github.com/karalabe/bufioprop

    -j

    PS: I'm probably missing something, but the proposed code seems not to
    overlap the receiving and sending operations above a single chunk. I'd
    expect simply a buffered channel of chunks (chan []byte) in between the rx
    and tx goroutines (that's the mechanism I was thinking about when I said
    dozen-or-so lines). And there should probably be a configurable limit on
    the combined size of the chunks in the channel.

  • Péter Szilágyi at Jan 29, 2015 at 11:53 am

    Let me please vote against the proposal of including this generalization
    into the stdlib. The reasons were discussed before, to reiterate the most
    important one: proper error handling is not possible for the general case.
    The proposed bufio.Copy is analogous to io.Copy (and is meant to be).
    Whatever you can do with io.Copy, you should be able to do with bufio.Copy
    too. Nothing more.

    Wrt to the proposed code at [0]: That implementation is IMO not acceptable
    as it's guaranteed to leak goroutines as soon as either src or dst concrete
    implementation(s) is/are a network connection lacking a timeout and the
    remote party ceases to continue communication - which is virtually
    guaranteed to happen in real life, sooner or later. (Note the connection
    with the claim at end of the preceding paragraph.)
    In this respect, my proposed bufio.Copy behaves exactly like io.Copy
    (unless there's a bug somewhere). It will block and hog whatever resources
    it holds until the copy finishes. Just as it is not the responsibility of
    io.Copy to handle errors, it should not be the responsibility of
    bufio.Copy. It is a simplification over a potentially complex
    implementation.

    PS: I'm probably missing something but the proposed code seems to not
    overlap the receiving and sending operations above a single chunk. I'd
    expect simply a buffered channel of chunks (chan []byte) in between the rx
    and tx goroutines (that's the mechanism I was thinking about when I said
    dozen-or-so lines.). And there should probably a configurable limit on the
    combined size of the chunks in the channel.
    Implementation-wise there could be refinements. My current one indeed
    doesn't read directly into the internal buffer but goes through a smaller
    intermediate chunk (the same way io.Copy does). This can be refined as need
    be; it's meant as a starting point if it's decided to be included.

    I've discussed the potential disadvantages of the chunk-channel approach at
    the end of my proposal, though since I don't have concrete code to look at,
    it's hard to argue against a conceptual idea. If you'd be willing to write
    up a working solution, we could debate whether my implementation is worth
    the bother or not.

  • Yy at Jan 29, 2015 at 1:36 pm

    On 29 January 2015 at 12:01, Péter Szilágyi wrote:
    All in all, in order to handle bursty readers and/or writers, a buffer needs
    to be placed in between the differing read and writer operations, not before
    or after.
    Is there any problem with just having the buffer in the middle and
    writing to / reading from it?

    I have tried this idea in the shootout (great way to make your point!)
    and I get the same results as with bufio.Copy:

    http://play.golang.org/p/bCwjF7_4p0

    I have decided to ignore the size variable and just let a bytes.Buffer
    grow from an initial size of 0. Ideally, you would take a Buffer variable
    (or define the function as a method of Buffer). Then you could avoid
    allocating a new buffer if you already have one at hand.


    --
    - yiyus || JGL .

  • Egon at Jan 29, 2015 at 1:40 pm
    Here's https://gist.github.com/egonelbre/5600f3de19daec1ba055 a version
    that only uses a single buffer for moving data. I wasn't able to figure
    out how to get rid of the busy waiting, though; there's probably a simple
    answer that didn't occur to me.

    + Egon
  • Egon at Jan 29, 2015 at 1:41 pm

    Also there might be some off-by-one errors lurking around.

    + Egon
  • Péter Szilágyi at Jan 29, 2015 at 2:05 pm
    @yy There are various issues with your solution:

        - The buffer currently is unbounded; how would you limit it to a
        fixed size?
        - bytes.Buffer is not thread safe, so if you run with GOMAXPROCS>1,
        it will probably corrupt your data (try it by replacing my copy call
        with yours
        <https://github.com/karalabe/bufioprop/blob/master/bufio_test.go#L28>
        and running go test -cpu=2; note, I haven't actually tried it, but it
        must eventually fail :P).

    @Egon I'll have a look and get back. You could address the busy wait the
    same way I did, through a signal channel.
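
    Roughly what I mean, as a generic toy example (not the actual bufioprop
    code): the consumer parks on a channel instead of spinning on an empty
    buffer, and the producer pings that channel without ever blocking.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var (
            mu     sync.Mutex
            queue  []int
            signal = make(chan struct{}, 1) // capacity 1: a pending wakeup is never lost
        )

        // Producer: deposit items, then ping the consumer without blocking.
        go func() {
            for i := 0; i < 10; i++ {
                mu.Lock()
                queue = append(queue, i)
                mu.Unlock()
                select {
                case signal <- struct{}{}: // wake the consumer if it is parked
                default: // a wakeup is already pending, nothing to do
                }
            }
        }()

        // Consumer: instead of busy polling an empty queue, park on the channel.
        for consumed := 0; consumed < 10; {
            mu.Lock()
            batch := queue
            queue = nil
            mu.Unlock()

            if len(batch) == 0 {
                <-signal // sleep until the producer deposits something
                continue
            }
            for _, v := range batch {
                fmt.Println("consumed", v)
                consumed++
            }
        }
    }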

  • Péter Szilágyi at Jan 29, 2015 at 2:09 pm
    @Egon:

    The third "shootout" failed with:
    egonelbre.Copy: operation failed: have n 32505856, want n 33554432, err
    <nil>.

    Inserting your version into the bufio tests I've provided fails with:
    --- FAIL: TestCopy (1.60s)
             bufio_test.go:33: data length mismatch: have 217862, want 134217728.

  • Egon at Jan 29, 2015 at 2:17 pm
    Yup, already noticed the bug; currently debugging it.

    Also, you may want to set the benchmark constant to something more random -
    a power of two is probably too good a case.
  • Péter Szilágyi at Jan 29, 2015 at 2:33 pm
    I'm trying to extend the shootout a bit to catch various bugs too, not
    just rely on the separate tests. Will push a new version soon :)
  • Péter Szilágyi at Jan 29, 2015 at 2:34 pm
    I've pushed a new version of the shootout:

        - Added the two contender solutions from yy and egon
        - Added a high throughput threading test to make sure index and
        threading issues are caught
        - In the same light, set GOMAXPROCS to 8 by default

    Current output is:

    go run shootout.go

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: data length mismatch: have 221132, want 33554432.

    Stable input, stable output:
             io.Copy: 3.398112152s 10.666667 mbps.
      [!] bufio.Copy: 3.383682682s 10.666667 mbps.
    rogerpeppe.Copy: 3.410368724s 10.666667 mbps.
    mattharden.Copy: 6.394555705s 5.333333 mbps.

    Stable input, bursty output:
             io.Copy: 6.268870046s 5.333333 mbps.
      [!] bufio.Copy: 3.393604252s 10.666667 mbps.
    rogerpeppe.Copy: 5.960128813s 6.400000 mbps.
    mattharden.Copy: 6.265215525s 5.333333 mbps.

    Bursty input, stable output:
             io.Copy: 6.254404989s 5.333333 mbps.
      [!] bufio.Copy: 3.343740455s 10.666667 mbps.
    rogerpeppe.Copy: 5.959544487s 6.400000 mbps.
    mattharden.Copy: 3.482824639s 10.666667 mbps.

    PS: I've noticed a rare failure in my code too; I'm investigating.
  • Egon at Jan 29, 2015 at 3:12 pm

    On Thursday, 29 January 2015 16:34:59 UTC+2, Péter Szilágyi wrote:
    I've pushed a new version of the shootout:

    - Added the two contender solutions from yy and egon
    - Added a high throughput threading tests to make sure index and
    threading issues are caught
    - In the same light, set gomaxprocs to 8 by default

    Current output is:

    go run shootout.go

    High throughput tests:
    io.Copy: test passed.
    [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
    yiyus.Copy: corrupt data on the output.
    egonelbre.Copy: data length mismatch: have 221132, want 33554432.
    Fixed & non-busy: https://gist.github.com/egonelbre/5600f3de19daec1ba055

    Also:
    Stable input, stable output:
             io.Copy: 3.2751874s 10.666667 mbps.
      [!] bufio.Copy: 3.270187s 10.666667 mbps.
    rogerpeppe.Copy: 3.2711871s 10.666667 mbps.
    mattharden.Copy: 6.2123553s 5.333333 mbps.
      egonelbre.Copy: 3.2741873s 10.666667 mbps.

    Stable input, bursty output:
             io.Copy: 6.154352s 5.333333 mbps.
      [!] bufio.Copy: 3.2941884s 10.666667 mbps.
    rogerpeppe.Copy: 6.1523519s 5.333333 mbps.
    mattharden.Copy: 6.1513519s 5.333333 mbps.
      egonelbre.Copy: 3.2921883s 10.666667 mbps.

    Bursty input, stable output:
             io.Copy: 6.1513518s 5.333333 mbps.
      [!] bufio.Copy: 3.2711871s 10.666667 mbps.
    rogerpeppe.Copy: 6.1523519s 5.333333 mbps.
    mattharden.Copy: 3.3761931s 10.666667 mbps.
      egonelbre.Copy: 3.2711871s 10.666667 mbps.

    With irregular read/write sizes e.g. 100*999:

    Stable input, stable output:
             io.Copy: 3.3511917s 10.666667 mbps.
      [!] bufio.Copy: 3.3521917s 10.666667 mbps.
    rogerpeppe.Copy: 3.3511917s 10.666667 mbps.
    mattharden.Copy: 6.3903655s 5.333333 mbps.
      egonelbre.Copy: 3.3511917s 10.666667 mbps.

    Stable input, bursty output:
             io.Copy: 6.3213615s 5.333333 mbps.
      [!] bufio.Copy: 3.390194s 10.666667 mbps.
    rogerpeppe.Copy: 6.3203615s 5.333333 mbps.
    mattharden.Copy: 6.3413627s 5.333333 mbps.
      egonelbre.Copy: 3.3901939s 10.666667 mbps.

    Bursty input, stable output:
             io.Copy: 6.3243617s 5.333333 mbps.
      [!] bufio.Copy: 3.3551919s 10.666667 mbps.
    rogerpeppe.Copy: 6.3213616s 5.333333 mbps.
    mattharden.Copy: 3.5512031s 10.666667 mbps.
      egonelbre.Copy: 3.3501916s 10.666667 mbps.

  • Péter Szilágyi at Jan 29, 2015 at 3:26 pm
    Ok, I also fixed my data race (pushed all updates, including your new
    solution) :D

    So, now the merged solutions are:

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.

    Stable input, stable output:
             io.Copy: 3.377904114s 10.666667 mbps.
      [!] bufio.Copy: 3.372692234s 10.666667 mbps.
    rogerpeppe.Copy: 3.38447648s 10.666667 mbps.
    mattharden.Copy: 6.380075266s 5.333333 mbps.
      egonelbre.Copy: 3.376019856s 10.666667 mbps.

    Stable input, bursty output:
             io.Copy: 6.26829765s 5.333333 mbps.
      [!] bufio.Copy: 3.397446603s 10.666667 mbps.
    rogerpeppe.Copy: 5.963495089s 6.400000 mbps.
    mattharden.Copy: 6.266678561s 5.333333 mbps.
      egonelbre.Copy: 3.395984883s 10.666667 mbps.

    Bursty input, stable output:
             io.Copy: 6.287110346s 5.333333 mbps.
      [!] bufio.Copy: 3.363437488s 10.666667 mbps.
    rogerpeppe.Copy: 5.979492432s 6.400000 mbps.
    mattharden.Copy: 3.484705393s 10.666667 mbps.
      egonelbre.Copy: 3.357674971s 10.666667 mbps.

    Let's see if I can add some benchmarks too, to see how they perform when
    not rate limited.

    Anybody up to challenge the complexity? As far as I can see, Egon's
    solution is in essence the same approach :)
  • Jan Mercl at Jan 29, 2015 at 3:38 pm

    On Thu Jan 29 2015 at 16:26:58 Péter Szilágyi wrote:

    Anybody up to challenge the complexity?
    See the PR.

    -j

  • Egon at Jan 29, 2015 at 3:42 pm

    On Thursday, 29 January 2015 17:26:55 UTC+2, Péter Szilágyi wrote:
    Ok, I also fixed my data race (pushed all updates, including your new
    solution) :D

    Let's see if I can add some benchmarks too to see how they perform if not
    ratelimited.

    Anybody up to challenge the complexity? As far as I see it Egon's solution
    is in essence the same approach :)
    I'm using a single temporary buffer for passing the data around, which is,
    I guess, the reason why sometimes one performs better than the other. Also
    it seems the locking is different.

    Also, are you sure you haven't got a deadlock?

    When reader happens to end up in select @ 45, and writer happens to end up
    in select @ 111.

    Seems like a sequence of:
    R line 42, ba == 0 // buffer is full
    W line 154, ba := bs // managed to read everything
    W line 156 // signaling does nothing
    R line 45 // blocks
    W line 107 // ba == bs
    W line 111 // blocks

    On Thu, Jan 29, 2015 at 5:12 PM, Egon <egon...@gmail.com <javascript:>>
    wrote:
    On Thursday, 29 January 2015 16:34:59 UTC+2, Péter Szilágyi wrote:

    I've pushed a new version of the shootout:

    - Added the two contender solutions from yy and egon
    - Added a high throughput threading tests to make sure index and
    threading issues are caught
    - In the same light, set gomaxprocs to 8 by default

    Current output is:

    go run shootout.go

    High throughput tests:
    io.Copy: test passed.
    [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
    yiyus.Copy: corrupt data on the output.
    egonelbre.Copy: data length mismatch: have 221132, want 33554432.
    Fixed & non-busy: https://gist.github.com/egonelbre/5600f3de19daec1ba055

    Also:
    Stable input, stable output:
    io.Copy: 3.2751874s 10.666667 mbps.
    [!] bufio.Copy: 3.270187s 10.666667 mbps.
    rogerpeppe.Copy: 3.2711871s 10.666667 mbps.
    mattharden.Copy: 6.2123553s 5.333333 mbps.
    egonelbre.Copy: 3.2741873s 10.666667 mbps.

    Stable input, bursty output:
    io.Copy: 6.154352s 5.333333 mbps.
    [!] bufio.Copy: 3.2941884s 10.666667 mbps.
    rogerpeppe.Copy: 6.1523519s 5.333333 mbps.
    mattharden.Copy: 6.1513519s 5.333333 mbps.
    egonelbre.Copy: 3.2921883s 10.666667 mbps.

    Bursty input, stable output:
    io.Copy: 6.1513518s 5.333333 mbps.
    [!] bufio.Copy: 3.2711871s 10.666667 mbps.
    rogerpeppe.Copy: 6.1523519s 5.333333 mbps.
    mattharden.Copy: 3.3761931s 10.666667 mbps.
    egonelbre.Copy: 3.2711871s 10.666667 mbps.

    With irregular read/write sizes e.g. 100*999:

    Stable input, stable output:
    io.Copy: 3.3511917s 10.666667 mbps.
    [!] bufio.Copy: 3.3521917s 10.666667 mbps.
    rogerpeppe.Copy: 3.3511917s 10.666667 mbps.
    mattharden.Copy: 6.3903655s 5.333333 mbps.
    egonelbre.Copy: 3.3511917s 10.666667 mbps.

    Stable input, bursty output:
    io.Copy: 6.3213615s 5.333333 mbps.
    [!] bufio.Copy: 3.390194s 10.666667 mbps.
    rogerpeppe.Copy: 6.3203615s 5.333333 mbps.
    mattharden.Copy: 6.3413627s 5.333333 mbps.
    egonelbre.Copy: 3.3901939s 10.666667 mbps.

    Bursty input, stable output:
    io.Copy: 6.3243617s 5.333333 mbps.
    [!] bufio.Copy: 3.3551919s 10.666667 mbps.
    rogerpeppe.Copy: 6.3213616s 5.333333 mbps.
    mattharden.Copy: 3.5512031s 10.666667 mbps.
    egonelbre.Copy: 3.3501916s 10.666667 mbps.

  • Péter Szilágyi at Jan 29, 2015 at 3:46 pm
    @Jan:

    fatal error: all goroutines are asleep - deadlock!

    goroutine 1 [chan receive]:
    github.com/karalabe/bufioprop/shootout/jnml.Copy(0x7f0be6164b00,
    0xc21e742070, 0x7f0be6164b28, 0xc21e742000, 0x2615, 0x0, 0x0, 0x0)
             /work/src/github.com/karalabe/bufioprop/shootout/jnml/copy.go:70
    +0x6cb
    main.test(0xc208084000, 0x8000000, 0x8000000, 0x502c30, 0xe, 0x52dd60,
    0x1e329600)
             /work/src/github.com/karalabe/bufioprop/shootout/validator.go:21
    +0x1de
    main.main()
             /work/src/github.com/karalabe/bufioprop/shootout/shootout.go:56
    +0x300

    goroutine 33 [chan receive]:
    github.com/karalabe/bufioprop/shootout/jnml.func·001()
             /work/src/github.com/karalabe/bufioprop/shootout/jnml/copy.go:16
    +0x8b
    created by github.com/karalabe/bufioprop/shootout/jnml.Copy
             /work/src/github.com/karalabe/bufioprop/shootout/jnml/copy.go:29
    +0x1d2

  • Péter Szilágyi at Jan 29, 2015 at 3:48 pm
    Btw, I've pushed your additions + some minor benchmarks. Have to run
    though, so cont tomorrow :)
  • Péter Szilágyi at Jan 29, 2015 at 3:49 pm
    And just to report the current state:

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
    ------------------------------------------------
    Stable input, stable output shootout:
             io.Copy: 3.382263678s 10.666667 mbps.
      [!] bufio.Copy: 3.382133279s 10.666667 mbps.
    rogerpeppe.Copy: 3.39823721s 10.666667 mbps.
    mattharden.Copy: 6.379035232s 5.333333 mbps.
      egonelbre.Copy: 3.372385831s 10.666667 mbps.

    Stable input, bursty output shootout:
             io.Copy: 6.261477315s 5.333333 mbps.
      [!] bufio.Copy: 3.382891506s 10.666667 mbps.
    rogerpeppe.Copy: 5.950842729s 6.400000 mbps.
      egonelbre.Copy: 3.400504271s 10.666667 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.356421192s 10.666667 mbps.
      egonelbre.Copy: 3.378505396s 10.666667 mbps.
    ------------------------------------------------
    High throughput benchmarks:
      [!] bufio.Copy: data 32MB, buffer 333B time 393.035954ms.
      [!] bufio.Copy: data 32MB, buffer 4155B time 143.638607ms.
      [!] bufio.Copy: data 32MB, buffer 65359B time 139.441233ms.
      [!] bufio.Copy: data 32MB, buffer 1048559B time 150.569137ms.
      [!] bufio.Copy: data 32MB, buffer 16777301B time 113.844092ms.
      egonelbre.Copy: data 32MB, buffer 333B time 432.112266ms.
      egonelbre.Copy: data 32MB, buffer 4155B time 114.398071ms.
      egonelbre.Copy: data 32MB, buffer 65359B time 91.900054ms.
      egonelbre.Copy: data 32MB, buffer 1048559B time 90.73783ms.
      egonelbre.Copy: data 32MB, buffer 16777301B time 122.790374ms.

  • Jan Mercl at Jan 29, 2015 at 3:57 pm

    On Thu Jan 29 2015 at 16:46:58 Péter Szilágyi wrote:

    @Jan:

    fatal error: all goroutines are asleep - deadlock!

    Sorry, last minute change slipped without testing. Please see [0], thanks!

    [0]: https://github.com/karalabe/bufioprop/pull/2

    -j

  • Péter Szilágyi at Jan 29, 2015 at 4:05 pm
    I'm out, cont tomorrow.

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
           jnml.Copy: test passed.
    ------------------------------------------------
    Stable input, stable output shootout:
             io.Copy: 3.414319512s 10.666667 mbps.
      [!] bufio.Copy: 3.472845836s 10.666667 mbps.
    rogerpeppe.Copy: 3.449201443s 10.666667 mbps.
    mattharden.Copy: 6.422282401s 5.333333 mbps.
      egonelbre.Copy: 3.386334062s 10.666667 mbps.
           jnml.Copy: 3.552567409s 10.666667 mbps.

    Stable input, bursty output shootout:
             io.Copy: 6.296563835s 5.333333 mbps.
      [!] bufio.Copy: 3.408388288s 10.666667 mbps.
    rogerpeppe.Copy: 5.982002029s 6.400000 mbps.
      egonelbre.Copy: 3.406534694s 10.666667 mbps.
           jnml.Copy: 3.444579099s 10.666667 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.357498698s 10.666667 mbps.
      egonelbre.Copy: 3.355458497s 10.666667 mbps.
           jnml.Copy: 3.418233526s 10.666667 mbps.
    ------------------------------------------------
    High throughput benchmarks:
    +----------------+--------------+--------------+--------------+--------------+--------------+
    |    SOLUTION    |   BUF-333    |   BUF-4155   |  BUF-65359   | BUF-1048559  | BUF-16777301 |
    +----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy | 351.547187ms | 95.685503ms  | 103.636411ms | 107.020277ms | 92.015499ms  |
    | egonelbre.Copy | 416.191585ms | 100.28949ms  | 85.448071ms  | 66.624809ms  | 95.744309ms  |
    |      jnml.Copy | 161.064838ms | 160.426351ms | 150.014876ms | 143.287792ms | 142.91072ms  |
    +----------------+--------------+--------------+--------------+--------------+--------------+
  • Piers at Jan 29, 2015 at 10:39 pm
    The shootout report is rounding the time to integer seconds in the bitrate
    calculation.

    Original:
          throughput := float64(size) / (1024 * 1024) / float64(elapsed/time.Second)

    Better:
          throughput := float64(size) / (1024 * 1024) / elapsed.Seconds()

    Duration.Seconds() returns a float64.

    Tested at: https://play.golang.org/p/Jj8R962bMB
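
    Just to make the rounding visible in numbers - a stand-alone snippet, not
    part of the shootout, using a 32 MB copy that took 3.388366514s:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        size := 32 << 20                        // 32 MB copied
        elapsed := 3388366514 * time.Nanosecond // measured copy time

        // Original formula: elapsed/time.Second is integer division, so 3.388s -> 3.
        rounded := float64(size) / (1024 * 1024) / float64(elapsed/time.Second)

        // Fixed formula: Duration.Seconds() keeps the fractional part.
        exact := float64(size) / (1024 * 1024) / elapsed.Seconds()

        fmt.Printf("rounded: %f mbps, exact: %f mbps\n", rounded, exact)
        // rounded: 10.666667 mbps, exact: 9.444079 mbps
    }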

  • Péter Szilágyi at Jan 30, 2015 at 8:15 am
    Hey Piers, thanks for the fix. The throughput there isn't really important
    as the source and sink endpoints are throttled (so the only thing I used it
    for is to see whether an implementation solves the problem or not).
    However, you are correct that the reporting is still wrong, so I've updated
    it.
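
    (For a feel of what the throttling means, something along these lines - a
    hypothetical mini version, not the shootout's actual endpoint wrappers: a
    source that only hands out a chunk of data per time slice, so any copy that
    blocks the producer pays for it directly in wall-clock time.)

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "io/ioutil"
        "time"
    )

    // throttledReader releases at most chunk bytes per interval.
    type throttledReader struct {
        r        io.Reader
        chunk    int
        interval time.Duration
    }

    func (t *throttledReader) Read(p []byte) (int, error) {
        time.Sleep(t.interval) // wait out one quota window per Read call
        if len(p) > t.chunk {
            p = p[:t.chunk]
        }
        return t.r.Read(p)
    }

    func main() {
        src := &throttledReader{
            r:        bytes.NewReader(make([]byte, 1<<20)), // 1 MB of zeroes
            chunk:    64 << 10,                             // at most 64 KB per window
            interval: 10 * time.Millisecond,
        }
        start := time.Now()
        n, _ := io.Copy(ioutil.Discard, src)
        fmt.Printf("copied %d bytes in %v\n", n, time.Since(start))
    }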

    In addition I've included a new implementation from Roger, one from Nick, a
    panicking one from Bakul, and disabled the latest version from Jan due to
    unrecoverable panics.

    Currently the stats are:

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
           jnml.Copy: unrecoverable panic, disabled.
            ncw.Copy: test passed.
      bakulshah.Copy: panic.
    ------------------------------------------------

    Stable input, stable output shootout:
             io.Copy: 3.388366514s 9.444079 mbps.
      [!] bufio.Copy: 3.406410224s 9.394054 mbps.
    rogerpeppe.Copy: 3.394213932s 9.427809 mbps.
    mattharden.Copy: 6.388273118s 5.009178 mbps.
      egonelbre.Copy: 3.393729865s 9.429154 mbps.
            ncw.Copy: 3.472179061s 9.216115 mbps.

    Stable input, bursty output shootout:
             io.Copy: 6.262102218s 5.110105 mbps.
      [!] bufio.Copy: 3.402419663s 9.405071 mbps.
    rogerpeppe.Copy: 3.396766961s 9.420723 mbps.
      egonelbre.Copy: 3.407603383s 9.390764 mbps.
            ncw.Copy: 3.478610627s 9.199075 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.355840623s 9.535614 mbps.
    rogerpeppe.Copy: 3.394942511s 9.425786 mbps.
      egonelbre.Copy: 3.358180428s 9.528970 mbps.
            ncw.Copy: 3.467073352s 9.229686 mbps.
    ------------------------------------------------

    High throughput benchmarks (256 MB):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    |    SOLUTION     |   BUF-333    |   BUF-4155   |  BUF-65359   | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    |  [!] bufio.Copy | 2.117422051s | 286.632357ms | 131.174571ms | 115.270713ms | 188.939727ms |
    | rogerpeppe.Copy | 1.400121588s | 190.094404ms | 140.979307ms | 127.962595ms | 226.539619ms |
    |  egonelbre.Copy | 2.532881422s | 274.633225ms | 77.610408ms  | 75.365796ms  | 222.150197ms |
    |        ncw.Copy | 1.567376946s | 180.483826ms | 85.202379ms  | 66.42527ms   | 187.146007ms |
    +-----------------+--------------+--------------+--------------+--------------+--------------+

    Looking at my very unscientific benchmark, the right solution is probably a
    combination of different ideas from the various individual implementations.
    And just as a personal note, all currently passing code bases are between
    150 and 200 LOC.

    So, aiming at the Go devs: would this be proof enough that a bufio.Copy is
    both warranted and non-trivial? :)

    Cheers,
       Peter


  • Egon at Jan 30, 2015 at 9:01 am

    On Friday, 30 January 2015 10:15:52 UTC+2, Péter Szilágyi wrote:
    Hey Piers, thanks for the fix. The throughput there isn't really important
    as the source and sink endpoints are throttled (so the only thing I used it
    for is to see whether an implementation solves the problem or not).
    However, you are correct that the reporting is still wrong, so I've updated
    it.

    I think the benchmark should also include memory usage and GC measurements...
    Other players are cheating with the buffer size :'( ... :D
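
    Just to sketch what I mean (made-up example, not the shootout's code): even
    a plain testing benchmark with ReportAllocs / -benchmem would already show
    hidden buffers, and runtime.ReadMemStats could be diffed around a copy for
    the GC side:

    // copybench_test.go - hypothetical sketch, not part of the shootout.
    package copybench

    import (
        "bytes"
        "io"
        "io/ioutil"
        "testing"
    )

    // BenchmarkCopyAllocs runs a contender's Copy over a fixed payload and lets
    // the testing package report B/op and allocs/op next to the timing.
    func BenchmarkCopyAllocs(b *testing.B) {
        payload := bytes.Repeat([]byte{0xAB}, 32<<20) // 32 MB of input data
        b.ReportAllocs()
        b.SetBytes(int64(len(payload)))
        for i := 0; i < b.N; i++ {
            // Swap io.Copy for any contender's Copy to compare footprints.
            if _, err := io.Copy(ioutil.Discard, bytes.NewReader(payload)); err != nil {
                b.Fatal(err)
            }
        }
    }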

    P.S. Péter, did you take a look at the potential deadlock problem I
    mentioned?

    + Egon


  • Péter Szilágyi at Jan 30, 2015 at 9:01 am
    To further push the performance and tests, I've also added a latency
    benchmark alongside the throughput benchmark. Nick's implementation got shot
    out with a deadlock. Latest code pushed to GitHub.
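
    (The latency probe is roughly in this spirit - a simplified sketch, not the
    exact code in the repo: push single bytes through a Copy strung between two
    pipes and time how long each one takes to pop out the far end.)

    package main

    import (
        "fmt"
        "io"
        "time"
    )

    func main() {
        srcR, srcW := io.Pipe() // stand-in for the throttled source
        dstR, dstW := io.Pipe() // stand-in for the throttled sink

        // Stand-in for whichever contender is being measured.
        go io.Copy(dstW, srcR)

        const rounds = 1000
        one := []byte{1}
        buf := make([]byte, 1)

        var total time.Duration
        for i := 0; i < rounds; i++ {
            start := time.Now()
            srcW.Write(one)        // inject one byte at the source...
            io.ReadFull(dstR, buf) // ...and wait until it surfaces at the sink
            total += time.Since(start)
        }
        fmt.Printf("mean propagation latency: %v\n", total/rounds)
    }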

    Manually disabled contenders:
           jnml.Copy: unrecoverable panic in tests.
            ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
      bakulshah.Copy: panic.
    ------------------------------------------------

    Stable input, stable output shootout:
             io.Copy: 3.402357575s 9.405243 mbps.
      [!] bufio.Copy: 3.392980474s 9.431236 mbps.
    rogerpeppe.Copy: 3.405857156s 9.395579 mbps.
    mattharden.Copy: 6.396839856s 5.002470 mbps.
      egonelbre.Copy: 3.374836333s 9.481941 mbps.

    Stable input, bursty output shootout:
             io.Copy: 6.28985121s 5.087561 mbps.
      [!] bufio.Copy: 3.410411611s 9.383032 mbps.
    rogerpeppe.Copy: 3.400147463s 9.411357 mbps.
      egonelbre.Copy: 3.402020034s 9.406176 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.354353574s 9.539841 mbps.
    rogerpeppe.Copy: 3.389586073s 9.440681 mbps.
      egonelbre.Copy: 3.351570699s 9.547762 mbps.
    ------------------------------------------------

    Latency benchmarks:
      [!] bufio.Copy: latency 5.522µs.
    rogerpeppe.Copy: latency 5.375µs.
      egonelbre.Copy: latency 5.866µs.

    Throughput benchmarks (32 MB):
    +-----------------+-------------+--------------+--------------+--------------+--------------+
    |    SOLUTION     |   BUF-333   |   BUF-4155   |  BUF-65359   | BUF-1048559  | BUF-16777301 |
    +-----------------+-------------+--------------+--------------+--------------+--------------+
    |  [!] bufio.Copy | 126.22 mbps | 931.71 mbps  | 1967.62 mbps | 1970.62 mbps | 1239.80 mbps |
    | rogerpeppe.Copy | 179.03 mbps | 1386.16 mbps | 1806.48 mbps | 1725.10 mbps | 1117.54 mbps |
    |  egonelbre.Copy | 102.78 mbps | 956.23 mbps  | 3302.07 mbps | 3131.47 mbps | 1413.67 mbps |
    +-----------------+-------------+--------------+--------------+--------------+--------------+
  • Péter Szilágyi at Jan 30, 2015 at 9:11 am
    Hey Egon, I am just about to :) Sorry, yesterday I had to run and didn't
    have time to dig into it.

    About the buffer "cheating", I've tried to skim through a few
    implementations and see if there are any hard-coded extras in the middle
    (like ignoring the buffer size and using 1MB instead). I'm guessing those
    were either just forgotten, or thought to be better with a preset value.
    This of course can be debated, but for the sake of comparison I'm trying to
    make the benchmarks fair, so I've modified those to use the param buffer
    size and not pre-coded ones (small interim buffers are ok imho, just don't
    override the big one :) ).

    About the memory and GC usage... well, this is my 3rd day being wasted on
    this proposal :D Although I really love doing it, it's getting a tad much,
    so if you'd be willing to add a few tests/benchmarks/whatever I'd happily
    merge it, but I don't think I'd like to go into GC measurements just now :P
    Hope you understand :)

    Cheers,
       Peter

    On Fri, Jan 30, 2015 at 11:01 AM, Péter Szilágyi wrote:

    To further push the performance and tests, I've also added a latency
    benchmark beside the throughput benchmark. Nick's implementation got shot
    out with a deadlock. Latest code pushed to github.

    Manually disabled contenders:
    jnml.Copy: unrecoverable panic in tests.
    ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
    io.Copy: test passed.
    [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
    yiyus.Copy: corrupt data on the output.
    egonelbre.Copy: test passed.
    bakulshah.Copy: panic.
    ------------------------------------------------

    Stable input, stable output shootout:
    io.Copy: 3.402357575s 9.405243 mbps.
    [!] bufio.Copy: 3.392980474s 9.431236 mbps.
    rogerpeppe.Copy: 3.405857156s 9.395579 mbps.
    mattharden.Copy: 6.396839856s 5.002470 mbps.
    egonelbre.Copy: 3.374836333s 9.481941 mbps.

    Stable input, bursty output shootout:
    io.Copy: 6.28985121s 5.087561 mbps.
    [!] bufio.Copy: 3.410411611s 9.383032 mbps.
    rogerpeppe.Copy: 3.400147463s 9.411357 mbps.
    egonelbre.Copy: 3.402020034s 9.406176 mbps.

    Bursty input, stable output shootout:
    [!] bufio.Copy: 3.354353574s 9.539841 mbps.
    rogerpeppe.Copy: 3.389586073s 9.440681 mbps.
    egonelbre.Copy: 3.351570699s 9.547762 mbps.
    ------------------------------------------------

    Latency benchmarks:
    [!] bufio.Copy: latency 5.522µs.
    rogerpeppe.Copy: latency 5.375µs.
    egonelbre.Copy: latency 5.866µs.

    Throughput benchmarks (32 MB):

    +-----------------+-------------+--------------+--------------+--------------+--------------+
    SOLUTION | BUF-333 | BUF-4155 | BUF-65359 |
    BUF-1048559 | BUF-16777301 |

    +-----------------+-------------+--------------+--------------+--------------+--------------+
    [!] bufio.Copy | 126.22 mbps | 931.71 mbps | 1967.62 mbps | 1970.62
    mbps | 1239.80 mbps |
    rogerpeppe.Copy | 179.03 mbps | 1386.16 mbps | 1806.48 mbps | 1725.10
    mbps | 1117.54 mbps |
    egonelbre.Copy | 102.78 mbps | 956.23 mbps | 3302.07 mbps | 3131.47
    mbps | 1413.67 mbps |

    +-----------------+-------------+--------------+--------------+--------------+--------------+
  • Péter Szilágyi at Jan 30, 2015 at 9:20 am
    Just checked the possible deadlock, and as far as I figure, it cannot
    happen. The thing you probably missed is that the signal channels are
    buffered channels with a capacity of 1. This ensures that after the writer
    removes something from the internal buffer, there will always be a signal
    ready for the reader, even if it is not currently waiting for one. There are
    of course a few cases where this leads to spurious wakeups (e.g. the reader
    already noticed the free buffer space and filled it before the writer
    signals), but then the reader will wake up, see that nothing changed, and go
    back to sleep.
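
    A minimal, self-contained sketch of the capacity-1 signal-channel pattern
    described above (illustrative names only, not the proposal's actual code):
    notify never blocks, at most one wakeup stays pending, and the waiter always
    re-checks the shared state, so a collapsed or spurious wakeup is harmless.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var (
            mu      sync.Mutex
            pending int // shared state guarded by mu
            done    bool
        )
        signal := make(chan struct{}, 1) // capacity 1: one wakeup can wait for a late receiver

        notify := func() {
            select {
            case signal <- struct{}{}: // stored even if nobody is listening yet
            default: // a wakeup is already pending; collapsing it is fine
            }
        }

        var wg sync.WaitGroup
        wg.Add(1)
        go func() { // consumer
            defer wg.Done()
            for {
                <-signal // wait until "something changed"
                mu.Lock()
                n, d := pending, done
                pending = 0
                mu.Unlock()
                if n > 0 {
                    fmt.Println("consumed", n, "item(s)")
                }
                // n == 0 means a spurious wakeup: nothing changed, just wait again
                if d {
                    return
                }
            }
        }()

        for i := 0; i < 5; i++ { // producer
            mu.Lock()
            pending++
            mu.Unlock()
            notify()
        }
        mu.Lock()
        done = true
        mu.Unlock()
        notify()
        wg.Wait()
    }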
  • Egon at Jan 30, 2015 at 10:30 am

    On Friday, 30 January 2015 11:20:40 UTC+2, Péter Szilágyi wrote:
    Just checked the possible deadlock, and as far as I figure, it cannot
    happen. The thing you probably missed is that the signal channels are
    buffered channels with a capacity of 1.
    Yup, missed that part.

  • Nick Craig-Wood at Jan 30, 2015 at 9:48 am

    On 30/01/15 09:01, Péter Szilágyi wrote:
    Nick's implementation got shot out with a deadlock.
    Just investigating that...

    If I look at the 3 goroutines in my Copy routine in the deadlock
    backtrace, I see two of them blocked in ranges over channels, but the
    third is blocked in io.ReadFull(src, buf).

    I don't see how my code could cause a deadlock in io.ReadFull so I think
    this must be an artifact of the test suite.

    Please correct me if I am wrong!

    Here is a marked up diff showing where the deadlocks are in my code

    diff --git a/shootout/ncw/bufio.go b/shootout/ncw/bufio.go
    index 35116af..9336e2e 100644
    --- a/shootout/ncw/bufio.go
    +++ b/shootout/ncw/bufio.go
    @@ -39,7 +39,7 @@ func Copy(dst io.Writer, src io.Reader, buffer int)
    (written int64, err error) {
       loop:
        for {
         buf := bufPool.Get().([]byte)
    - n, err := io.ReadFull(src, buf)
    + n, err := io.ReadFull(src, buf) // deadlock
         select {
         case chunks <- chunk{
          buf: buf,
    @@ -62,7 +62,7 @@ func Copy(dst io.Writer, src io.Reader, buffer int)
    (written int64, err error) {

       // Write chunks to the dst
       go func() {
    - for chunk := range chunks {
    + for chunk := range chunks { // deadlock
         n, err := dst.Write(chunk.buf[:chunk.n])
         written += int64(n)
         bufPool.Put(chunk.buf)
    @@ -75,7 +75,7 @@ func Copy(dst io.Writer, src io.Reader, buffer int)
    (written int64, err error) {
       }()

       // Collect errors and return them
    - for err = range errs {
    + for err = range errs { // deadlock
        close(finished)
       }
       return


    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

  • Péter Szilágyi at Jan 30, 2015 at 9:55 am
    You are both right and wrong. Your solution specifically got me to add the
    latency benchmark. The deadlock is caused by io.ReadFull in combination with
    the latency tester, which is not willing to send anything until the previous
    byte goes through.

    We could argue whether or not this is a desirable thing to have. Imho a
    copy should not block indefinitely waiting for new data while preventing
    buffered data from being handed over to the writer. As we cannot include a
    manual flush in the operation, the copy just has to figure it out. But this
    is my opinion, so I'm open for debate :)
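
    A small sketch of the behavioural difference at the heart of this
    (illustrative only, not the shootout code): io.ReadFull holds on to whatever
    has arrived until the whole buffer fills, while a plain Read returns
    whatever is currently available, so the copy can hand it on immediately.

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    func main() {
        buf := make([]byte, 8)

        // A single available byte via a plain Read: returned immediately,
        // so the copy can hand it straight to the writer.
        n, err := strings.NewReader("x").Read(buf)
        fmt.Println("Read:    ", n, err) // 1 <nil>

        // The same byte via io.ReadFull: it only returns here because the
        // stream ends (ErrUnexpectedEOF); on a live connection it would keep
        // blocking until 8 bytes arrived, which is the latency deadlock above.
        n, err = io.ReadFull(strings.NewReader("x"), buf)
        fmt.Println("ReadFull:", n, err) // 1 unexpected EOF
    }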
  • Péter Szilágyi at Jan 30, 2015 at 9:59 am
    Back in the game ;)

    I've finally had a little time beside the shootout code to rework my
    solution to read directly into the internal buffer and not go through a
    chunk buffer.
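
    A tiny sketch of the change being described (hypothetical helper names, not
    the actual code): the Read lands directly in the free region of the shared
    buffer instead of in a scratch chunk that is copied over afterwards, saving
    one memory pass per chunk.

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    // fill reads directly into the free tail of buf and returns the new fill level.
    func fill(src io.Reader, buf []byte, used int) (int, error) {
        n, err := src.Read(buf[used:]) // no intermediate chunk, no extra copy
        return used + n, err
    }

    func main() {
        src := strings.NewReader("read directly into the internal buffer")
        buf := make([]byte, 64)
        used := 0
        for {
            n, err := fill(src, buf, used)
            used = n
            if err == io.EOF {
                break
            }
            if err != nil {
                panic(err)
            }
        }
        fmt.Printf("%q\n", buf[:used])
    }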

    Manually disabled contenders:
           jnml.Copy: unrecoverable panic in tests.
            ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
      bakulshah.Copy: panic.
    ------------------------------------------------

    Stable input, stable output shootout:
             io.Copy: 3.400011414s 9.411733 mbps.
      [!] bufio.Copy: 3.383236222s 9.458400 mbps.
    rogerpeppe.Copy: 3.407702138s 9.390492 mbps.
    mattharden.Copy: 6.388047392s 5.009355 mbps.
      egonelbre.Copy: 3.389432005s 9.441110 mbps.

    Stable input, bursty output shootout:
             io.Copy: 6.297632169s 5.081275 mbps.
      [!] bufio.Copy: 3.392565759s 9.432389 mbps.
    rogerpeppe.Copy: 3.408777477s 9.387530 mbps.
      egonelbre.Copy: 3.411612384s 9.379729 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.355081664s 9.537771 mbps.
    rogerpeppe.Copy: 3.392363361s 9.432952 mbps.
      egonelbre.Copy: 3.361821227s 9.518650 mbps.
    ------------------------------------------------

    Latency benchmarks:
      [!] bufio.Copy: latency 5.483µs.
    rogerpeppe.Copy: latency 5.33µs.
      egonelbre.Copy: latency 5.982µs.

    Throughput benchmarks (32 MB):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | SOLUTION        | BUF-333      | BUF-4155     | BUF-65359    | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy  | 127.05 mbps  | 1121.30 mbps | 3241.47 mbps | 4171.52 mbps | 1645.64 mbps |
    | rogerpeppe.Copy | 184.22 mbps  | 1357.18 mbps | 1880.57 mbps | 1744.43 mbps | 1126.03 mbps |
    | egonelbre.Copy  | 100.67 mbps  | 947.68 mbps  | 3173.34 mbps | 3256.67 mbps | 1379.49 mbps |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
  • Jan Mercl at Jan 30, 2015 at 10:26 am

    On Fri Jan 30 2015 at 10:59:55 Péter Szilágyi wrote:
    Back in the game ;)
    Please pull[0] me in ;-)

    FYI: The latency benchmark numbers produced on two different runs of _the
    same_ binary:

    ------------------------------------------------

    Latency benchmarks:
    [!] bufio.Copy: latency 7.102µs.
    rogerpeppe.Copy: latency 4.825µs.
    egonelbre.Copy: latency 5.718µs.
    jnml.Copy: latency 15.289µs.
    ------------------------------------------------

    Latency benchmarks:
    [!] bufio.Copy: latency 13.081µs.
    rogerpeppe.Copy: latency 7.097µs.
    egonelbre.Copy: latency 9.344µs.
    jnml.Copy: latency 7.133µs.

       [0]: https://github.com/karalabe/bufioprop/pull/6

    -j

  • Péter Szilágyi at Jan 30, 2015 at 10:37 am
    Yup, the latency was benchmarked on a very small number of iterations.
    I've increased it by three orders of magnitude; now we're at:

    Latency benchmarks:
      [!] bufio.Copy: latency 5.555µs.
    rogerpeppe.Copy: latency 5.499µs.
      egonelbre.Copy: latency 5.769µs.
           jnml.Copy: latency 5.18µs.

    Throughput benchmarks (32 MB):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | SOLUTION        | BUF-333      | BUF-4155     | BUF-65359    | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy  | 125.85 mbps  | 1074.15 mbps | 3211.47 mbps | 4150.30 mbps | 1417.53 mbps |
    | rogerpeppe.Copy | 181.75 mbps  | 1402.07 mbps | 1788.21 mbps | 1919.40 mbps | 883.57 mbps  |
    | egonelbre.Copy  | 97.59 mbps   | 934.46 mbps  | 2768.36 mbps | 3493.42 mbps | 1421.25 mbps |
    | jnml.Copy       | 206.96 mbps  | 1642.54 mbps | 2934.53 mbps | 2763.54 mbps | 2915.44 mbps |
    +-----------------+--------------+--------------+--------------+--------------+--------------+

  • Péter Szilágyi at Jan 30, 2015 at 11:24 am
    Hey all,

       First up, Jan, your solution does not solve the problem. You've put a
    hard limit on the buffer size (if my parameter is > 2048, you simply use
    2048 * 1000 buffer chunks). Your solution passed the shootout since I've
    only tested using 1MB bursts (which fully fit into your buffer) so that we
    don't have to wait an eternity for the tests to run. However, in the real
    world usage scenario that prompted this proposal, the output burst is around
    100MB (gsutil chunked upload). To highlight that your solution indeed fails,
    I've modified the shootout a bit to use 1MB/10MB bursts instead of the
    100KB/1MB I've used until now. As the results show, your code indeed fails
    (simply because you run out of buffer capacity and stall the stream).

       On another note, I've added a small bit of magic to my solution to sort
    out fast reads/writes without resorting to costly syncs. The current
    leaderboards:

    Manually disabled contenders:
            ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
             io.Copy: test passed.
      [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    mattharden.Copy: test passed.
          yiyus.Copy: corrupt data on the output.
      egonelbre.Copy: test passed.
           jnml.Copy: test passed.
      bakulshah.Copy: panic.
    ------------------------------------------------

    Stable input, stable output shootout:
             io.Copy: 3.261154638s 9.812476 mbps.
      [!] bufio.Copy: 3.25456565s 9.832341 mbps.
    rogerpeppe.Copy: 3.259815301s 9.816507 mbps.
    mattharden.Copy: 5.789310285s 5.527429 mbps.
      egonelbre.Copy: 3.254562089s 9.832352 mbps.
           jnml.Copy: 3.255943818s 9.828179 mbps.

    Stable input, bursty output shootout:
             io.Copy: 5.949246121s 5.378833 mbps.
      [!] bufio.Copy: 3.962746376s 8.075208 mbps.
    rogerpeppe.Copy: 3.958733536s 8.083393 mbps.
      egonelbre.Copy: 3.94813974s 8.105083 mbps.
           jnml.Copy: 5.150907256s 6.212498 mbps.

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.149418427s 10.160606 mbps.
    rogerpeppe.Copy: 3.168480415s 10.099479 mbps.
      egonelbre.Copy: 3.154018838s 10.145786 mbps.
    ------------------------------------------------

    Latency benchmarks:
      [!] bufio.Copy: latency 4.595µs.
    rogerpeppe.Copy: latency 5.5µs.
      egonelbre.Copy: latency 5.737µs.

    Throughput benchmarks (32 MB):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | SOLUTION        | BUF-333      | BUF-4155     | BUF-65359    | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy  | 513.52 mbps  | 2724.75 mbps | 3859.12 mbps | 4179.00 mbps | 1448.95 mbps |
    | rogerpeppe.Copy | 184.00 mbps  | 1341.57 mbps | 1779.08 mbps | 2002.19 mbps | 1155.85 mbps |
    | egonelbre.Copy  | 101.88 mbps  | 953.43 mbps  | 3326.92 mbps | 3523.91 mbps | 1225.92 mbps |
    +-----------------+--------------+--------------+--------------+--------------+--------------+

    Cheers,
       Peter
  • Jan Mercl at Jan 30, 2015 at 11:54 am

    I interpreted the bufferSize parameter as an advisory maximum. The channel
    capacity limits the maximum overlap of the reader and the writer chunks,
    and yes, that indirectly limits the maximum (combined) buffer size.

    Your interpretation is IMO not a good design choice. If the reader and
    writer can roughly keep the same pace, one or two small buffers are all that
    is needed; the rest of a potentially huge buffer is just wasting memory
    resources. My solution adapts dynamically and also caps memory use at a
    sane maximum (4096*1000), which is what I'm used to doing ;-)
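
    A hedged sketch of the shape being described here (illustrative only, not
    the actual implementation): data flows through a channel of fixed-size
    chunks, so the memory in flight between reader and writer is bounded by
    chunkSize * cap(chunks), regardless of how large a buffer the caller asked
    for.

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "strings"
    )

    func chunkedCopy(dst io.Writer, src io.Reader, chunkSize, maxChunks int) (int64, error) {
        type chunk struct {
            buf []byte
            n   int
        }
        chunks := make(chan chunk, maxChunks) // in-flight data <= chunkSize * maxChunks
        readErr := make(chan error, 1)

        go func() { // reader: fill chunks as fast as the channel allows
            defer close(chunks)
            for {
                buf := make([]byte, chunkSize)
                n, err := src.Read(buf)
                if n > 0 {
                    chunks <- chunk{buf, n}
                }
                if err != nil {
                    if err != io.EOF {
                        readErr <- err
                    }
                    return
                }
            }
        }()

        var written int64
        var writeErr error
        for c := range chunks { // writer: drain chunks in order
            if writeErr != nil {
                continue // keep draining so the reader goroutine can finish
            }
            n, err := dst.Write(c.buf[:c.n])
            written += int64(n)
            writeErr = err
        }
        if writeErr != nil {
            return written, writeErr
        }
        select {
        case err := <-readErr:
            return written, err
        default:
            return written, nil
        }
    }

    func main() {
        var out bytes.Buffer
        n, err := chunkedCopy(&out, strings.NewReader("bounded, chunked copy"), 4, 8)
        fmt.Println(n, err, out.String())
    }

    The sketch deliberately ignores some edge cases, but it shows why the
    caller-supplied buffer size would only matter up to chunkSize * maxChunks
    in such a design.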

    Anyway, please pull the non-capping version:
    https://github.com/karalabe/bufioprop/pull/8

    -j

  • Péter Szilágyi at Jan 30, 2015 at 12:13 pm
    I disagree; the whole point behind the bufio.Copy proposal is to handle
    cases where gigantic bursts *will* occur on one end or the other. If you
    expect a constant throughput of smallish chunks on both ends, then a simple
    io.Copy will suffice, as there will probably be negligible stalls and
    occasional bursts/hiccups will just sort themselves out.

    On the other hand, in the concrete example cases I've brought up I know for
    sure that bursty operation is the norm, and I also know approximately the
    burst size. With these two pieces of information I can create a buffered
    copy that can more or less eliminate the stall. If however the copy does not
    use the information I've given it (the buffer size in particular), then
    obviously it will try to pick a sane value, which may nonetheless be very
    wrong. But if it picks the value itself, then I'm back to square one: my
    30GB transfer will complete in 6h as opposed to 3ish if it saturated both
    network links.

    The current results are:

    Latency benchmarks (GOMAXPROCS = 1):
      [!] bufio.Copy: latency 4.152µs.
    rogerpeppe.Copy: latency 5.206µs.
      egonelbre.Copy: latency 5.308µs.
           jnml.Copy: latency 4.772µs.

    Latency benchmarks (GOMAXPROCS = 8):
      [!] bufio.Copy: latency 4.521µs.
    rogerpeppe.Copy: latency 5.542µs.
      egonelbre.Copy: latency 5.719µs.
           jnml.Copy: latency 5.273µs.

    Throughput benchmarks (GOMAXPROCS = 1):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | SOLUTION        | BUF-333      | BUF-4155     | BUF-65359    | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy  | 519.48 mbps  | 2937.85 mbps | 4313.51 mbps | 4484.70 mbps | 1648.85 mbps |
    | rogerpeppe.Copy | 199.43 mbps  | 1526.54 mbps | 2641.20 mbps | 2766.34 mbps | 1202.03 mbps |
    | egonelbre.Copy  | 107.80 mbps  | 1019.21 mbps | 3553.09 mbps | 4210.60 mbps | 1612.40 mbps |
    | jnml.Copy       | 219.71 mbps  | 1824.06 mbps | 2846.28 mbps | 2966.26 mbps | 1373.20 mbps |
    +-----------------+--------------+--------------+--------------+--------------+--------------+

    Throughput benchmarks (GOMAXPROCS = 8):
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | SOLUTION        | BUF-333      | BUF-4155     | BUF-65359    | BUF-1048559  | BUF-16777301 |
    +-----------------+--------------+--------------+--------------+--------------+--------------+
    | [!] bufio.Copy  | 517.67 mbps  | 2724.47 mbps | 3884.24 mbps | 4239.76 mbps | 1636.77 mbps |
    | rogerpeppe.Copy | 173.89 mbps  | 1410.89 mbps | 1781.03 mbps | 1846.18 mbps | 1133.70 mbps |
    | egonelbre.Copy  | 99.44 mbps   | 969.11 mbps  | 3372.55 mbps | 3577.70 mbps | 1396.50 mbps |
    | jnml.Copy       | 193.76 mbps  | 1636.79 mbps | 2927.06 mbps | 3027.80 mbps | 2564.39 mbps |
    +-----------------+--------------+--------------+--------------+--------------+--------------+



  • Péter Szilágyi at Jan 30, 2015 at 1:13 pm
    Thanks to Egon, we now have proper memory and GC benchmarking too. There
    are some interesting insights as well :)

        - Jan's channel solution can sometimes outperform the buffer-based ones,
        particularly because it can stay in the processor cache, not requiring
        it to touch main memory (see the throughput benchmark, gomaxprocs 8, max
        size).
        - On the other hand, Jan's solution manages to allocate a massive 64MB
        of internal buffers during the latency benchmarks, which requested only
        1KB buffers.

    So I'm guessing that if we could somehow optimize the buffer-based solutions
    to stay in the CPU cache, then that would probably be the winning solution :)
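
    The numbers below include per-run allocation counts and bytes; a minimal
    sketch of one way such figures can be gathered, by diffing runtime.MemStats
    counters around a copy (hypothetical helper, not the actual shootout
    harness):

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "runtime"
        "strings"
    )

    // measure reports how many heap allocations and bytes a single copy run caused.
    func measure(copyFn func(io.Writer, io.Reader) (int64, error), dst io.Writer, src io.Reader) (allocs, bytesAlloced uint64) {
        var before, after runtime.MemStats
        runtime.GC() // settle the heap so the diff mostly reflects the copy itself
        runtime.ReadMemStats(&before)

        if _, err := copyFn(dst, src); err != nil {
            panic(err)
        }

        runtime.ReadMemStats(&after)
        return after.Mallocs - before.Mallocs, after.TotalAlloc - before.TotalAlloc
    }

    func main() {
        var dst bytes.Buffer
        src := strings.NewReader(strings.Repeat("x", 1<<20))
        allocs, b := measure(io.Copy, &dst, src)
        fmt.Printf("io.Copy: %d allocs, %d bytes\n", allocs, b)
    }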

    Stable input, stable output shootout:
             io.Copy: 3.262087641s 9.809669 mbps 116 allocs 40248 B
      [!] bufio.Copy: 3.249989963s 9.846184 mbps 151 allocs 12592624 B
    rogerpeppe.Copy: 3.262495092s 9.808444 mbps 129 allocs 12656592 B
    mattharden.Copy: 5.785475748s 5.531092 mbps 101 allocs 25172264 B
      egonelbre.Copy: 3.251709582s 9.840977 mbps 127 allocs 12591008 B
           jnml.Copy: 3.265006775s 9.800898 mbps 8563 allocs 1461296 B

    Stable input, bursty output shootout:
             io.Copy: 5.964191695s 5.365354 mbps 76 allocs 37584 B
      [!] bufio.Copy: 3.954520741s 8.092005 mbps 91 allocs 12589024 B
    rogerpeppe.Copy: 3.963458194s 8.073757 mbps 83 allocs 12653648 B
      egonelbre.Copy: 3.952431105s 8.096283 mbps 83 allocs 12588224 B
           jnml.Copy: 3.969015283s 8.062453 mbps 10567 allocs 9847088 B

    Bursty input, stable output shootout:
      [!] bufio.Copy: 3.147990951s 10.165213 mbps 83 allocs 12588064 B
    rogerpeppe.Copy: 3.16644288s 10.105977 mbps 84 allocs 12653696 B
      egonelbre.Copy: 3.154456319s 10.144379 mbps 87 allocs 12588480 B
           jnml.Copy: 3.178926218s 10.066292 mbps 10834 allocs 10896352 B
    ------------------------------------------------

    Latency benchmarks (GOMAXPROCS = 1):
      [!] bufio.Copy: 4.539µs 34 allocs 2544 B.
    rogerpeppe.Copy: 5.499µs 20 allocs 67440 B.
      egonelbre.Copy: 4.741µs 30 allocs 2432 B.
           jnml.Copy: 5.166µs 2000017 allocs 64001824 B.

    Latency benchmarks (GOMAXPROCS = 8):
      [!] bufio.Copy: 4.505µs 483 allocs 32528 B.
    rogerpeppe.Copy: 5.483µs 226 allocs 80624 B.
      egonelbre.Copy: 4.677µs 510 allocs 33152 B.
           jnml.Copy: 5.326µs 2000252 allocs 64016864 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-----------------+--------+---------+---------+---------+----------+
    | THROUGHPUT      | 333    | 4155    | 65359   | 1048559 | 16777301 |
    +-----------------+--------+---------+---------+---------+----------+
    | [!] bufio.Copy  | 528.90 | 2855.72 | 4218.94 | 4365.86 | 1572.65  |
    | rogerpeppe.Copy | 195.11 | 1504.33 | 2589.08 | 2772.46 | 1174.39  |
    | egonelbre.Copy  | 249.37 | 1729.38 | 3986.44 | 4226.69 | 1561.65  |
    | jnml.Copy       | 215.65 | 1729.10 | 2712.09 | 2823.22 | 1312.72  |
    +-----------------+--------+---------+---------+---------+----------+

    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+
    | ALLOCS / BYTES  | 333                | 4155               | 65359              | 1048559            | 16777301           |
    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+
    | [!] bufio.Copy  | 24 / 1280          | 24 / 5536          | 24 / 66464         | 24 / 1049504       | 24 / 16786336      |
    | rogerpeppe.Copy | 17 / 9368          | 16 / 13560         | 16 / 74488         | 16 / 1057528       | 16 / 16794360      |
    | egonelbre.Copy  | 21 / 1248          | 21 / 5504          | 21 / 66432         | 21 / 1049472       | 21 / 16786304      |
    | jnml.Copy       | 806126 / 25796544  | 65549 / 2101824    | 65564 / 2159792    | 65804 / 3153136    | 69645 / 19055088   |
    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-----------------+--------+---------+---------+---------+----------+
    | THROUGHPUT      | 333    | 4155    | 65359   | 1048559 | 16777301 |
    +-----------------+--------+---------+---------+---------+----------+
    | [!] bufio.Copy  | 502.58 | 2674.00 | 4035.22 | 4086.62 | 1564.71  |
    | rogerpeppe.Copy | 186.50 | 1342.21 | 1969.63 | 1697.48 | 1107.61  |
    | egonelbre.Copy  | 342.35 | 2108.85 | 3574.06 | 3865.20 | 1441.81  |
    | jnml.Copy       | 200.20 | 1605.59 | 2831.43 | 2909.83 | 2301.07  |
    +-----------------+--------+---------+---------+---------+----------+

    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+
    | ALLOCS / BYTES  | 333                | 4155               | 65359              | 1048559            | 16777301           |
    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+
    | [!] bufio.Copy  | 91 / 6848          | 40 / 6560          | 68 / 69280         | 480 / 1078688      | 56 / 16788384      |
    | rogerpeppe.Copy | 17 / 10600         | 17 / 14856         | 21 / 76040         | 23 / 1059208       | 19 / 16795784      |
    | egonelbre.Copy  | 136 / 8608         | 47 / 7168          | 42 / 68000         | 30 / 1050272       | 27 / 16787136      |
    | jnml.Copy       | 806126 / 25796544  | 65549 / 2101824    | 65565 / 2160080    | 65804 / 3153136    | 67105 / 8655520    |
    +-----------------+--------------------+--------------------+--------------------+--------------------+--------------------+

    On Fri, Jan 30, 2015 at 2:13 PM, Péter Szilágyi wrote:

    I digress, the whole point behind the bufio.Copy proposal is to handle
    cases where gigantic bursts *will* occur on one end or another. If you
    expect constant throughput of smallish chunks on both ends, then a simple
    io.Copy will suffice, as there will probably be negligible stalls and
    occasional bursts/hiccups will just sort themselves out.

    On the other hand, in the concrete example cases I've brought up I know for
    sure that bursty operation is the norm, and I also know approximately the
    burst size. With these two pieces of information I can create a buffered
    copy that can more or less eliminate the stall. If, however, the copy does
    not use the information I've given it (the buffer size in particular), then
    obviously it will try to pick a sane value, which may nonetheless be very
    wrong. But if it picks the value itself, then I'm back to square one: my
    30GB transfer will complete in 6h as opposed to ~3h if it saturated both
    network links.

    The current results are:

    Latency benchmarks (GOMAXPROCS = 1):
    [!] bufio.Copy: latency 4.152µs.
    rogerpeppe.Copy: latency 5.206µs.
    egonelbre.Copy: latency 5.308µs.
    jnml.Copy: latency 4.772µs.

    Latency benchmarks (GOMAXPROCS = 8):
    [!] bufio.Copy: latency 4.521µs.
    rogerpeppe.Copy: latency 5.542µs.
    egonelbre.Copy: latency 5.719µs.
    jnml.Copy: latency 5.273µs.

    Throughput benchmarks (GOMAXPROCS = 1):

    +-----------------+-------------+--------------+--------------+--------------+--------------+
             SOLUTION |   BUF-333   |   BUF-4155   |  BUF-65359   | BUF-1048559  | BUF-16777301 |
    +-----------------+-------------+--------------+--------------+--------------+--------------+
       [!] bufio.Copy | 519.48 mbps | 2937.85 mbps | 4313.51 mbps | 4484.70 mbps | 1648.85 mbps |
      rogerpeppe.Copy | 199.43 mbps | 1526.54 mbps | 2641.20 mbps | 2766.34 mbps | 1202.03 mbps |
       egonelbre.Copy | 107.80 mbps | 1019.21 mbps | 3553.09 mbps | 4210.60 mbps | 1612.40 mbps |
            jnml.Copy | 219.71 mbps | 1824.06 mbps | 2846.28 mbps | 2966.26 mbps | 1373.20 mbps |
    +-----------------+-------------+--------------+--------------+--------------+--------------+

    Throughput benchmarks (GOMAXPROCS = 8):

    +-----------------+-------------+--------------+--------------+--------------+--------------+
             SOLUTION |   BUF-333   |   BUF-4155   |  BUF-65359   | BUF-1048559  | BUF-16777301 |
    +-----------------+-------------+--------------+--------------+--------------+--------------+
       [!] bufio.Copy | 517.67 mbps | 2724.47 mbps | 3884.24 mbps | 4239.76 mbps | 1636.77 mbps |
      rogerpeppe.Copy | 173.89 mbps | 1410.89 mbps | 1781.03 mbps | 1846.18 mbps | 1133.70 mbps |
       egonelbre.Copy |  99.44 mbps |  969.11 mbps | 3372.55 mbps | 3577.70 mbps | 1396.50 mbps |
            jnml.Copy | 193.76 mbps | 1636.79 mbps | 2927.06 mbps | 3027.80 mbps | 2564.39 mbps |
    +-----------------+-------------+--------------+--------------+--------------+--------------+



    On Fri, Jan 30, 2015 at 1:54 PM, Jan Mercl wrote:
    On Fri Jan 30 2015 at 12:24:33 Péter Szilágyi wrote:

    First up, Jan, your solution does not solve the problem. You've put a
    hard limit on the buffer size (if my parameter is > 2048, you simply use
    2048 * 1000 buffer chunks). Your solution passed the shootout since I've
    only tested using 1MB bursts (which fully fit into your buffer) so that we
    don't have to wait an eternity for the tests to run. However, in a real
    world usage scenario (the one that prompted this proposal), the output
    burst is around 100MB +- (gsutil chunked upload). To highlight that your
    solution indeed fails, I've modified the shootout a bit to use 1/10MB
    bursts instead of the 100KB/1MB I've used until now. As the results show,
    your code indeed fails (simply because you run out of buffer capacity, and
    stall the stream).

    I interpreted the bufferSize parameter as an advisory maximum. The
    channel capacity limits the maximum overlap of the reader and the writer
    chunks and yes, that indirectly limits the maximum (combined) buffer size.

    Your interpretation is IMO not a good design choice. If the reader and
    writer can roughly keep the same pace, one or two small buffers is all that
    is needed; the rest of a potentially huge buffer is just wasting memory
    resources. My solution adapts dynamically and also limits the memory used
    above a sane maximum (4096*1000), which is what I'm used to doing ;-)
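
    For readers following along, here is a minimal sketch (not any contender's
    actual code) of the chunk-channel approach being debated; the channel
    capacity times the chunk size is what indirectly bounds the memory in
    flight, no matter how large a buffer the caller asked for:

        package chancopy

        import "io"

        // chanCopy pipes src into dst through a channel of chunks. At most
        // capacity+1 chunks are alive at once, so memory in flight is bounded
        // by roughly (capacity+1)*chunkSize. Write-side error handling is
        // simplified for brevity.
        func chanCopy(dst io.Writer, src io.Reader, chunkSize, capacity int) (int64, error) {
            ch := make(chan []byte, capacity)
            errc := make(chan error, 1)

            go func() {
                defer close(ch)
                for {
                    buf := make([]byte, chunkSize)
                    n, err := src.Read(buf)
                    if n > 0 {
                        ch <- buf[:n]
                    }
                    if err != nil {
                        if err == io.EOF {
                            err = nil
                        }
                        errc <- err
                        return
                    }
                }
            }()

            var written int64
            for chunk := range ch {
                n, err := dst.Write(chunk)
                written += int64(n)
                if err != nil {
                    return written, err // reader goroutine is abandoned in this sketch
                }
            }
            return written, <-errc
        }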

    Anyway, please pull the non capping version:
    https://github.com/karalabe/bufioprop/pull/8

    -j

  • Jan Mercl at Jan 30, 2015 at 3:19 pm

    On Fri Jan 30 2015 at 14:13:07 Péter Szilágyi wrote:
    - On the other hand, Jan's solution manages to allocate massive 64MB
    internal buffers during the latency benchmarks, which requested 1KB buffers
    only.

    Memory overallocation fixed, please pull[0].
       [0]: https://github.com/karalabe/bufioprop/pull/10

    -j

  • Roger peppe at Jan 30, 2015 at 3:44 pm
    My take is that all these proposals are essentially creating a data
    pipe and using it for the copy. If we're going to do that, we may
    as well implement an actual buffered pipe, because that's useful
    for more than just Copy (you can pass the write end directly
    to a Go function that expects an io.Writer for example).

    I can't see that approach necessarily entails any significant
    performance sacrifice.
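
    As a usage sketch of what that buys you (bufio.Pipe here is the proposed,
    not yet existing API, assumed to mirror io.Pipe plus a buffer size; the
    gzip use is just an example of "any function that wants an io.Writer"):

        package bufpipe

        import (
            "compress/gzip"
            "io"

            "bufio" // stands in for wherever the proposed buffered Pipe ends up
        )

        // compressAndUpload hands the buffered write end to gzip while the
        // read end feeds the uploader, with up to bufSize bytes decoupling
        // the two sides.
        func compressAndUpload(dst io.Writer, src io.Reader, bufSize int) error {
            pr, pw := bufio.Pipe(bufSize) // proposed API, not in the stdlib today

            go func() {
                gz := gzip.NewWriter(pw)
                _, err := io.Copy(gz, src)
                if cerr := gz.Close(); err == nil {
                    err = cerr
                }
                pw.CloseWithError(err) // assumed, analogous to io.PipeWriter
            }()

            _, err := io.Copy(dst, pr)
            return err
        }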

    On 30 January 2015 at 15:19, Jan Mercl wrote:
    On Fri Jan 30 2015 at 14:13:07 Péter Szilágyi wrote:

    On the other hand, Jan's solution manages to allocate a massive 64MB
    internal buffers during the latency benchmarks, which requested 1KB buffers
    only.
    Memory overallocation fixed, please pull[0].

    [0]: https://github.com/karalabe/bufioprop/pull/10

    -j

  • Péter Szilágyi at Jan 30, 2015 at 4:08 pm
    I've also been playing with this idea. One concrete instance where a
    buffered pipe would be more useful/flexible than a buffered copy is if you
    want to keep the buffer state, but maybe intermittently do something else
    too (e.g. a progress report after every N bytes copied). It would play
    nicely with constructs such as (pseudo go):

    for !done {
         io.CopyN(bufDst, src, chunk)
         fmt.Println("Chunk done")
    }

    If we put this functionality directly into Copy, then I guess it's much
    more painful to do such a composable thing. I'm open to this idea of
    converting the proposal into a bufio.Pipe(). Though I'll check how it would
    fit into my design (to see if something's non-obviously wrong with the
    idea).
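
    Fleshing the pseudocode above out into something closer to runnable Go
    (still assuming a hypothetical bufio.Pipe that mirrors io.Pipe plus a
    buffer size, and a CloseWithError on the write end):

        package progresscopy

        import (
            "fmt"
            "io"

            "bufio" // stands in for wherever the proposed buffered Pipe ends up
        )

        // reportingCopy drains the buffered read end in fixed-size chunks so a
        // progress line can be printed after every chunk, while the producer
        // keeps filling the buffer in the background.
        func reportingCopy(dst io.Writer, src io.Reader, bufSize int, chunk int64) error {
            pr, pw := bufio.Pipe(bufSize) // proposed API, not in the stdlib today

            go func() {
                _, err := io.Copy(pw, src)
                pw.CloseWithError(err) // assumed, analogous to io.PipeWriter
            }()

            var total int64
            for {
                n, err := io.CopyN(dst, pr, chunk)
                total += n
                fmt.Printf("copied %d bytes so far\n", total)
                if err == io.EOF {
                    return nil // source drained completely
                }
                if err != nil {
                    return err
                }
            }
        }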

    In the meanwhile, running Jan's latest fix: all recently updated contenders
    are converging to the same speed, though the internally buffering ones
    outperform the channel based ones on small ops, where the threading
    overhead hits the channel.

    Latency benchmarks (GOMAXPROCS = 1):
      [!] bufio.Copy: 4.282µs 34 allocs 2544 B.
    rogerpeppe.Copy: 5.304µs 20 allocs 67440 B.
      egonelbre.Copy: 4.438µs 30 allocs 2432 B.
           jnml.Copy: 4.812µs 17 allocs 1904 B.

    Latency benchmarks (GOMAXPROCS = 8):
      [!] bufio.Copy: 4.54µs 557 allocs 36016 B.
    rogerpeppe.Copy: 5.547µs 150 allocs 75760 B.
      egonelbre.Copy: 4.738µs 632 allocs 42240 B.
           jnml.Copy: 5.05µs 249 allocs 16752 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-----------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-----------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 522.82 | 2880.92 | 4230.01 | 4360.93 | 1639.96 |
    rogerpeppe.Copy | 203.08 | 1526.76 | 2609.97 | 2783.97 | 1198.19 |
    egonelbre.Copy | 255.64 | 1791.27 | 4012.22 | 4278.48 | 1624.37 |
    jnml.Copy | 229.16 | 1784.35 | 4024.98 | 4374.86 | 1716.81 |
    +-----------------+--------+---------+---------+---------+----------+

    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+
         ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+
       [!] bufio.Copy | (     24 /     1280) | (     24 /     5536) | (     24 /    66464) | (     24 /  1049504) | (     24 / 16786336) |
      rogerpeppe.Copy | (     17 /     9368) | (     16 /    13560) | (     16 /    74488) | (     16 /  1057528) | (     16 / 16794360) |
       egonelbre.Copy | (     21 /     1248) | (     21 /     5504) | (     21 /    66432) | (     21 /  1049472) | (     21 / 16786304) |
            jnml.Copy | (     12 /      976) | (     12 /     5232) | (     12 /    66160) | (     12 /  1049200) | (     27 / 16778848) |
    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-----------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-----------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 507.82 | 2689.22 | 3768.93 | 4071.21 | 1637.15 |
    rogerpeppe.Copy | 186.95 | 1403.50 | 1752.47 | 1816.09 | 1147.60 |
    egonelbre.Copy | 345.16 | 2196.97 | 3671.74 | 3919.41 | 1533.35 |
    jnml.Copy | 204.41 | 1712.08 | 3176.28 | 4036.61 | 1664.63 |
    +-----------------+--------+---------+---------+---------+----------+

    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+
         ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+
       [!] bufio.Copy | (     88 /     5376) | (     40 /     6560) | (     88 /    70560) | (    536 /  1082272) | (     56 / 16788384) |
      rogerpeppe.Copy | (     18 /    10664) | (     18 /    14920) | (     23 /    76168) | (     21 /  1059080) | (     20 / 16796072) |
       egonelbre.Copy | (     96 /     6496) | (     47 /     7616) | (     46 /    68256) | (     31 /  1050560) | (     27 / 16786688) |
            jnml.Copy | (     12 /      976) | (     13 /     5520) | (     13 /    66448) | (     13 /  1049488) | (     28 / 16779136) |
    +-----------------+----------------------+----------------------+----------------------+----------------------+----------------------+

  • Péter Szilágyi at Jan 30, 2015 at 7:02 pm
    Hi all,

    A few modifications went into the benchmarks:

        - Roger added a repeating reader so we don't need a giant
        pre-allocated test data blob; it makes startup faster and should also
        better reflect algorithm performance (a sketch of the idea follows
        right after this list).
        - To prevent measuring occasional hiccups, the throughput benchmarks
        now use best-out-of-three. Scores are much stabler.

    Implementation-wise the updates are:

        - Roger sent in a full Pipe based solution, arguing that it's way more
        flexible than a simple copy (I agree).
        - Jan sent in a nice optimization that seems to beat the other algos in
        the case of very large buffers.
        - I've also ported my solution to a pipe version (both coexisting
        currently). That codebase still needs to be cleaned up; it was mainly
        to see if the idea works ok.

    With these, the current standing is:

    Latency benchmarks (GOMAXPROCS = 1):
           [!] bufio.Copy: 4.322µs 37 allocs 2736 B.
       [!] bufio.PipeCopy: 4.396µs 22 allocs 2224 B.
          rogerpeppe.Copy: 4.666µs 20 allocs 2032 B.
           egonelbre.Copy: 4.681µs 29 allocs 2368 B.
                jnml.Copy: 4.964µs 18 allocs 1936 B.

    Latency benchmarks (GOMAXPROCS = 8):
           [!] bufio.Copy: 4.525µs 398 allocs 25840 B.
       [!] bufio.PipeCopy: 4.58µs 632 allocs 41264 B.
          rogerpeppe.Copy: 4.918µs 325 allocs 21552 B.
           egonelbre.Copy: 4.779µs 518 allocs 33664 B.
                jnml.Copy: 5.187µs 321 allocs 21328 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +--------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 524.45 | 3137.31 | 4765.79 | 4922.01 | 2083.84 |
    [!] bufio.PipeCopy | 523.99 | 3115.08 | 4767.71 | 4924.59 | 2083.15 |
    rogerpeppe.Copy | 231.94 | 1942.33 | 4499.48 | 4906.88 | 2085.31 |
    egonelbre.Copy | 252.79 | 1865.77 | 4482.94 | 4832.15 | 2053.85 |
    jnml.Copy | 233.75 | 1947.74 | 4500.00 | 4914.93 | 6055.04 |
    +--------------------+--------+---------+---------+---------+----------+

    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
            ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
          [!] bufio.Copy | (     24 /     1280) | (     24 /     5536) | (     24 /    66464) | (     24 /  1049504) | (     24 / 16786336) |
      [!] bufio.PipeCopy | (     13 /     1024) | (     13 /     5280) | (     13 /    66208) | (     13 /  1049248) | (     13 / 16786080) |
         rogerpeppe.Copy | (     12 /      896) | (     12 /     5152) | (     12 /    66080) | (     12 /  1049120) | (     12 / 16785952) |
          egonelbre.Copy | (     21 /     1248) | (     21 /     5504) | (     21 /    66432) | (     21 /  1049472) | (     21 / 16786304) |
               jnml.Copy | (     13 /     1008) | (     13 /     5264) | (     13 /    66192) | (     13 /  1049232) | (     13 / 16787424) |
    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +--------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 510.38 | 2935.96 | 3899.06 | 4593.50 | 2067.04 |
    [!] bufio.PipeCopy | 503.73 | 2928.34 | 4204.07 | 4602.74 | 2073.62 |
    rogerpeppe.Copy | 206.61 | 1809.26 | 3770.79 | 4608.11 | 2069.03 |
    egonelbre.Copy | 350.66 | 2439.46 | 3946.51 | 4377.70 | 1917.33 |
    jnml.Copy | 214.09 | 1683.43 | 3773.63 | 4621.34 | 5749.54 |
    +--------------------+--------+---------+---------+---------+----------+

    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
            ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
          [!] bufio.Copy | (     82 /     4992) | (     46 /     6944) | (    108 /    71840) | (    536 /  1082272) | (     56 / 16788384) |
      [!] bufio.PipeCopy | (     65 /     4352) | (     26 /     6336) | (     66 /    69824) | (    526 /  1082304) | (     46 / 16788416) |
         rogerpeppe.Copy | (     14 /     1248) | (     14 /     5504) | (     14 /    66432) | (     14 /  1049472) | (     14 / 16786304) |
          egonelbre.Copy | (     89 /     5600) | (     49 /     7744) | (     51 /    68800) | (     39 /  1051072) | (     29 / 16787264) |
               jnml.Copy | (     13 /     1008) | (     14 /     5552) | (     13 /    66192) | (     13 /  1049232) | (     13 / 16787424) |
    +--------------------+----------------------+----------------------+----------------------+----------------------+----------------------+

    @Jan, Egon: would you guys agree that a pipe based solution would be
    better/more flexible? If so, we could rework the shootout to use pipes as
    the underlying implementation and a pre-baked copy function (i.e. I
    blatantly copied mine from Roger).
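
    Concretely, the pre-baked copy can stay a thin wrapper over the pipe,
    roughly along these lines (a sketch only, written against the proposed
    package; Pipe and CloseWithError are assumed to mirror io.Pipe's shape,
    with an extra buffer size):

        // Copy shovels src into dst through a buffered pipe: one goroutine
        // feeds the write end while the caller's goroutine drains the read
        // end, so neither side has to wait for the other within the buffer's
        // allowance. Pipe and CloseWithError are the proposed, assumed API.
        func Copy(dst io.Writer, src io.Reader, buffer int) (int64, error) {
            pr, pw := Pipe(buffer)
            go func() {
                _, err := io.Copy(pw, src)
                pw.CloseWithError(err) // a nil error closes cleanly
            }()
            return io.Copy(dst, pr)
        }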

    Cheers,
       Peter
  • Egon at Jan 30, 2015 at 7:45 pm

    On Friday, 30 January 2015 21:03:02 UTC+2, Péter Szilágyi wrote:
    @Jan, Egon: would you guys agree that a pipe based solution would be
    better/more flexible? If so, we could rework the shootout to use pipes as
    the underlying implementation and a pre-baked copy function (i.e. I
    blatantly copied mine from Roger).
    SGTM

  • Péter Szilágyi at Jan 30, 2015 at 8:14 pm
    So far, this is what I'm aiming to include in bufio. Feedback is welcome
    API-wise too (though it's just based on Roger's code and the io package's
    Copy/Pipe):

    http://godoc.org/github.com/karalabe/bufioprop
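
    For the impatient, the intended usage boils down to something like the
    following sketch (assuming the Copy(dst, src, buffer) signature from the
    godoc above; exact names may still change). The point of the explicit
    buffer is that the caller, who knows the burst size, gets to pick it:

        package main

        import (
            "io"
            "log"
            "os"

            "github.com/karalabe/bufioprop"
        )

        func main() {
            // Pretend stdin is a fast download and stdout a bursty uploader:
            // a buffer comfortably above the ~100MB burst keeps the reader
            // busy while the writer digests a burst.
            var (
                download io.Reader = os.Stdin
                upload   io.Writer = os.Stdout
            )
            if _, err := bufioprop.Copy(upload, download, 128<<20); err != nil {
                log.Fatalf("buffered copy failed: %v", err)
            }
        }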
  • Péter Szilágyi at Jan 31, 2015 at 9:05 am
    Jan, you're going to hate me (if you don't already) :P I've just shot out
    your code as *not solving* the problem again :P

    Currently the reason you can copy blazing fast is that you've optimized
    memory usage to not blindly use the entire buffer as the rest of us do, but
    rather work with chunks and reuse the hottest one (i.e. still in the
    cache). The flaw in your current implementation is that you don't fill one
    chunk fully, but rather stuff into it whatever's available and send it to
    the reader. However, if the output stream doesn't touch your chunk for a
    good while - even though it might be almost completely empty - the empty
    space is wasted, and you run out of buffer allowance to accept the input.

    By hitting your copy's input with many many tiny data chunks, it quickly
    uses up all "pages", and starts idling because it doesn't have any
    allowance left. I've managed to show this exact behavior by modifying the
    shootout's "stable" streams to use 10KB/ms throughput instead of 1MB/100ms.
    All other implementations work correctly, but yours stalls.
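
    To put rough numbers on the waste (illustrative only; the actual page size
    and page count in Jan's code may differ):

        package main

        import "fmt"

        func main() {
            // Assume 1000 pages of 4096 bytes: a nominal ~4MB allowance.
            // If every 1KB input write occupies its own page, the allowance
            // is exhausted after only ~1MB of buffered data, with ~3MB of
            // page space sitting empty - and the copy stalls.
            const (
                pages     = 1000
                pageSize  = 4096
                writeSize = 1024
            )
            nominal := pages * pageSize   // 4096000 bytes of allowance
            buffered := pages * writeSize // 1024000 bytes actually held
            fmt.Println(nominal, buffered, nominal-buffered)
        }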

    However, I didn't like the idea of trying to fail your code just to show
    that it performs badly in some fabricated scenario, so I've actually placed
    your implementation in my production code to see if it's just some
    theoretical issue, or if it indeed does fail (streaming download, chunked
    upload). The result was as I thought: since I have a fast download, it hits
    the copy with a lot of small ops, and your implementation behaves exactly
    like io.Copy; it instantly uses up all its pages and then stalls the
    download.

    To support my claim I've created a small repro (though it requires
    gsutil, a Google Cloud account etc. configured); it's my full code
    <https://gist.github.com/karalabe/024fe1f132c18471d411>, which I ran with
    io.Copy, jnml.Copy and bufioprop.Copy. The results can be seen on the
    attached chart (sorry for the bad/quick photoshop). The yellow line is the
    download, whereas the purple one is the upload. As you can see, whenever
    gsutil starts uploading, it blocks accepting new data. As a consequence
    io.Copy stalls more or less immediately, your solution makes a few more
    ticks until it wastes all its pages, whereas the proposed solution happily
    downloads while the upload is running (note, there is of course some
    interference, but that is understandable imho).

    Cheers,
       Peter

    PS: The current stats are:

    Manually disabled contenders:
                 ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
                  io.Copy: test passed.
           [!] bufio.Copy: test passed.
          rogerpeppe.Copy: test passed.
        rogerpeppe.IOCopy: test passed.
          mattharden.Copy: test passed.
               yiyus.Copy: corrupt data on the output.
           egonelbre.Copy: test passed.
                jnml.Copy: test passed.
           bakulshah.Copy: panic.
        augustoroman.Copy: test passed.
    ------------------------------------------------

    Stable input, stable output shootout:
                  io.Copy: 3.848840481s 8.314192 mbps 6954 allocs 477712 B
           [!] bufio.Copy: 3.93977615s 8.122289 mbps 7017 allocs 13033344 B
          rogerpeppe.Copy: 3.957739623s 8.085423 mbps 7075 allocs 13035712 B
        rogerpeppe.IOCopy: 4.006578497s 7.986865 mbps 7075 allocs 518288 B
          mattharden.Copy: 7.587068709s 4.217703 mbps 6686 allocs 25593688 B
           egonelbre.Copy: 3.916161761s 8.171266 mbps 7054 allocs 13034368 B
                jnml.Copy: 3.996595672s 8.006814 mbps 7061 allocs 13035840 B
        augustoroman.Copy: 3.871614938s 8.265285 mbps 6849 allocs 13021312 B

    Stable input, bursty output shootout:
                  io.Copy: 6.836185084s 4.680973 mbps 3444 allocs 254400 B
           [!] bufio.Copy: 4.60999474s 6.941440 mbps 3588 allocs 12812608 B
          rogerpeppe.Copy: 4.569635021s 7.002747 mbps 3576 allocs 12811776 B
        rogerpeppe.IOCopy: 6.923576695s 4.621889 mbps 3439 allocs 285584 B
           egonelbre.Copy: 4.612420089s 6.937790 mbps 3754 allocs 12823168 B
                jnml.Copy: 6.895554042s 4.640671 mbps 3601 allocs 12814400 B
        augustoroman.Copy: 4.611330731s 6.939429 mbps 3521 allocs 12808320 B

    Bursty input, stable output shootout:
           [!] bufio.Copy: 3.799650855s 8.421826 mbps 3311 allocs 12794880 B
          rogerpeppe.Copy: 3.792738473s 8.437175 mbps 3309 allocs 12794688 B
           egonelbre.Copy: 3.803217893s 8.413928 mbps 3393 allocs 12800064 B
        augustoroman.Copy: 3.797589395s 8.426398 mbps 3306 allocs 12794560 B
    ------------------------------------------------

    Latency benchmarks (GOMAXPROCS = 1):
           [!] bufio.Copy: 4.422µs 23 allocs 2288 B.
          rogerpeppe.Copy: 4.754µs 21 allocs 2096 B.
           egonelbre.Copy: 4.645µs 29 allocs 2368 B.
        augustoroman.Copy: 5.034µs 17 allocs 1904 B.

    Latency benchmarks (GOMAXPROCS = 8):
           [!] bufio.Copy: 4.596µs 673 allocs 43888 B.
          rogerpeppe.Copy: 4.934µs 354 allocs 23408 B.
           egonelbre.Copy: 4.856µs 370 allocs 24192 B.
        augustoroman.Copy: 5.296µs 114 allocs 8112 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 492.98 | 3011.88 | 4758.31 | 4923.77 | 2057.23 |
    rogerpeppe.Copy | 226.69 | 1918.60 | 4495.06 | 4903.63 | 2058.11 |
    egonelbre.Copy | 253.95 | 1867.21 | 4484.71 | 4801.99 | 2033.94 |
    augustoroman.Copy | 162.78 | 1513.39 | 4324.66 | 4888.18 | 2060.91 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
           ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
         [!] bufio.Copy | (     13 /     1024) | (     13 /     5280) | (     13 /    66208) | (     13 /  1049248) | (     13 / 16786080) |
        rogerpeppe.Copy | (     12 /      896) | (     12 /     5152) | (     12 /    66080) | (     12 /  1049120) | (     12 / 16785952) |
         egonelbre.Copy | (     21 /     1248) | (     21 /     5504) | (     21 /    66432) | (     21 /  1049472) | (     21 / 16786304) |
      augustoroman.Copy | (     12 /      960) | (     12 /     5216) | (     12 /    66144) | (     12 /  1049184) | (     12 / 16786016) |
    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 499.11 | 2922.10 | 4262.06 | 4589.26 | 2041.15 |
    rogerpeppe.Copy | 220.56 | 1698.86 | 3953.09 | 4611.53 | 2048.96 |
    egonelbre.Copy | 360.14 | 2439.69 | 3931.45 | 4339.05 | 1907.04 |
    augustoroman.Copy | 150.06 | 1392.57 | 3735.92 | 4572.96 | 2052.33 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
           ALLOCS/BYTES |                  333 |                 4155 |                65359 |              1048559 |             16777301 |
    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+
         [!] bufio.Copy | (    130 /     9792) | (     46 /     7392) | (     94 /    71392) | (    781 /  1098400) | (     61 / 16789152) |
        rogerpeppe.Copy | (     13 /      960) | (     14 /     5504) | (     14 /    66432) | (     14 /  1049472) | (     14 / 16786304) |
         egonelbre.Copy | (     93 /     6304) | (     50 /     7584) | (     53 /    68928) | (     34 /  1050528) | (     34 / 16787360) |
      augustoroman.Copy | (     22 /     1824) | (     13 /     5504) | (     14 /    66496) | (     12 /  1049184) | (     12 / 16786016) |
    +-------------------+----------------------+----------------------+----------------------+----------------------+----------------------+


  • Aroman at Jan 31, 2015 at 9:46 am
    Interesting -- I was thinking today it would be useful to have some tests
    using stuff from testing/iotest <http://godoc.org/testing/iotest>, for
    example the OneByteReader (would have caught Jan's issue, I think),
    DataErrReader (cropped up earlier), and HalfReader. Error handling should
    be tested as well for both the source and dest.

    For more general discussion, I structured my design around accepting an
    external buffer rather than accepting a buffer size. I think this is
    better since it allows the caller to reuse buffers (e.g. arena-allocated
    bufs), but it doesn't significantly inconvenience the casual user, who can
    easily provide "make([]byte, N)".

    - Augusto
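
    A sketch of the kind of test this suggests, wrapping the source in
    iotest.OneByteReader so the copy under test sees a worst case stream of
    tiny reads (bufioprop.Copy's signature is assumed here; the external
    buffer variant would presumably take a make([]byte, N) instead of a size):

        package bufioprop_test

        import (
            "bytes"
            "testing"
            "testing/iotest"

            "github.com/karalabe/bufioprop"
        )

        func TestCopyOneByteSource(t *testing.T) {
            src := bytes.Repeat([]byte("0123456789"), 1000)
            var dst bytes.Buffer

            // Feed the copy one byte per Read to exercise the small-op path.
            r := iotest.OneByteReader(bytes.NewReader(src))
            if _, err := bufioprop.Copy(&dst, r, 32); err != nil {
                t.Fatalf("copy failed: %v", err)
            }
            if !bytes.Equal(dst.Bytes(), src) {
                t.Fatalf("corrupt copy: got %d bytes, want %d", dst.Len(), len(src))
            }
        }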
    On Saturday, January 31, 2015 at 1:05:45 AM UTC-8, Péter Szilágyi wrote:

    Jan, you're going to hate me (if you don't already) :P I've just shot out
    your code as *not solving* the problem again :P

    Currently the reason you can copy blazing fast is because you've optimized
    memory usage to not blindly use the entire buffer as the rest of us, bur
    rather work with chunks and reuse the hottest one (i.e. still in the
    cache). The flaw in your current implementation is that you don't fill one
    chunk fully, but rather stuff into it whatever's available and send it to
    the reader. However, if the output stream doesn't touch your chunk for a
    good while - even though it might be almost completely empty - the empty
    space is wasted, and you run out of buffer allowance to accept the input.

    By hitting your copy's input with many many tiny data chunks, if quickly
    uses up all "pages", and starts idling because it doesn't have any
    allowance left. I've managed to show this exact behavior by modifying the
    shootout's "stable" streams to use 10KB/ms throughput instead of 1MB/100ms.
    All other implementations work correctly, but your's stalls.

    However, I didn't like the idea of trying to fail your code just to show
    that it performs badly in some fabricated scenario, so I've actually placed
    your implementation in my production code to see if it's just some
    theoretical issue, or if it indeed does fail (streaming download, chunked
    upload). The result was as I've thought, that since I have a fast download,
    it hits copy with a lot of small ops, and your implementation behaves
    exactly like io.Copy, it instantly uses up all your pages and then stalls
    the download.

    To support my claim I've created a small repro, (though it requires
    gsutils, google cloud account etc configured), but it's my full code
    <https://gist.github.com/karalabe/024fe1f132c18471d411>, which I ran with
    io.Copy, jnml.Copy and bufioprop.Copy. The results can be seen on the
    attached chart (sorry for the bad/quick photoshop). The yellow line is the
    download, whereas the purple one is the upload. As you can see, whenever
    gsutils starts uploading, it will block accepting new data. As a
    consequence io.Copy stalls more or less immediately, your solution makes a
    few more ticks until it wastes all its pages, whereas the proposed solution
    happily downloads while the upload is running (note, there are of course
    interferences, but that is understandable imho).

    Cheers,
    Peter

    PS: The current stats are:

    Manually disabled contenders:
    ncw.Copy: deadlock in latency benchmark.
    ------------------------------------------------

    High throughput tests:
    io.Copy: test passed.
    [!] bufio.Copy: test passed.
    rogerpeppe.Copy: test passed.
    rogerpeppe.IOCopy: test passed.
    mattharden.Copy: test passed.
    yiyus.Copy: corrupt data on the output.
    egonelbre.Copy: test passed.
    jnml.Copy: test passed.
    bakulshah.Copy: panic.
    augustoroman.Copy: test passed.
    ------------------------------------------------

    Stable input, stable output shootout:
    io.Copy: 3.848840481s 8.314192 mbps 6954 allocs 477712 B
    [!] bufio.Copy: 3.93977615s 8.122289 mbps 7017 allocs 13033344 B
    rogerpeppe.Copy: 3.957739623s 8.085423 mbps 7075 allocs 13035712 B
    rogerpeppe.IOCopy: 4.006578497s 7.986865 mbps 7075 allocs 518288 B
    mattharden.Copy: 7.587068709s 4.217703 mbps 6686 allocs 25593688 B
    egonelbre.Copy: 3.916161761s 8.171266 mbps 7054 allocs 13034368 B
    jnml.Copy: 3.996595672s 8.006814 mbps 7061 allocs 13035840 B
    augustoroman.Copy: 3.871614938s 8.265285 mbps 6849 allocs 13021312 B

    Stable input, bursty output shootout:
    io.Copy: 6.836185084s 4.680973 mbps 3444 allocs 254400 B
    [!] bufio.Copy: 4.60999474s 6.941440 mbps 3588 allocs 12812608 B
    rogerpeppe.Copy: 4.569635021s 7.002747 mbps 3576 allocs 12811776 B
    rogerpeppe.IOCopy: 6.923576695s 4.621889 mbps 3439 allocs 285584 B
    egonelbre.Copy: 4.612420089s 6.937790 mbps 3754 allocs 12823168 B
    jnml.Copy: 6.895554042s 4.640671 mbps 3601 allocs 12814400 B
    augustoroman.Copy: 4.611330731s 6.939429 mbps 3521 allocs 12808320 B

    Bursty input, stable output shootout:
    [!] bufio.Copy: 3.799650855s 8.421826 mbps 3311 allocs 12794880 B
    rogerpeppe.Copy: 3.792738473s 8.437175 mbps 3309 allocs 12794688 B
    egonelbre.Copy: 3.803217893s 8.413928 mbps 3393 allocs 12800064 B
    augustoroman.Copy: 3.797589395s 8.426398 mbps 3306 allocs 12794560 B
    ------------------------------------------------

    Latency benchmarks (GOMAXPROCS = 1):
    [!] bufio.Copy: 4.422µs 23 allocs 2288 B.
    rogerpeppe.Copy: 4.754µs 21 allocs 2096 B.
    egonelbre.Copy: 4.645µs 29 allocs 2368 B.
    augustoroman.Copy: 5.034µs 17 allocs 1904 B.

    Latency benchmarks (GOMAXPROCS = 8):
    [!] bufio.Copy: 4.596µs 673 allocs 43888 B.
    rogerpeppe.Copy: 4.934µs 354 allocs 23408 B.
    egonelbre.Copy: 4.856µs 370 allocs 24192 B.
    augustoroman.Copy: 5.296µs 114 allocs 8112 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 492.98 | 3011.88 | 4758.31 | 4923.77 | 2057.23 |
    rogerpeppe.Copy | 226.69 | 1918.60 | 4495.06 | 4903.63 | 2058.11 |
    egonelbre.Copy | 253.95 | 1867.21 | 4484.71 | 4801.99 | 2033.94 |
    augustoroman.Copy | 162.78 | 1513.39 | 4324.66 | 4888.18 | 2060.91 |
    +-------------------+--------+---------+---------+---------+----------+


    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 13 / 1024) | ( 13 / 5280) | ( 13 / 66208) | ( 13 / 1049248) | ( 13 / 16786080) |
    rogerpeppe.Copy | ( 12 / 896) | ( 12 / 5152) | ( 12 / 66080) | ( 12 / 1049120) | ( 12 / 16785952) |
    egonelbre.Copy | ( 21 / 1248) | ( 21 / 5504) | ( 21 / 66432) | ( 21 / 1049472) | ( 21 / 16786304) |
    augustoroman.Copy | ( 12 / 960) | ( 12 / 5216) | ( 12 / 66144) | ( 12 / 1049184) | ( 12 / 16786016) |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 499.11 | 2922.10 | 4262.06 | 4589.26 | 2041.15 |
    rogerpeppe.Copy | 220.56 | 1698.86 | 3953.09 | 4611.53 | 2048.96 |
    egonelbre.Copy | 360.14 | 2439.69 | 3931.45 | 4339.05 | 1907.04 |
    augustoroman.Copy | 150.06 | 1392.57 | 3735.92 | 4572.96 | 2052.33 |
    +-------------------+--------+---------+---------+---------+----------+


    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 130 / 9792) | ( 46 / 7392) | ( 94 / 71392) | ( 781 / 1098400) | ( 61 / 16789152) |
    rogerpeppe.Copy | ( 13 / 960) | ( 14 / 5504) | ( 14 / 66432) | ( 14 / 1049472) | ( 14 / 16786304) |
    egonelbre.Copy | ( 93 / 6304) | ( 50 / 7584) | ( 53 / 68928) | ( 34 / 1050528) | ( 34 / 16787360) |
    augustoroman.Copy | ( 22 / 1824) | ( 13 / 5504) | ( 14 / 66496) | ( 12 / 1049184) | ( 12 / 16786016) |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+



    On Fri, Jan 30, 2015 at 10:14 PM, Péter Szilágyi <pet...@gmail.com> wrote:
    Up till now, this is what I'm aiming to include in bufio. Feedback
    welcome API wise too (though it's just based on Roger's and the io packages
    Copy/Pipe)

    http://godoc.org/github.com/karalabe/bufioprop

    On Fri, Jan 30, 2015 at 9:45 PM, Egon <egon...@gmail.com> wrote:
    On Friday, 30 January 2015 21:03:02 UTC+2, Péter Szilágyi wrote:

    Hi all,

    A few modifications went into the benchmarks:

    - Roger added a repeating reader so we don't need a giant pre-allocated
    test data blob; it makes startup faster and should also better reflect
    algo performance.
    - To prevent measuring occasional hiccups, the throughput benchmarks now
    use best-out-of-three. Scores are much more stable.

    Implementation-wise the updates are:

    - Roger sent in a full Pipe based solution, arguing that it's way more
    flexible than a simple copy (I agree).
    - Jan sent in a nice optimization that seems to beat the other algos in
    the case of very large buffers.
    - I've also ported my solution to a pipe version (both coexist for now).
    That codebase needs to be cleaned up; it was just to see if it works ok.

    With these, the current standing is:

    Latency benchmarks (GOMAXPROCS = 1):
    [!] bufio.Copy: 4.322µs 37 allocs 2736 B.
    [!] bufio.PipeCopy: 4.396µs 22 allocs 2224 B.
    rogerpeppe.Copy: 4.666µs 20 allocs 2032 B.
    egonelbre.Copy: 4.681µs 29 allocs 2368 B.
    jnml.Copy: 4.964µs 18 allocs 1936 B.

    Latency benchmarks (GOMAXPROCS = 8):
    [!] bufio.Copy: 4.525µs 398 allocs 25840 B.
    [!] bufio.PipeCopy: 4.58µs 632 allocs 41264 B.
    rogerpeppe.Copy: 4.918µs 325 allocs 21552 B.
    egonelbre.Copy: 4.779µs 518 allocs 33664 B.
    jnml.Copy: 5.187µs 321 allocs 21328 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +--------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 524.45 | 3137.31 | 4765.79 | 4922.01 | 2083.84 |
    [!] bufio.PipeCopy | 523.99 | 3115.08 | 4767.71 | 4924.59 | 2083.15 |
    rogerpeppe.Copy | 231.94 | 1942.33 | 4499.48 | 4906.88 | 2085.31 |
    egonelbre.Copy | 252.79 | 1865.77 | 4482.94 | 4832.15 | 2053.85 |
    jnml.Copy | 233.75 | 1947.74 | 4500.00 | 4914.93 | 6055.04 |
    +--------------------+--------+---------+---------+---------+----------+

    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 24 / 1280) | ( 24 / 5536) | ( 24 / 66464) | ( 24 / 1049504) | ( 24 / 16786336) |
    [!] bufio.PipeCopy | ( 13 / 1024) | ( 13 / 5280) | ( 13 / 66208) | ( 13 / 1049248) | ( 13 / 16786080) |
    rogerpeppe.Copy | ( 12 / 896) | ( 12 / 5152) | ( 12 / 66080) | ( 12 / 1049120) | ( 12 / 16785952) |
    egonelbre.Copy | ( 21 / 1248) | ( 21 / 5504) | ( 21 / 66432) | ( 21 / 1049472) | ( 21 / 16786304) |
    jnml.Copy | ( 13 / 1008) | ( 13 / 5264) | ( 13 / 66192) | ( 13 / 1049232) | ( 13 / 16787424) |
    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +--------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 510.38 | 2935.96 | 3899.06 | 4593.50 | 2067.04 |
    [!] bufio.PipeCopy | 503.73 | 2928.34 | 4204.07 | 4602.74 | 2073.62 |
    rogerpeppe.Copy | 206.61 | 1809.26 | 3770.79 | 4608.11 | 2069.03 |
    egonelbre.Copy | 350.66 | 2439.46 | 3946.51 | 4377.70 | 1917.33 |
    jnml.Copy | 214.09 | 1683.43 | 3773.63 | 4621.34 | 5749.54 |
    +--------------------+--------+---------+---------+---------+----------+

    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 82 / 4992) | ( 46 / 6944) | ( 108 / 71840) | ( 536 / 1082272) | ( 56 / 16788384) |
    [!] bufio.PipeCopy | ( 65 / 4352) | ( 26 / 6336) | ( 66 / 69824) | ( 526 / 1082304) | ( 46 / 16788416) |
    rogerpeppe.Copy | ( 14 / 1248) | ( 14 / 5504) | ( 14 / 66432) | ( 14 / 1049472) | ( 14 / 16786304) |
    egonelbre.Copy | ( 89 / 5600) | ( 49 / 7744) | ( 51 / 68800) | ( 39 / 1051072) | ( 29 / 16787264) |
    jnml.Copy | ( 13 / 1008) | ( 14 / 5552) | ( 13 / 66192) | ( 13 / 1049232) | ( 13 / 16787424) |
    +--------------------+-----------------+-----------------+-----------------+------------------+------------------+

    @Jan, Egon: would you guys agree that a pipe based solution would be
    better/more flexible? If so, we could rework the shootout to use pipes as
    the underlying implementation and a pre-baked copy function (i.e. I
    blatantly copied mine from Roger).
    SGTM

  • Péter Szilágyi at Jan 31, 2015 at 9:56 am
    I also shot a few emails with Roger that it might be useful to try and
    convert the shootout to a testing package based solution. The benefit would
    be that we don't need to start writing tests all over when we finally agree
    upon a solution. The slight issue is that the testing framework isn't
    really compatible with the idea of "shooting out" bad solutions (i.e. if a
    test fails on one implementation, the rest shouldn't even run, as we're
    trying to filter out bad ones and only reach the benchmarks with the good
    ones). Maybe there's some mechanism in the testing package to "do some
    stuff" after a test finishes but before the next starts, and then we could
    filter the still alive solutions. Dunno, if somebody's up for the
    challenge, I'd gladly accept PRs :P
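
    One possible shape for this (just a sketch with made-up names, assuming
    the usual go test behavior of running the tests before the benchmarks in
    the same binary): have the correctness test record which contenders
    failed in a shared set, and have the benchmarks skip those.

    package shootout

    import (
        "bytes"
        "io"
        "io/ioutil"
        "strings"
        "testing"
    )

    // contender is one copy implementation entered in the shootout.
    // (Illustrative signature; the real shootout's interface may differ.)
    type contender struct {
        Name string
        Copy func(dst io.Writer, src io.Reader, buffer int) (int64, error)
    }

    // One stub entry keeps the sketch self-contained; the real list would
    // hold bufio.Copy, rogerpeppe.Copy, egonelbre.Copy, etc.
    var contenders = []contender{
        {"io.Copy-wrapper", func(dst io.Writer, src io.Reader, _ int) (int64, error) {
            return io.Copy(dst, src)
        }},
    }

    // failed records the contenders shot out by the correctness test.
    var failed = make(map[string]bool)

    func TestContenders(t *testing.T) {
        for _, c := range contenders {
            dst := new(bytes.Buffer)
            if _, err := c.Copy(dst, strings.NewReader("payload"), 1<<10); err != nil || dst.String() != "payload" {
                failed[c.Name] = true
                t.Errorf("%s failed the correctness check: %v", c.Name, err)
            }
        }
    }

    func BenchmarkContenders(b *testing.B) {
        for _, c := range contenders {
            if failed[c.Name] {
                continue // shot out: don't even benchmark it
            }
            c := c
            b.Run(c.Name, func(b *testing.B) {
                for i := 0; i < b.N; i++ {
                    c.Copy(ioutil.Discard, strings.NewReader("payload"), 1<<10)
                }
            })
        }
    }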

    On the same note, I've also pulled in the io.Pipe tests from the std libs
    to validate my pipe solution. It is probably not nearly enough, but it's
    way better than nothing. I concur that we should definitely try and pull in
    every test from the std libs that might catch nasty bugs, but we need to
    converge on the API for that. The previous one was a simple copy, but now
    we do have a Pipe proposal that not everyone implemented yet.

    On Augusto's proposal of using bufio.Pipe(buffer []byte) instead of
    bufio.Pipe(buffer int), I think it might be a valid point. It shouldn't
    pose significant inconvenience, but could allow finer memory management if
    someone needs it. Though it would be nice to have the feedback of others
    too, maybe we're missing some things :)

    Cheers,
       Peter


  • Péter Szilágyi at Jan 31, 2015 at 10:14 am
    I've got an open question regarding my synchronization.

    I am using channels to wake sleeping goroutines in case some notable event
    occurs (reader close, writer close, data/space available (example
    <https://github.com/karalabe/bufioprop/blob/master/pipe.go#L132>)). The
    solution is imho simple, yet performs really well (the core concept behind
    my implementation is probably the same as anyone else's, so the performance
    diff is imho related to the syncing). Yet in some rare cases I see an
    increased allocation count (see the GOMAXPROCS = 8 alloc table), which
    causes a minor performance hit. I'm guessing the culprit is somewhere in
    the internals of channels, but could someone hint at what might be
    happening? Why is my code allocating sporadically?
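
    For reference, this is roughly the pattern I mean - a stripped-down
    sketch, not the actual pipe.go code: one capacity-1 channel per event,
    with a non-blocking send on the notify side so at most one wakeup is ever
    queued.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // signal is a capacity-1 channel used as an edge-triggered wakeup:
    // notify never blocks, and at most one wakeup is ever pending.
    type signal chan struct{}

    func (s signal) notify() {
        select {
        case s <- struct{}{}:
        default: // a wakeup is already queued; no need for another
        }
    }

    func main() {
        var (
            mu      sync.Mutex
            pending int // stand-in for "bytes available in the buffer"
            more    = make(signal, 1) // the "data available" event
            done    = make(chan struct{})
        )

        // Consumer: drain whatever is pending, deep-sleep on the channel otherwise.
        go func() {
            defer close(done)
            for consumed := 0; consumed < 5; {
                mu.Lock()
                n := pending
                pending = 0
                mu.Unlock()
                if n == 0 {
                    <-more // sleep until the producer signals
                    continue
                }
                consumed += n
            }
        }()

        // Producer: publish some data, then wake the consumer.
        for i := 0; i < 5; i++ {
            mu.Lock()
            pending++
            mu.Unlock()
            more.notify()
            time.Sleep(time.Millisecond)
        }
        <-done
        fmt.Println("all data consumed")
    }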

    Thanks
  • Jan Mercl at Jan 31, 2015 at 4:34 pm

    On Sat Jan 31 2015 at 10:05:34 Péter Szilágyi wrote:

    I've just shot out your code as *not solving* the problem again :P
    Please pull[0], thank you.

       [0]: https://github.com/karalabe/bufioprop/pull/15

    -j

  • Péter Szilágyi at Jan 31, 2015 at 5:25 pm
    Hi all,

       @Jan: Merged and the new solution indeed passes the shootout. With the
    current version, your implementation too is in the same ballpark as the
    rest. I think we're starting to converge on the achievable performance from
    all implementations.

    My solution lately seems to beat the others significantly for smaller
    buffer sizes. This is due to a neat optimization
    <https://github.com/karalabe/bufioprop/blob/master/pipe.go#L126> of doing a
    short spin-lock before going down to deep sleep if no data/space is
    available in the internal buffers. The idea is that the spin should be very
    short, so it doesn't take a toll on performance if no data is coming, but
    long enough that if data *is* streamed, the copy doesn't need to sync at
    all. Jan, I guess this isn't something you could try as you're completely
    channel based, but give it a thought (if it's doable, it might bring up
    your performance on small buffers). The others should maybe play around
    with the idea too; I think I saw it in Egon's code a while back but haven't
    checked lately.
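
    In sketch form, the idea is simply this (illustrative only - the spin
    budget and the atomics here are made up, not the actual pipe.go
    internals):

    package main

    import (
        "fmt"
        "runtime"
        "sync/atomic"
        "time"
    )

    // waitForData spins briefly before blocking on the wakeup channel, so
    // that when data arrives back-to-back the consumer skips the expensive
    // sleep/wake round trip entirely.
    func waitForData(available *int32, wake <-chan struct{}) {
        const spins = 100 // keep it short: spinning must stay cheap when nothing arrives
        for i := 0; i < spins; i++ {
            if atomic.LoadInt32(available) > 0 {
                return // data showed up while spinning: no synchronization needed
            }
            runtime.Gosched() // yield so the spin doesn't starve other goroutines
        }
        for atomic.LoadInt32(available) == 0 {
            <-wake // deep sleep until the producer signals
        }
    }

    func main() {
        var available int32
        wake := make(chan struct{}, 1)

        go func() {
            time.Sleep(time.Millisecond) // simulate data arriving a bit later
            atomic.AddInt32(&available, 1)
            select { // non-blocking wakeup
            case wake <- struct{}{}:
            default:
            }
        }()

        waitForData(&available, wake)
        fmt.Println("woke up with", atomic.LoadInt32(&available), "unit(s) of data")
    }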

    Jan's previous optimization was actually a really good observation: the
    reason performance goes down with large buffers is that you start missing
    the CPU cache and need to go through main memory, which essentially limits
    your throughput. His solution was to try and reuse hot parts of the buffer
    that can probably still be found in the L1/L2 caches, but it didn't pan out
    correctly (see the previous longish description for details). Nonetheless
    the observation is a good one, so it *could* be worthwhile to try and
    implement this hot-cache reuse. I am thinking of a solution that would
    split the buffer up the same way Jan did previously, but keep writing to
    one cache-line/chunk/piece until it's full and only then start the next.
    The issue is that the synchronization can get really messy.
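
    A single-goroutine sketch of the write-side bookkeeping I have in mind -
    the hard part, the reader/writer synchronization (and the wraparound), is
    deliberately left out, and the names here are made up:

    package main

    import "fmt"

    // chunkedBuffer illustrates only the write-side bookkeeping: data is
    // appended to the current chunk until that chunk is completely full, and
    // only then does the writer advance, keeping the hot chunk in cache.
    type chunkedBuffer struct {
        chunks [][]byte // fixed-size chunks, e.g. sized to fit in L1/L2
        cur    int      // index of the chunk currently being filled
        fill   int      // bytes already written into chunks[cur]
    }

    func newChunkedBuffer(chunkSize, chunkCount int) *chunkedBuffer {
        b := &chunkedBuffer{chunks: make([][]byte, chunkCount)}
        for i := range b.chunks {
            b.chunks[i] = make([]byte, chunkSize)
        }
        return b
    }

    // write copies p into the buffer, topping up the current chunk before
    // touching the next one, and returns the number of bytes accepted (it
    // stops early once every chunk is full).
    func (b *chunkedBuffer) write(p []byte) int {
        written := 0
        for len(p) > 0 && b.cur < len(b.chunks) {
            n := copy(b.chunks[b.cur][b.fill:], p)
            b.fill += n
            written += n
            p = p[n:]
            if b.fill == len(b.chunks[b.cur]) { // chunk is full: move to the next
                b.cur++
                b.fill = 0
            }
        }
        return written
    }

    func main() {
        b := newChunkedBuffer(4, 3)              // 3 chunks of 4 bytes
        fmt.Println(b.write([]byte("abcdefgh"))) // 8: fills chunks 0 and 1 completely
        fmt.Println(b.write([]byte("ijklmnop"))) // 4: only chunk 2 was left
    }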

    I guess the last algorithmic challenge in this proposal would be to figure
    out if the buffer can be kept hot. If yes, great, if not, we could proceed
    to finalizing the API around the buffered copy/pipe.

    Cheers,
       Peter

    PS: Jury's still out on why I get hit by memory allocs *always* at the same
    tests, never others.

    Latency benchmarks (GOMAXPROCS = 1):
           [!] bufio.Copy: 4.596µs 23 allocs 2288 B.
          rogerpeppe.Copy: 4.883µs 21 allocs 2096 B.
           egonelbre.Copy: 4.77µs 29 allocs 2368 B.
                jnml.Copy: 5.041µs 19 allocs 2144 B.
        augustoroman.Copy: 5.229µs 16 allocs 1840 B.

    Latency benchmarks (GOMAXPROCS = 8):
           [!] bufio.Copy: 4.597µs 481 allocs 30736 B.
          rogerpeppe.Copy: 4.916µs 348 allocs 22224 B.
           egonelbre.Copy: 4.786µs 398 allocs 27408 B.
                jnml.Copy: 4.92µs 343 allocs 21904 B.
        augustoroman.Copy: 5.208µs 162 allocs 11184 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 470.22 | 2950.34 | 4738.61 | 4924.87 | 2001.32 |
    rogerpeppe.Copy | 216.22 | 1856.43 | 4462.28 | 4911.21 | 2000.13 |
    egonelbre.Copy | 253.42 | 1852.45 | 4479.33 | 4788.88 | 1982.38 |
    jnml.Copy | 231.71 | 1947.09 | 4503.48 | 4907.54 | 2008.88 |
    augustoroman.Copy | 158.89 | 1479.22 | 4299.65 | 4887.31 | 2008.11 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 13 / 1024) | ( 13 / 5280) | ( 13 / 66208) | ( 13 / 1049248) | ( 13 / 16786080) |
    rogerpeppe.Copy | ( 12 / 896) | ( 12 / 5152) | ( 12 / 66080) | ( 12 / 1049120) | ( 12 / 16785952) |
    egonelbre.Copy | ( 21 / 1248) | ( 21 / 5504) | ( 21 / 66432) | ( 21 / 1049472) | ( 21 / 16786304) |
    jnml.Copy | ( 12 / 1088) | ( 12 / 5344) | ( 12 / 66272) | ( 12 / 1049312) | ( 12 / 16786144) |
    augustoroman.Copy | ( 12 / 960) | ( 12 / 5216) | ( 12 / 66144) | ( 12 / 1049184) | ( 12 / 16786016) |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 490.45 | 2934.83 | 4543.90 | 4590.13 | 1994.79 |
    rogerpeppe.Copy | 215.50 | 1721.20 | 3904.96 | 4596.47 | 2000.28 |
    egonelbre.Copy | 344.52 | 2467.48 | 3907.59 | 4348.90 | 1850.96 |
    jnml.Copy | 239.27 | 897.75 | 4029.99 | 4625.88 | 1996.11 |
    augustoroman.Copy | 152.99 | 1415.63 | 3581.26 | 4587.91 | 1999.95 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 143 / 9344) | ( 38 / 6880) | ( 91 / 71200) | ( 781 / 1098400) | ( 61 / 16789152) |
    rogerpeppe.Copy | ( 13 / 960) | ( 14 / 5504) | ( 14 / 66432) | ( 14 / 1049472) | ( 14 / 16786304) |
    egonelbre.Copy | ( 97 / 6560) | ( 48 / 7456) | ( 49 / 68672) | ( 30 / 1050272) | ( 31 / 16786944) |
    jnml.Copy | ( 12 / 1088) | ( 13 / 5632) | ( 13 / 66560) | ( 12 / 1049312) | ( 12 / 16786144) |
    augustoroman.Copy | ( 25 / 2016) | ( 14 / 5568) | ( 13 / 66432) | ( 12 / 1049184) | ( 12 / 16786016) |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+

  • Péter Szilágyi at Feb 3, 2015 at 11:21 am
    Bump :)

    After a short repro [0] and Dmitry's hard work [1][2], it turned out that
    the allocations I was seeing are actually a bug in the Go runtime. Applying
    the previously mentioned two fixes to the Go master branch almost
    completely eliminates the synchronization allocations (some are required,
    but they are amortized, so the longer a process runs, the fewer allocs it
    does - hence why the latency benchmarks do report some, but the throughput
    ones, which are repeated multiple times, don't).

    Cheers,
       Peter

    Refs:
       [0] https://groups.google.com/forum/#!topic/golang-nuts/a8ZoAhAeO7k
       [1] https://go-review.googlesource.com/#/c/3742
       [2] https://go-review.googlesource.com/#/c/3741

    Latency benchmarks (GOMAXPROCS = 1):
           [!] bufio.Copy: 5.187µs 11 allocs 1696 B.
          rogerpeppe.Copy: 5.531µs 11 allocs 1632 B.
          mattharden.Copy: 5.653µs 9 allocs 67056 B.
           egonelbre.Copy: 5.432µs 17 allocs 2000 B.
                jnml.Copy: 5.706µs 8 allocs 1712 B.
        augustoroman.Copy: 5.866µs 8 allocs 1504 B.

    Latency benchmarks (GOMAXPROCS = 8):
           [!] bufio.Copy: 4.506µs 541 allocs 35840 B.
          rogerpeppe.Copy: 5.696µs 66 allocs 5376 B.
          mattharden.Copy: 5.742µs 34 allocs 68864 B.
           egonelbre.Copy: 4.564µs 277 allocs 19088 B.
                jnml.Copy: 5.744µs 28 allocs 3216 B.
        augustoroman.Copy: 5.976µs 19 allocs 1168 B.

    Throughput (GOMAXPROCS = 1) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 437.80 | 2793.60 | 4667.71 | 4842.16 | 2054.19 |
    rogerpeppe.Copy | 198.44 | 1743.79 | 4397.40 | 4820.92 | 2054.45 |
    mattharden.Copy | 183.49 | 1272.47 | 2201.86 | 2353.29 | 1185.67 |
    egonelbre.Copy | 225.30 | 1737.59 | 4377.38 | 4722.25 | 2027.28 |
    jnml.Copy | 222.91 | 1875.31 | 4433.52 | 4832.32 | 2052.50 |
    augustoroman.Copy | 142.77 | 1365.51 | 4185.75 | 4818.16 | 2052.86 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    ALLOCS/BYTES | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+
    [!] bufio.Copy | ( 10 / 1008) | ( 10 / 5264) | ( 10 / 66192) | ( 10 / 1049232) | ( 10 / 16786064) |
    rogerpeppe.Copy | ( 7 / 976) | ( 6 / 4944) | ( 6 / 65872) | ( 6 / 1048912) | ( 6 / 16785744) |
    mattharden.Copy | ( 9 / 41880) | ( 9 / 46136) | ( 9 / 107064) | ( 9 / 1090104) | ( 9 / 16826936) |
    egonelbre.Copy | ( 12 / 1056) | ( 12 / 5312) | ( 12 / 66240) | ( 12 / 1049280) | ( 12 / 16786112) |
    jnml.Copy | ( 5 / 896) | ( 5 / 5152) | ( 5 / 66080) | ( 5 / 1049120) | ( 5 / 16785952) |
    augustoroman.Copy | ( 5 / 688) | ( 5 / 4944) | ( 5 / 65872) | ( 5 / 1048912) | ( 5 / 16785744) |
    +-------------------+-----------------+-----------------+-----------------+------------------+------------------+

    Throughput (GOMAXPROCS = 8) (256 MB):

    +-------------------+--------+---------+---------+---------+----------+
    THROUGHPUT | 333 | 4155 | 65359 | 1048559 | 16777301 |
    +-------------------+--------+---------+---------+---------+----------+
    [!] bufio.Copy | 409.02 | 2927.20 | 4395.27 | 4481.92 | 2043.71 |
    rogerpeppe.Copy | 195.11 | 1706.20 | 3779.57 | 4523.37 | 2044.33 |
    mattharden.Copy | 177.87 | 1236.07 | 2109.07 | 2009.68 | 1143.01 |
    egonelbre.Copy | 335.69 | 2283.42 | 3854.51 | 4338.55 | 1896.28 |
    jnml.Copy | 211.20 | 1825.58 | 3983.32 | 4549.05 | 2044.82 |
    augustoroman.Copy | 139.52 | 1336.84 | 3337.52 | 4463.26 | 2041.43 |
    +-------------------+--------+---------+---------+---------+----------+

    +-------------------+--------------+--------------+---------------+----------------+-----------------+
    ALLOCS/BYTES        | 333          | 4155         | 65359         | 1048559        | 16777301        |
    +-------------------+--------------+--------------+---------------+----------------+-----------------+
    [!] bufio.Copy      | (31 / 2352)  | (12 / 5616)  | (13 / 66384)  | (26 / 1050256) | (10 / 16786064) |
    rogerpeppe.Copy     | ( 6 / 688)   | ( 6 / 4944)  | ( 6 / 65872)  | ( 8 / 1050224) | ( 7 / 16786032) |
    mattharden.Copy     | (10 / 43160) | (11 / 47480) | (10 / 247560) | (10 / 3188488) | (10 / 50423560) |
    egonelbre.Copy      | (12 / 1056)  | (63 / 8800)  | (13 / 66528)  | (12 / 1049280) | (16 / 16786368) |
    jnml.Copy           | ( 5 / 896)   | ( 5 / 5152)  | ( 5 / 66080)  | ( 5 / 1049120) | ( 6 / 16786016) |
    augustoroman.Copy   | ( 9 / 944)   | ( 5 / 4944)  | ( 5 / 65872)  | ( 5 / 1048912) | ( 5 / 16785744) |
    +-------------------+--------------+--------------+---------------+----------------+-----------------+
  • Nick Craig-Wood at Jan 30, 2015 at 10:26 am

    On 30/01/15 09:55, Péter Szilágyi wrote:

    You are both right and wrong. Your solution specifically got me to add
    the latency benchmark. The deadlock is caused by ReadFull in combination
    with the latency tester, which refuses to send anything until the
    previous byte goes through.

    Ah OK!
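
    (A toy reproduction of that deadlock, just to make the mechanism explicit;
    this is an illustrative sketch, not the shootout's latency harness:)

    package main

    import (
        "fmt"
        "io"
        "time"
    )

    // lockstepSource emits one byte per Read, but refuses to emit the next
    // byte until the previous one has been acknowledged by the destination.
    type lockstepSource struct {
        acked chan struct{}
        sent  int
    }

    func (s *lockstepSource) Read(p []byte) (int, error) {
        if s.sent > 0 {
            <-s.acked // wait until the previous byte went through
        }
        p[0] = 'x'
        s.sent++
        return 1, nil
    }

    // ackingSink acknowledges every byte it receives.
    type ackingSink struct{ acked chan struct{} }

    func (d *ackingSink) Write(p []byte) (int, error) {
        for range p {
            d.acked <- struct{}{}
        }
        return len(p), nil
    }

    func main() {
        acked := make(chan struct{}, 1)
        src := &lockstepSource{acked: acked}
        dst := &ackingSink{acked: acked}

        done := make(chan struct{})
        go func() {
            buf := make([]byte, 4)
            // ReadFull wants 4 bytes before handing anything to the writer,
            // but the source won't produce byte 2 until byte 1 reaches the sink.
            if _, err := io.ReadFull(src, buf); err == nil {
                dst.Write(buf)
            }
            close(done)
        }()

        select {
        case <-done:
            fmt.Println("copied")
        case <-time.After(time.Second):
            fmt.Println("deadlocked, as expected")
        }
    }
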
    We could argue whether or not this is a desirable thing to have. Imho a
    copy should not block indefinitely waiting for new data, preventing it
    from handing over already buffered data to the writer. As we cannot
    include a manual flush in the operation, the copy just has to figure it
    out. But this is my opinion, so I'm open for debate :)

    I see what you mean.

    I think it is a desirable thing to have, but I'm worried that it
    introduces hidden parameters or heuristics into the algorithms which
    aren't clearly defined.

    I'd say if you don't want Copy() to hang onto data indefinitely then
    there should be a latency time.Duration parameter on it. Defined as
    something like - make sure that data from src is sent to dst within this
    time duration.
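
    (One way such a guarantee could be approximated without touching Copy's
    signature is a destination-side wrapper that flushes on a timer; the
    sketch below is purely illustrative, and latencyWriter/newLatencyWriter
    are made-up names, not part of the proposal.)

    package main

    import (
        "bufio"
        "os"
        "sync"
        "time"
    )

    // latencyWriter is a toy wrapper guaranteeing that buffered bytes reach
    // the underlying writer within roughly maxLatency, by flushing on a timer.
    type latencyWriter struct {
        mu   sync.Mutex
        bw   *bufio.Writer
        done chan struct{}
    }

    func newLatencyWriter(w *bufio.Writer, maxLatency time.Duration) *latencyWriter {
        lw := &latencyWriter{bw: w, done: make(chan struct{})}
        go func() {
            ticker := time.NewTicker(maxLatency)
            defer ticker.Stop()
            for {
                select {
                case <-ticker.C:
                    lw.mu.Lock()
                    lw.bw.Flush() // push out whatever has accumulated so far
                    lw.mu.Unlock()
                case <-lw.done:
                    return
                }
            }
        }()
        return lw
    }

    func (lw *latencyWriter) Write(p []byte) (int, error) {
        lw.mu.Lock()
        defer lw.mu.Unlock()
        return lw.bw.Write(p)
    }

    func (lw *latencyWriter) Close() error {
        close(lw.done)
        lw.mu.Lock()
        defer lw.mu.Unlock()
        return lw.bw.Flush()
    }

    func main() {
        lw := newLatencyWriter(bufio.NewWriter(os.Stdout), 50*time.Millisecond)
        defer lw.Close()

        lw.Write([]byte("reaches stdout within ~50ms, without a manual flush\n"))
        time.Sleep(200 * time.Millisecond)
    }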

    I don't think I can easily fix my solution to work within this
    constraint. Turning the ReadFull into a Read works, but then it wrecks
    the performance...
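
    (The trade-off in a nutshell: Read hands over whatever is currently
    available, while io.ReadFull insists on a complete buffer; a trivial
    standalone illustration, not Nick's code:)

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    func main() {
        buf := make([]byte, 8)

        // Read returns as soon as some data is available.
        n, err := strings.NewReader("abc").Read(buf)
        fmt.Println("Read:    ", n, err) // 3 <nil>

        // ReadFull waits for a full buffer: on a network stream it would
        // block; on this short reader it errors out instead.
        n, err = io.ReadFull(strings.NewReader("abc"), buf)
        fmt.Println("ReadFull:", n, err) // 3 unexpected EOF
    }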

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

