This is the formalization of a previous thread
<https://groups.google.com/forum/#!topic/golang-nuts/434c3YInH_M>, in which
I'll try to make a case for including an additional copy primitive in the
standard libs. I propose its inclusion because it is non-trivial to write,
and apparently - based on the previous discussion - people find it really
hard to see why simpler solutions do not work as intended.
*Problem to solve:*
Go makes it very easy to connect readers with writers and copy/pipe data
from one endpoint to the other. In the average case, where the reader
produces data with roughly the same throughput and stability as the writer
consumes it, this solution works as intended.
However, if the reader produces data in bursts, or the writer consumes data
in bursts, the existing solutions break down, because the whole pipeline is
synchronous: if the writer cannot accept data for a while, the reader won't
fetch either, even if data is available. Similarly, if the reader produces
periodic bursts but the writer cannot consume them immediately, the reader
stalls, even though delivering the next burst might take it a long time.
Two concrete examples where this scenario appeared:
- *Streaming a file from one remote location to another.*
The reader/downloader can be considered fairly stable, producing data at
a constant rate. The writer/uploader, however, works in bursts, as it first
collects a larger chunk of data and only uploads afterwards (e.g. gsutils
collects ~100MB worth of data). The issue is that after the uploader's
buffer fills up, it blocks further writes until it finishes processing. But
if the writer is blocked, the reader will block too (nowhere to put the
data), and the download stalls.
- *Generating a data block and uploading to a remote location.*
If we have a non-streaming data generator (e.g. compress a file in a
non-streaming way), then it will most probably generate large chunks of
data periodically. Compared to this, the uploader may be considered stable.
However, while the uploader is stably but slowly pushing the data out, the
generator stalls instead of starting its next, potentially long-running
task, since the writer is not accepting new data.
*Why existing constructs don't work:*
Per the previous thread, people readily jump to buffered readers/writers
and pipes, thinking that they must work, but don't realize why they don't:
- *Buffered readers are synchronous and blocking.*
The goal of a buffered reader is to provide peeking capabilities for an
input stream, and to store some data that arrived but we haven't consumed.
The issue is that if there is already something in the buffer, newly
arriving data won't be read in until the existing data is consumed (even
if the buffer is mostly empty). This means that a buffered reader will
still stall the read side, even though it has the capacity to accept more.
- *Buffered writers are synchronous and blocking.*
The goal of a buffered writer is to save potentially expensive write
operations by accumulating arriving data, and only forwarding it when
enough was collected. However, the moment the buffer is full and/or a flush
is executed, the writer is completely blocked until the data in its
entirety can be transferred. But this means that anything that streams
data into the writer immediately stalls.
 - *Pipes are non-buffered.*
 Piping an input stream into an output stream will not work either, as
neither io.Pipe nor os.Pipe provides a meaningful buffer (io.Pipe is fully
synchronous, and os.Pipe's ~64KB kernel buffer is not nearly enough to
cover these issues, nor is it modifiable), so a pipe doesn't really do
anything more than a simple copy.
All in all, in order to handle bursty readers and/or writers, a buffer
needs to be placed *in between* the differing read and write operations,
not before or after them.
My proposed solution to these problems is the introduction of a specialized
copy operation into the bufio package, that would run on two separate
threads: one consuming data from the reader and feeding it into an internal
buffer of configurable size; the other one consuming data from the internal
buffer and feeding it to the writer endpoint. The goal of this mechanism is
to completely isolate the reader and writer threads, allowing each to
execute even if the other one is temporarily blocked.
The signature of the copy would be analogous to io.Copy, just with the
configurable buffer size added:
bufio.Copy(dst io.Writer, src io.Reader, buffer int) (written int64, err error)
Internally the operation would be based on a single circular buffer with
both reader and writer threads using atomic operations for data handling,
only resorting to channels when the buffer is full or empty (since then one
thread must obviously block). The solution would also not require any
memory allocations beyond the initial buffer setup, making arbitrarily long
running copy operations GC friendly/free.
I've written up an implementation of the above mentioned Copy operation.
In both algorithmic construction and naming conventions (internal
variables) it follows the io.Copy implementation; however, consider it a
starting point for further refinements.
The implementation and some fairly trivial tests have been included here:
$ go get github.com/karalabe/bufioprop
Furthermore, to prove my point that existing constructs and even other
simple-looking solutions don't work as intended, I've written a small
*shootout* code simulating three copy scenarios:
- Classical streaming copy where the source and sink have a similar
throughput and production/consumption style.
 - Stable source producing data at a constant rate, but a bursty sink,
 accepting big batches periodically. The overall throughput of the two
 endpoints is the same, only the production/consumption cycles and data
 chunks are different.
 - Bursty source producing big data chunks on rare occasions, and a
 stable sink consuming data at a constant rate. Again, the overall
 throughput of the two endpoints is the same, only the
 production/consumption cycles and data chunks are different.
You can run these tests via:
$ go get github.com/karalabe/bufioprop/shootout
Stable input, stable output:
io.Copy: 3.38504052s 10.666667 mbps.
[!] bufio.Copy: 3.37012021s 10.666667 mbps.
rogerpeppe.Copy: 3.414476536s 10.666667 mbps.
mattharden.Copy: 6.368713887s 5.333333 mbps.
Stable input, bursty output:
io.Copy: 6.251177787s 5.333333 mbps.
[!] bufio.Copy: 3.387935437s 10.666667 mbps.
rogerpeppe.Copy: 5.98428305s 6.400000 mbps.
mattharden.Copy: 6.250739081s 5.333333 mbps.
Bursty input, stable output:
io.Copy: 6.25889809s 5.333333 mbps.
[!] bufio.Copy: 3.347354357s 10.666667 mbps.
rogerpeppe.Copy: 5.999921216s 6.400000 mbps.
mattharden.Copy: 3.473998412s 10.666667 mbps.
To add your own challenger code, simply create a new package inside the
shootout folder, write a Copy with the above proposed signature and insert
it into the contenders list in shootout.go.
A further solution was proposed by Jan Mercl, which, if I understood
correctly, entailed reading chunks of data on one thread and passing those
chunks through a channel to a writer thread.
Although this indeed works, the disadvantages compared to my proposal are:
- If data chunks are placed into the channel immediately after being
read, then it's hard to control the total buffer size, as the reads may be
of arbitrary length (and the channel limits chunk counts, not sizes).
 - If data chunks are first accumulated into larger pieces, and only
 queued for writing afterwards, then there will be a delay between incoming
 and outgoing data, even though the data is already available. A bigger
 issue still: if the larger piece is not yet full but no more data arrives
 for a while, the writer is essentially stalled for nothing.
- All read/write operations need syncing through a channel, potentially
having a minor performance hit (debatable if it's a big enough problem).
- There are hidden costs in memory allocations or buffer reuses, based
on the internal implementation.
These issues could probably be solved one way or another, but my point
still stands that a proper solution is non-trivial.
I would welcome constructive feedback, and, aimed at the core developers,
please give it a deeper thought as to whether this would be worthwhile to
get into the libs.
PS: I invite anyone to propose other solutions (maybe simpler ones, maybe I
missed something in the libs that would make this trivial). *However*, the
reason I've spent so much time on preparing a go gettable implementation
and an associated simulator/shootout is because the underlying issue is not
trivial. Please verify that your solution indeed passes the shootout before
dismissing the need for this proposal.