Hi,

   I am seeing an interesting race report in my code. Specifically, in a
complex interaction with a WaitGroup, the race detector signals quite a lot
of races for Add/Wait or Done/Wait.

   I haven't managed to reproduce this in a simple example, but my code on
GitHub displays it quite often, so I guess that will do.

   You can fetch the code via

$ go get github.com/karalabe/iris

   And inside the proto/pastry package, run the TestMaintenance test with the
following options. The race isn't reported in every run, but roughly every
second run triggers it.

$ cd $GOPATH/src/github.com/karalabe/iris/proto/pastry
$ GOMAXPROCS=32 go test -cpu=32 -race -v -run=TestMaintenance

=== RUN TestMaintenance-32
==================
WARNING: DATA RACE
Read by goroutine 186:
   sync.raceRead()
       /opt/google/go/src/pkg/sync/race.go:37 +0x35
   sync.(*WaitGroup).Add()
       /opt/google/go/src/pkg/sync/waitgroup.go:60 +0xd0
   github.com/karalabe/iris/proto/pastry.(*Overlay).exch()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/maintenance.go:197 +0x50
   github.com/karalabe/iris/proto/pastry.(*Overlay).process()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/routing.go:198 +0xe54
   github.com/karalabe/iris/proto/pastry.(*Overlay).deliver()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/routing.go:96 +0xac
   github.com/karalabe/iris/proto/pastry.(*Overlay).route()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/routing.go:58 +0x4e0
   github.com/karalabe/iris/proto/pastry.(*peer).processor()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/peer.go:158 +0x442

Previous write by goroutine 122:
   sync.raceWrite()
       /opt/google/go/src/pkg/sync/race.go:41 +0x35
   sync.(*WaitGroup).Wait()
       /opt/google/go/src/pkg/sync/waitgroup.go:120 +0x176
   github.com/karalabe/iris/proto/pastry.(*Overlay).manager()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/maintenance.go:84 +0x437

Goroutine 186 (running) created at:
   github.com/karalabe/iris/proto/pastry.(*peer).Start()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/peer.go:83 +0x3d
   github.com/karalabe/iris/proto/pastry.(*Overlay).dedup()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/handshake.go:240 +0x8d
   github.com/karalabe/iris/proto/pastry.(*Overlay).shake()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/handshake.go:221 +0x7df
   github.com/karalabe/iris/proto/pastry.(*Overlay).dial()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/handshake.go:172 +0x3e2
   github.com/karalabe/iris/proto/pastry.func·004()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/maintenance.go:133 +0xa3
   github.com/karalabe/iris/pool.(*ThreadPool).runner()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/pool/thread.go:140 +0xa7

Goroutine 122 (running) created at:
   github.com/karalabe/iris/proto/pastry.(*Overlay).Boot()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/overlay.go:148 +0x3d0
   github.com/karalabe/iris/proto/pastry.TestMaintenance()
       /home/karalabe/work/iris/src/github.com/karalabe/iris/proto/pastry/maintenance_test.go:134 +0xa5a
   testing.tRunner()
       /opt/google/go/src/pkg/testing/testing.go:422 +0x130
==================
--- PASS: TestMaintenance-32 (4.76 seconds)
PASS
Found 1 data race(s)
exit status 66
FAIL github.com/karalabe/iris/proto/pastry 5.768s

   I'm guessing this is a race detector bug, but maybe someone more familiar
with the detector could verify it :)

Cheers,
   Peter

  • Rob Pike at Mar 25, 2014 at 9:43 am
    From the stack traces alone, this looks like a bug that showed up
    today in some Google code. The issue is that there is no
    happens-before relation between WaitGroup.Add and WaitGroup.Wait, so
    if your code does an Add after a Wait, the race detector will tell you.
    You need to establish synchronization to guarantee that the Add does
    not pass the Wait.
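
    A minimal sketch of that ordering: each Add happens before the goroutine
    that will call Done is started, and Wait runs only after all the Adds
    have been issued.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1) // Add before starting the goroutine, so it cannot race with Wait
            go func(n int) {
                defer wg.Done() // pairs with the Add above
                fmt.Println("processed", n)
            }(i)
        }
        wg.Wait() // called only after every Add above has been issued
    }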

    -rob

  • Dmitry Vyukov at Mar 25, 2014 at 9:47 am

    Yes, this is the situation these reads and writes in WaitGroup are
    intended to detect.
    Peter, please check that you call wg.Wait strictly after all "root" wg.Add's.

  • Péter Szilágyi at Mar 25, 2014 at 10:26 am
    Sorry, forgot to reply to the mailing list too.

    ---------- Forwarded message ----------
    From: Péter Szilágyi <peterke@gmail.com>
    Date: Tue, Mar 25, 2014 at 12:25 PM
    Subject: Re: [go-nuts] Data race reported in the race detection code within
    a waitgroup
    To: Dmitry Vyukov <dvyukov@google.com>


    Hi all,

       I can understand the logic behind detecting such cases, as they usually
    signal a race, but my use case is actually functioning as intended and
    interleaves Add/Done/Wait quite heavily.

       Specifically, I'm maintaining a routing table which gets updated every
    time remote connections or state updates arrive. Since my updates are
    quite expensive (they fire network state exchanges), I first collect all
    received updates into a temporary map and merge that into my routing table
    once no more are being received. Currently that is implemented with a
    WaitGroup counting the number of goroutines waiting to add their updates
    into the temp collector. When they all finish adding, a manager goroutine
    swaps out the temporary map and merges it into the routing table. The
    manager then loops, waiting for new updates. I guess the race detector is
    bothered by this reuse of a WaitGroup, but imho this shouldn't be a
    problem.

       Generically, I have a stream of events inbound from arbitrarily many
    goroutines, and I want to batch them together and process them only when no
    more are arriving (i.e. during a silent period). Maybe a WaitGroup isn't the
    best solution, but what would be more appropriate?

        - Channels aren't appropriate because they are bounded, and I want my
        event reporting routines to quickly continue (i.e. async) and not stall
        until the manager actually consumes them (buffered channels can fill up and
        then we're back to blocking).
        - A collector map and mutex combo can handle the batching, but then my
        manager doesn't have any mechanism to wait for all pending updates to reach
        the collector.

       The perfect solution for my use case would be an unbounded channel, but
    since I don't have such a thing, I'm open to other design suggestions.
    WaitGroups solve the issue; is my use case abusing them?
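
    A minimal sketch of what such an unbounded channel could look like (the
    Event type here is a stand-in): a small forwarding goroutine buffers
    pending events in a slice, so senders only block for the hand-off and
    never on the consumer.

    package main

    import "fmt"

    // Event stands in for the update type being batched.
    type Event struct{ ID int }

    // unbounded returns a producer end that only blocks for the hand-off and a
    // consumer end fed from a slice buffer, i.e. an effectively unbounded channel.
    func unbounded() (chan<- *Event, <-chan *Event) {
        in := make(chan *Event)
        out := make(chan *Event)
        go func() {
            var queue []*Event
            for {
                // Offer the head of the queue only when there is one; a nil
                // channel makes that select case inert.
                var send chan<- *Event
                var head *Event
                if len(queue) > 0 {
                    send, head = out, queue[0]
                }
                select {
                case e := <-in:
                    queue = append(queue, e)
                case send <- head:
                    queue = queue[1:]
                }
            }
        }()
        return in, out
    }

    func main() {
        in, out := unbounded()
        for i := 0; i < 5; i++ {
            in <- &Event{ID: i} // never stalls on the consumer
        }
        for i := 0; i < 5; i++ {
            fmt.Println((<-out).ID)
        }
    }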

    Cheers,
       Peter


  • Péter Szilágyi at Mar 25, 2014 at 11:05 am
    After pondering it a bit, I could circumvent the WaitGroup by
    effectively simulating it with atomic ops and a channel, but it feels odd
    to drop down to the atomic level to implement waiting for some operations
    to finish (pseudo code, not tested or even compiled):

    package batcher // arbitrary package name so the snippet compiles

    import (
        "sync"
        "sync/atomic"
    )

    // Event is a placeholder for the actual update type being collected.
    type Event struct{}

    var (
        signal = make(chan struct{}, 1) // buffered so a pending notification is never lost
        pend   int32
        lock   sync.Mutex
        coll   []*Event
    )

    func batch(e *Event) {
        // Increment a pending counter
        atomic.AddInt32(&pend, 1)

        // Store the event into a temp collector
        lock.Lock()
        coll = append(coll, e)
        lock.Unlock()

        // Decrement the pending counter and check for a silent period
        if atomic.AddInt32(&pend, -1) == 0 {
            // Silent period, nobody else wants in (maybe just temporary, we don't care)
            select {
            case signal <- struct{}{}:
                // Ok, processor notified
            default:
                // Somebody already notified it, return
            }
        }
    }

    func processor() {
        var temp []*Event

        for {
            // Wait for a (potential) silent period
            <-signal

            // Swap out the collector
            lock.Lock()
            temp, coll = coll, []*Event{}
            lock.Unlock()

            // Process the batched events in temp
            _ = temp
            // ...
        }
    }

    Note, I don't want a synchronization guarantee that pend is strictly zero
    when my processing runs. I just want to optimize the system state
    exchanges by batching events.

  • Dmitry Vyukov at Mar 25, 2014 at 11:15 am
    pend buys you nothing in this case. Just:

    func batch(e *Event) {
        lock.Lock()
        coll = append(coll, e)
        lock.Unlock()

        select {
        case signal <- struct{}{}:
            // Ok, processor notified
        default:
            // Somebody already notified it, return
        }
    }

    func processor() {
        var temp []*Event

        for _ = range signal {
            // Swap out the collector
            lock.Lock()
            temp, coll = coll, []*Event{}
            lock.Unlock()

            // Process the batched events
            // ...
        }
    }


    A sufficiently buffered channel should do as well:

    var updates = make(chan *Event, 1000)

    func batch(e *Event) {
        updates <- e
    }

    func processor() {
        for {
            var temp []*Event
            temp = append(temp, <-updates)
            deadline := time.After(time.Second)
        collectLoop:
            for {
                select {
                case e := <-updates:
                    temp = append(temp, e)
                case <-deadline:
                    break collectLoop
                }
            }
            // Process temp
            // ...
        }
    }


    This will give you much, much better guarantees w.r.t. batching.

  • Dmitry Vyukov at Mar 25, 2014 at 11:05 am
    I do not see how this can possibly work:

    // Inserts a state exchange into the exchange queue
    func (o *Overlay) exch(p *peer, s *state) {
        // Insert the state exchange
        o.eventPend.Add(1)
        o.eventLock.Lock()
        o.exchSet[p] = s
        o.eventLock.Unlock()
        o.eventPend.Done()


    You do Add and Done at the same instant in time.
    If o.eventPend.Wait is called before o.exch, then there is indeed a
    race -- Wait won't wait for Add.
    If o.eventPend.Wait is called after o.exch, then Add/Done are
    senseless -- the sum of Add/Done is zero.
    If o.eventPend.Wait is called concurrently with o.exch, then you have
    a mix of these 2 bad outcomes.



    On Tue, Mar 25, 2014 at 2:26 PM, Péter Szilágyi wrote:
    Sorry, forgot to reply to the mailing list too.


    ---------- Forwarded message ----------
    From: Péter Szilágyi <peterke@gmail.com>
    Date: Tue, Mar 25, 2014 at 12:25 PM
    Subject: Re: [go-nuts] Data race reported in the race detection code within
    a waitgroup
    To: Dmitry Vyukov <dvyukov@google.com>


    Hi all,

    I can understand the logic in detecting such cases, as usually that could
    signal a race, but my use case actually is functioning as intended and uses
    Add/Done/Wait quite interleaved.

    Specifically, I'm maintaining a routing table which gets updated every
    time some remote connections or state updates arrive. Since my updates are
    quite expensive (they fire network state exchanges), I first collect all
    received updates into a temporary map and merge that into my routing table
    after no more is being received. Currently that is implemented by having a
    WaitGroup to count the number of threads waiting to add their updates into
    the temp collector. When all processes finish adding, a manager process
    swaps out the temporary map and merges them into the routing table. The
    manager then loops, waiting for new updates. I guess the race detector is
    bothered by this reuse of a waitgroup. But imho, this shouldn't be a
    problem.

    Generically, I have a stream of events inbound from arbitrarily many
    goroutines, and I want to batch them together and process them only when no
    more is arriving (i.e. during a silent period). Maybe a WaitGroup isn't the
    best solution, but what would be more appropriate?

    Channels aren't appropriate because they are bounded, and I want my event
    reporting routines to quickly continue (i.e. async) and not stall until the
    manager actually consumes them (buffered channels can fill up and then we're
    back to blocking).
    A collector map and mutex combo can handle the batching, but then my manager
    doesn't have any mechanism to wait for all pending updates to reach the
    collector.

    The perfect solution for my use case would be an unbounded channel. But
    since I don't have such a thing, I'm open to other design suggestions.
    WaitGroups solve the issue. Is my use case abusing it?

    Cheers,
    Peter


    On Tue, Mar 25, 2014 at 11:46 AM, Dmitry Vyukov wrote:
    On Tue, Mar 25, 2014 at 1:43 PM, Rob Pike wrote:
    From the stack traces alone, this looks like a bug that showed up
    today in some Google code. The issue is that there is no
    happens-before relation between Waitgroup.Add and Waitgroup.Wait, so
    your code does an Add after a Wait, the race detector will tell you.
    You need to establish a synchronization to guarantee the Add does not
    pass the Wait.

    Yes, this is the situation these reads and writes in WaitGroup are
    intended to detect.
    Peter, please check that you call wg.Wait strictly after all "root"
    wg.Add's.


    --
    You received this message because you are subscribed to the Google Groups
    "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an
    email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Péter Szilágyi at Mar 25, 2014 at 11:11 am

    On Tue, Mar 25, 2014 at 1:03 PM, Dmitry Vyukov wrote:

    I do not see how this can possibly work:

    // Inserts a state exchange into the exchange queue
    func (o *Overlay) exch(p *peer, s *state) {
        // Insert the state exchange
        o.eventPend.Add(1)
        o.eventLock.Lock()
        o.exchSet[p] = s
        o.eventLock.Unlock()
        o.eventPend.Done()


    You do Add and Done at the same instant in time.
    If o.eventPend.Wait is called before o.exch, then there is indeed a
    race -- Wait won't wait for Add.
    Yes, but in my case that is not a problem, since I don't call Wait just
    once but in a loop. The Add will be caught by the next Wait invocation.

    If o.eventPend.Wait is called after o.exch, then Add/Done are
    senseless -- the sum of Add/Done is zero.
    The point of the Add and Done is to wait for all goroutines currently
    *inside* o.exch to finish. For example, I receive 10 state updates. I want
    all 10 to be inserted into o.exchSet before allowing it to be swapped out
    by my manager. In the meantime more updates can arrive; that's not a
    problem, they will be processed in the next iteration.

    If o.eventPend.Wait is called concurrently with o.exch, then you have
    a mix of these 2 bad outcomes.
    The issue is that my use of the WaitGroup here is an optimization. I do
    not require a guarantee that no updates arrive during processing. I just
    want to save network messages by batching already-arrived updates instead
    of processing them separately.

  • Dmitry Vyukov at Mar 25, 2014 at 11:18 am

    On Tue, Mar 25, 2014 at 3:11 PM, Péter Szilágyi wrote:
    Yes, but in my case that is not a problem, since I don't call Wait only a
    single time, but in a loop. The add will be caught by the next wait
    invocation.
    If o.eventPend.Wait is called after o.exch, then Add/Done are
    senseless -- the sum of Add/Done is zero.
    I believe this is incorrect usage of WaitGroup. The counter must not
    be incremented from zero while a goroutine is in wg.Wait.
    Future wg implementations can break your code.

    The point of add and done is wait until all concurrent go routines currently
    *inside* o.exch to finish. For example I receive 10 state updates. I want
    all 10 to be inserted into o.exchSet before allowing it to be swapped out by
    my manager. In the mean time more updates can arrive, it's not a problem,
    they will be processed in the next iteration.
    This looks like a very, very weak (close to senseless) aggregation
    strategy to me. The region between Add and Done can be executed in
    100ns, so even if 10 updates arrive almost simultaneously, you can
    still end up with 10 separate updates.

    See my previous email for a much stronger aggregation strategy:
    basically, the updater collects all updates in a given time window
    after the first one.

  • Péter Szilágyi at Mar 25, 2014 at 11:43 am

    On Tue, Mar 25, 2014 at 1:18 PM, Dmitry Vyukov wrote:
    I believe this is incorrect usage of WaitGroup. The counter must not
    be incremented from zero while a goroutine is in wg.Wait.
    Future wg implementations can break your code.
    Although I could argue that this limits a WaitGroup's capabilities, I can
    accept this reasoning. It would probably be extremely rare that such a
    case is indeed warranted, and detecting potential races in unwarranted
    code is more valuable. That is why I asked whether this use case was
    abusing the WaitGroup.
    The point of add and done is wait until all concurrent go routines
    currently
    *inside* o.exch to finish. For example I receive 10 state updates. I want
    all 10 to be inserted into o.exchSet before allowing it to be swapped out by
    my manager. In the mean time more updates can arrive, it's not a problem,
    they will be processed in the next iteration.
    This looks like a very very weak (close to senseless) aggregation
    strategy to me. The region between Add and Done can be executed in
    100ns. So even if 10 updates have arrived almost simultaneously, you
    still can have 10 separate updates.
    I guess the worst-case scenario would be that the first update passes
    through alone and the rest are collected while the processor is running.
    But fair enough, my thought flow wasn't the strongest here :))

    See my previous email with a much stronger aggregation strategy,
    basically the updater collects all updates in a given time window
    after the first update.
    I'm not too fond of time windows since they can cause significant
    latencies, but maybe it would be a good solution in the current scenario
    (albeit with a much shorter window of milliseconds), as there shouldn't be
    too many cascading updates. Dunno, I'll sleep on it and decide afterwards :)

    Anyway, thanks for taking the time to clear these up :)

  • Dmitry Vyukov at Mar 25, 2014 at 12:47 pm

    On Tue, Mar 25, 2014 at 3:43 PM, Péter Szilágyi wrote:
    I'm not too fond of time windows since they can cause significant latencies,

    Well, in your current code you don't have any bound on latency at all; a
    tight sequence of updates can delay the batch indefinitely. With the time
    window you do have a worst-case bound on latency, and sure, you can reduce
    it to 10ms or whatever works for you.

    If you are concerned about the updates channel potentially overflowing,
    you can do the following:

    func processor() {
        for {
            var temp []*Event
            temp = append(temp, <-updates)
            deadline := time.After(time.Second)
        collectLoop:
            for {
                select {
                case e := <-updates:
                    temp = append(temp, e)
                case <-deadline:
                    break collectLoop
                }
            }
            batchChan <- temp // another goroutine will handle it asynchronously
        }
    }

    This way you always have a goroutine reading from the updates channel, so
    it won't overflow unless something unexpected happens (in which case you
    do want to block producers or discard messages).
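
    For completeness, a matching consumer for that hand-off might look like
    the sketch below (the declaration of batchChan is assumed, e.g.
    var batchChan = make(chan []*Event, 8)):

    func merger() {
        for batch := range batchChan {
            // Merge the whole batch into the routing table here; the
            // processor() goroutine is never blocked while this runs.
            _ = batch
        }
    }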


  • Ben Bitdiddle1861 at Mar 31, 2014 at 7:45 pm
    sup world

  • Roger peppe at Mar 25, 2014 at 10:06 am

    On 25 March 2014 09:43, Rob Pike wrote:
    From the stack traces alone, this looks like a bug that showed up
    today in some Google code. The issue is that there is no
    happens-before relation between WaitGroup.Add and WaitGroup.Wait, so
    if your code does an Add after a Wait, the race detector will tell you.
    You need to establish synchronization to guarantee that the Add does
    not pass the Wait.
    I hope that's not strictly true. This code does an Add after Wait
    is *called* (but not before it returns) but I've always considered it to be ok.
    It was a scenario I remember thinking about when WaitGroup was initially
    designed.

    http://play.golang.org/p/DN_uEiF-5G

  • Jan Mercl at Mar 25, 2014 at 10:13 am

    There's no way to guarantee that concurrently calling wg.Add and
    wg.Wait will work properly, not in this implementation and not in any
    other. The reason is that .Wait can, in such a scenario, legitimately
    succeed before the .Add is performed. As Dmitry already stated, .Add
    must happen-before .Wait; there's no other option.

    -j

  • Dmitry Vyukov at Mar 25, 2014 at 10:55 am

    The key word in my reply is *root* wg.Add's.
    In the example, the first wg.Add is the *root* one: it ensures that the
    counter does not drop to zero while we are doing the other wg.Add's.
    The second wg.Add is a "child": we've already ensured the counter can't
    drop to zero while we are doing this wg.Add, so it can proceed
    concurrently with wg.Wait.
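
    A sketch of that root/child pattern (assumed shape, not a copy of the
    linked example): the root Add is issued before Wait starts, so the child
    Adds can never bring the counter up from zero while Wait is running.

    package main

    import "sync"

    func main() {
        var wg sync.WaitGroup
        wg.Add(1) // "root" Add: keeps the counter above zero before Wait begins
        go func() {
            defer wg.Done() // root Done runs only after the child Adds below
            for i := 0; i < 3; i++ {
                wg.Add(1) // "child" Add: the counter is already non-zero,
                go func() { // so this may run concurrently with Wait
                    defer wg.Done()
                }()
            }
        }()
        wg.Wait() // returns once the root and all children have called Done
    }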

