FAQ
I could not find anything useful by Googling "unexpected fault address error".
I suspect a race condition.

We have a TCP server that loops over a map of a large number of clients
(which frequently connect and disconnect), broadcasting a message to each.
The code was compiled with Go 1.3, GOARCH amd64.

This is a very rare error; it has happened once, under spiking load:

1405195804.885505 [debug] Go server: unexpected fault address 0x6d50
1405195804.890221 [debug] Go server: fatal error: fault
1405195804.890240 [debug] Go server: [signal 0xb code=0x1 addr=0x6d50 pc=0x4230ea]
1405195804.890254 [debug] Go server: goroutine 3927233 [running]:
1405195804.890268 [debug] Go server: runtime.throw(0x9356e2)
1405195804.890281 [debug] Go server:   /usr/local/go/src/pkg/runtime/panic.c:520 +0x69 fp=0x2aaaad38fd00
1405195804.890349 [debug] Go server: sp=0x2aaaad38fce8
1405195804.890364 [debug] Go server: runtime.sigpanic()
1405195804.890378 [debug] Go server:   /usr/local/go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0x2aaaad38fd18 sp=0x2aaaad38fd00
1405195804.890393 [debug] Go server: hash_next(0x2aaaad38ff20)
1405195804.890406 [debug] Go server:   /usr/local/go/src/pkg/runtime/hashmap.goc:707 +0x50a fp=0x2aaaad38fdb0 sp=0x2aaaad38fd18
1405195804.890422 [debug] Go server: runtime.mapiternext(0x2aaaad38ff20)
1405195804.890436 [debug] Go server:   /usr/local/go/src/pkg/runtime/hashmap.goc:1048 +0x12 fp=0x2aaaad38fdd8 sp=0x2aaaad38fdb0
1405195804.890451 [debug] Go server: _/nail/build/imbuild/work/tagservgo/server.(*Server).broadcastWorker(0xc208028380, 0xc208055183, 0x9, 0x0, 0xc20805518d, 0x247)
1405195804.890465 [debug] Go server:   /nail/build/imbuild/work/tagservgo/server/Server.go:161 +0x374 fp=0x2aaaad38ff78 sp=0x2aaaad38fdd8
1405195804.890479 [debug] Go server: runtime.goexit()
1405195804.890492 [debug] Go server:   /usr/local/go/src/pkg/runtime/proc.c:1445 fp=0x2aaaad38ff80 sp=0x2aaaad38ff78
1405195804.890505 [debug] Go server: created by _/nail/build/imbuild/work/tagservgo/server.(*Server).WriteToGroup
1405195804.890519 [debug] Go server:   /nail/build/imbuild/work/tagservgo/server/Server.go:184 +0xc2
1405195804.890532 [debug] Go server: goroutine 16 [runnable]:

The relevant code is:

158 func (s *Server) broadcastWorker(groupName string, shard int, msg string) {
159     toCtr := 0
160     numClients := len(s.Groups[groupName][shard])
161     for client := range s.Groups[groupName][shard] { // faulting line per the trace (Server.go:161)
162         err := s.writeToClient(client, msg)
163         if err != nil {
164             s.Debugln("Error in broadcast: ", err)
165             if strings.Contains(err.Error(), "i/o timeout") {
166                 toCtr++
167                 s.removeClient(client, false)
168             }
169         }
170     }


181 func (s *Server) WriteToGroup(groupName string, msg string) {
182     numShards := len(s.Groups[groupName])
183     for shard := 0; shard < numShards; shard++ {
184         go s.broadcastWorker(groupName, shard, msg)
185     }
186 }

The map s.Groups[name][shard] is modified from other goroutines when
clients join or leave. I wonder whether this error is a consequence of a
race: ranging over the map while entries are being added to it or removed
from it. If that is the case, putting s.lock() around the for loop in
broadcastWorker incurs a significant performance penalty (as seen in
previous stress testing). Perhaps making a deep copy of the s.Groups map
would be a better idea?

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

  • Alex Skinner at Jul 12, 2014 at 11:27 pm
    Have you tried an RWMutex? Take RLock while you're looping/reading, and
    Lock while writing; this allows multiple readers but only one writer. I'd
    be curious to know which you find faster, this or a copy.

    Thanks,
    Alex
  • Alec Matusis at Jul 13, 2014 at 7:16 am
    We will try RWMutex. So far we have only tried a Mutex around the for
    loop, and it incurred a significant performance penalty. What is the
    difference between RWMutex and a plain Mutex? Is locking for reading like
    making a deep copy? I am not clear on this.

    Thanks
    Alec
  • Alex Skinner at Jul 13, 2014 at 7:27 am
    An RWMutex lets you distinguish readers from writers. If you have
    twenty goroutines that only read from a map, a plain Mutex will make them
    block each other needlessly. An RWMutex allows you to use RLock. RLocks
    don't block other RLocks, so you can have multiple concurrent readers.
    Writers should call Lock, which cannot be acquired while an RLock is held
    (and vice versa). So in a system with more readers than writers, an
    RWMutex should perform better.
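
The reader/writer split described above can be sketched as follows. This is a minimal illustration, not the poster's server: the `Registry` type, its methods, and the string client IDs are all hypothetical stand-ins for one shard of `s.Groups[groupName]`.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical stand-in for one shard of s.Groups[groupName].
type Registry struct {
	mu      sync.RWMutex
	clients map[string]struct{}
}

func NewRegistry() *Registry {
	return &Registry{clients: make(map[string]struct{})}
}

// Add is a writer: it takes the exclusive lock.
func (r *Registry) Add(id string) {
	r.mu.Lock()
	r.clients[id] = struct{}{}
	r.mu.Unlock()
}

// Remove is a writer as well.
func (r *Registry) Remove(id string) {
	r.mu.Lock()
	delete(r.clients, id)
	r.mu.Unlock()
}

// Broadcast is a reader: several Broadcast calls may hold RLock at the
// same time, but none can overlap with Add or Remove.
// It returns the number of clients visited.
func (r *Registry) Broadcast(send func(id string)) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	n := 0
	for id := range r.clients {
		send(id)
		n++
	}
	return n
}

func main() {
	r := NewRegistry()
	var wg sync.WaitGroup
	// Writers and readers run concurrently without racing on the map.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.Add(fmt.Sprintf("client-%d", i))
		}(i)
	}
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Broadcast(func(string) {})
		}()
	}
	wg.Wait()
	fmt.Println("clients reached:", r.Broadcast(func(string) {}))
}
```

One caveat with this shape: the `send` callback must not call `Remove`, because `Lock` would wait for the `RLock` that the same goroutine is still holding. A loop like the one in broadcastWorker, which removes timed-out clients while ranging, would need to collect them and remove them after releasing the read lock.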


  • Alec Matusis at Jul 13, 2014 at 7:36 am
    Thanks, it's clear now.

    We are actually not very concerned about the accuracy of the map under a
    high rate of clients joining and leaving (i.e. clients added to or removed
    from the map by other goroutines). We would be happy to loop over a map
    that is nearly accurate; it does not have to be fully current at the
    moment the for loop starts, as long as there are no errors that crash the
    server. Is there a way to achieve this without locks at all, and
    without the associated performance penalty?
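
One way to use the tolerance for a slightly stale view, while still synchronizing every map access, is to hold the read lock only long enough to copy the client list and then do the slow network writes outside the lock. The sketch below assumes hypothetical names throughout (`Shard`, `snapshot`, `errTimeout`, string client IDs); it is not lock-free, it only keeps the critical section short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTimeout is a hypothetical stand-in for the "i/o timeout" error
// that the original code detects by string matching.
var errTimeout = errors.New("i/o timeout")

// Shard is a hypothetical stand-in for s.Groups[groupName][shard].
type Shard struct {
	mu      sync.RWMutex
	clients map[string]struct{}
}

func NewShard(ids ...string) *Shard {
	s := &Shard{clients: make(map[string]struct{})}
	for _, id := range ids {
		s.clients[id] = struct{}{}
	}
	return s
}

// snapshot copies the current client IDs while holding the read lock.
// The copy may be stale by the time it is used, which is acceptable here.
func (s *Shard) snapshot() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ids := make([]string, 0, len(s.clients))
	for id := range s.clients {
		ids = append(ids, id)
	}
	return ids
}

func (s *Shard) remove(id string) {
	s.mu.Lock()
	delete(s.clients, id)
	s.mu.Unlock()
}

// Len reports the current number of clients.
func (s *Shard) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.clients)
}

// Broadcast writes to every client in the snapshot, with no lock held
// during the writes, and removes clients whose write fails.
// It returns the number of clients removed.
func (s *Shard) Broadcast(write func(id string) error) int {
	removed := 0
	for _, id := range s.snapshot() {
		if err := write(id); err != nil {
			s.remove(id) // safe: no read lock is held at this point
			removed++
		}
	}
	return removed
}

func main() {
	s := NewShard("a", "b", "c")
	removed := s.Broadcast(func(id string) error {
		if id == "b" {
			return errTimeout
		}
		return nil
	})
	fmt.Println("removed:", removed, "remaining:", s.Len())
}
```

The cost is one slice allocation per broadcast, and `write` has to tolerate a client that disconnected after the snapshot was taken. Whether this beats holding an RLock for the whole loop depends on the workload and would need measuring.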
  • Daniel Eloff at Jul 13, 2014 at 1:20 am
    Go maps are not threadsafe. If you read and write from different
    goroutines with GOMAXPROCS > 1 then you need to synchronize access with a
    mutex of some kind.
  • Lars Seipel at Jul 13, 2014 at 3:06 am

    On Sat, Jul 12, 2014 at 06:20:35PM -0700, Daniel Eloff wrote:
    > Go maps are not threadsafe. If you read and write from different go
    > routines with GOMAXPROCS > 1 then you need to synchronize access

    No. You need to synchronize regardless of the value of GOMAXPROCS. See
    golang.org/ref/mem for the rules governing synchronization between
    goroutines.

    Lars

  • Jesse McNelis at Jul 13, 2014 at 7:48 am

    On Sun, Jul 13, 2014 at 11:20 AM, Daniel Eloff wrote:
    > Go maps are not threadsafe. If you read and write from different go routines
    > with GOMAXPROCS > 1 then you need to synchronize access with a mutex of some
    > kind.

    Technically they aren't *goroutine* safe. Regardless of the value of
    GOMAXPROCS at runtime the compiler is free to remove writes to values
    that are only read by other goroutines without synchronisation.



    --
    =====================
    http://jessta.id.au

  • Alex Skinner at Jul 13, 2014 at 7:51 am
    In short, no. Using a map like this is a race asking for disaster. Use
    locks, or use copies (share memory by communicating). This isn't a Go
    problem but a design one that just requires a bit of planning.
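
The "share memory by communicating" option mentioned above could look roughly like this: a single goroutine owns the client map, and joins, leaves, and broadcasts reach it over channels, so no lock is needed around the map. All names here (`Hub`, `request`, `Join`, `Leave`, `Broadcast`) are hypothetical, and the sketch omits shutdown handling.

```go
package main

import "fmt"

// request asks the owning goroutine to broadcast msg and report
// how many clients it was sent to.
type request struct {
	msg  string
	done chan int
}

// Hub is a hypothetical client registry whose map is touched by
// exactly one goroutine (run), so no mutex is required.
type Hub struct {
	join      chan string
	leave     chan string
	broadcast chan request
}

// NewHub starts the owning goroutine. send is called once per client
// for every broadcast and runs on that goroutine.
func NewHub(send func(id, msg string)) *Hub {
	h := &Hub{
		join:      make(chan string),
		leave:     make(chan string),
		broadcast: make(chan request),
	}
	go h.run(send)
	return h
}

func (h *Hub) run(send func(id, msg string)) {
	clients := make(map[string]struct{}) // owned by this goroutine only
	for {
		select {
		case id := <-h.join:
			clients[id] = struct{}{}
		case id := <-h.leave:
			delete(clients, id)
		case req := <-h.broadcast:
			for id := range clients {
				send(id, req.msg)
			}
			req.done <- len(clients)
		}
	}
}

func (h *Hub) Join(id string)  { h.join <- id }
func (h *Hub) Leave(id string) { h.leave <- id }

// Broadcast blocks until the message has been sent to every client
// and returns the number of clients reached.
func (h *Hub) Broadcast(msg string) int {
	done := make(chan int)
	h.broadcast <- request{msg: msg, done: done}
	return <-done
}

func main() {
	h := NewHub(func(id, msg string) { fmt.Printf("%s <- %s\n", id, msg) })
	h.Join("a")
	h.Join("b")
	h.Leave("a")
	fmt.Println("delivered to", h.Broadcast("hello"), "client(s)")
}
```

The trade-off is that everything is serialized through one goroutine: a slow `send` delays joins and leaves, and `send` must not call back into the Hub or it will deadlock. Running one such goroutine per shard, as the original code already shards its groups, would be one way to limit that.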

    On Sunday, July 13, 2014, Alec Matusis <[email protected]
    wrote:
    Thanks, it's clear now.

    We are actually not very concerned about the accuracy of the map under a
    high rate of clients joining and leaving (i.e. clients added or removed
    from the map by other goroutines): we would be happy to loop over a map
    which is nearly accurate, but it does not have to be most current at the
    moment the for loop starts, as long as there are no errors that crash the
    server. Is there a way to achieve this without the locks at all, and
    without the associated performance penalties?
    On Jul 13, 2014 12:27 AM, "Alex Skinner" wrote:

    An RWMutex lets you distinguish readers from writers. If you have
    twenty goroutines that only read from a map, a plain Mutex makes them block
    each other needlessly. An RWMutex provides RLock: RLocks don't
    block other RLocks, so you can have multiple concurrent readers. Writers
    should call Lock, which cannot be acquired while an RLock is held (and
    vice versa). So in a system with more readers than writers, an RWMutex
    should perform better.
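A minimal sketch of that pattern, assuming a client set keyed by string ID. `clientSet`, `broadcast`, `errTimeout` and the other names are hypothetical and not from the original server. Note that the broadcast loop collects failed clients instead of removing them in place: taking the write lock while the same goroutine still holds the read lock would deadlock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTimeout is a hypothetical stand-in for the "i/o timeout" error.
var errTimeout = errors.New("i/o timeout")

// clientSet is a hypothetical stand-in for one s.Groups[groupName][shard] map.
type clientSet struct {
	mu      sync.RWMutex
	clients map[string]bool
}

func newClientSet() *clientSet {
	return &clientSet{clients: make(map[string]bool)}
}

// add and remove mutate the map, so they take the exclusive write lock.
func (c *clientSet) add(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.clients[id] = true
}

func (c *clientSet) remove(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.clients, id)
}

// size only reads the map, so the shared read lock is enough.
func (c *clientSet) size() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.clients)
}

// broadcast iterates under the read lock, so many broadcasts can run at once.
// It returns the IDs whose send failed; the caller removes them afterwards.
func (c *clientSet) broadcast(send func(id string) error) []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var failed []string
	for id := range c.clients {
		if err := send(id); err != nil {
			failed = append(failed, id)
		}
	}
	return failed
}

func main() {
	cs := newClientSet()
	cs.add("a")
	cs.add("b")

	// Several broadcasters can hold the read lock at the same time.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cs.broadcast(func(string) error { return nil })
		}()
	}
	wg.Wait()

	// Simulate a timeout for client "b" and remove it once the loop is done.
	failed := cs.broadcast(func(id string) error {
		if id == "b" {
			return errTimeout
		}
		return nil
	})
	for _, id := range failed {
		cs.remove(id)
	}
	fmt.Println("clients left:", cs.size())
}
```

The read lock is still held for the whole loop, including the network writes, so a writer waiting on Lock is delayed by the slowest client in the shard.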


    On Sunday, July 13, 2014, Alec Matusis wrote:

    We will try RWMutex: so far we have only tried a Mutex around the for loop,
    and it incurred a significant performance penalty. What is the difference
    between RWMutex and a simple Mutex? Is locking for reading like making a
    deep copy? I am not clear on this.

    Thanks
    Alec
    On Saturday, July 12, 2014 4:27:42 PM UTC-7, Alex Skinner wrote:

    Have you tried an RWMutex? Just RLock when you're looping/reading, and
    Lock when writing - allowing for multiple readers but only one writer. I'd
    be curious to know what you find to be faster between this and a copy.

    Thanks,
    Alex
    On Saturday, July 12, 2014 5:22:42 PM UTC-4, Alec Matusis wrote:

    I could not find anything useful Googling "unexpected fault address
    error". I suspect a race condition.

    We have a TCP server that loops over a map of a large number of clients
    (that frequently connect and disconnect), broadcasting a message. The code
    was compiled with Go 1.3, GOARCH amd64.

    This is a very rare error that happened once under a spiking load:

    1405195804.885505 [debug] Go server: unexpected fault address 0x6d50
    1405195804.890221 [debug] Go server: fatal error: fault
    1405195804.890240 [debug] Go server: [signal 0xb code=0x1 addr=0x6d50 pc=0x4230ea]
    1405195804.890254 [debug] Go server: goroutine 3927233 [running]:
    1405195804.890268 [debug] Go server: runtime.throw(0x9356e2)
    1405195804.890281 [debug] Go server: /usr/local/go/src/pkg/runtime/panic.c:520 +0x69 fp=0x2aaaad38fd00
    1405195804.890349 [debug] Go server: sp=0x2aaaad38fce8
    1405195804.890364 [debug] Go server: runtime.sigpanic()
    1405195804.890378 [debug] Go server: /usr/local/go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0x2aaaad38fd18 sp=0x2aaaad38fd00
    1405195804.890393 [debug] Go server: hash_next(0x2aaaad38ff20)
    1405195804.890406 [debug] Go server: /usr/local/go/src/pkg/runtime/hashmap.goc:707 +0x50a fp=0x2aaaad38fdb0 sp=0x2aaaad38fd18
    1405195804.890422 [debug] Go server: runtime.mapiternext(0x2aaaad38ff20)
    1405195804.890436 [debug] Go server: /usr/local/go/src/pkg/runtime/hashmap.goc:1048 +0x12 fp=0x2aaaad38fdd8 sp=0x2aaaad38fdb0
    1405195804.890451 [debug] Go server: _/nail/build/imbuild/work/tagservgo/server.(*Server).broadcastWorker(0xc208028380, 0xc208055183, 0x9, 0x0, 0xc20805518d, 0x247)
    1405195804.890465 [debug] Go server: /nail/build/imbuild/work/tagservgo/server/Server.go:161 +0x374 fp=0x2aaaad38ff78 sp=0x2aaaad38fdd8
    1405195804.890479 [debug] Go server: runtime.goexit()
    1405195804.890492 [debug] Go server: /usr/local/go/src/pkg/runtime/proc.c:1445 fp=0x2aaaad38ff80 sp=0x2aaaad38ff78
    1405195804.890505 [debug] Go server: created by _/nail/build/imbuild/work/tagservgo/server.(*Server).WriteToGroup
    1405195804.890519 [debug] Go server: /nail/build/imbuild/work/tagservgo/server/Server.go:184 +0xc2
    1405195804.890532 [debug] Go server: goroutine 16 [runnable]:

    The relevant code is:

    158 func (s *Server) broadcastWorker(groupName string, shard int, msg string) {
    159     toCtr := 0
    160     numClients := len(s.Groups[groupName][shard])
    *161*   for client := range s.Groups[groupName][shard] {
    162         err := s.writeToClient(client, msg)
    163         if err != nil {
    164             s.Debugln("Error in broadcast: ", err)
    165             if strings.Contains(err.Error(), "i/o timeout") {
    166                 toCtr++
    167                 s.removeClient(client, false)
    168             }
    169         }
    170     }


    181 func (s *Server) WriteToGroup(groupName string, msg string) {
    182     numShards := len(s.Groups[groupName])
    183     for shard := 0; shard < numShards; shard++ {
    184         go s.broadcastWorker(groupName, shard, msg)
    185     }
    186 }

    The map s.Groups[name][shard] is modified in other goroutines when
    clients join or leave. I wonder if this error is a consequence of a race
    condition: looping over this map while entries are being added or
    removed. If this is the case, putting s.lock() around the for loop in
    broadcastWorker leads to significant performance penalties (from previous
    stress testing). Perhaps making a deep copy of the s.Groups map would be
    a better idea?
  • Egon at Jul 13, 2014 at 10:37 am
    Design your code so that you have a single goroutine per group
    (e.g. http://play.golang.org/p/3S8tZFyHed) and then communicate with the
    group via channels.
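As a rough illustration of the single-goroutine-per-group idea (this is not a reproduction of the linked playground code; `group`, `run` and the channel names are hypothetical), the map below is touched only by the goroutine running `run`, so it needs no lock:

```go
package main

import "fmt"

// group is owned by exactly one goroutine, the one running run.
// Other goroutines talk to it only through these channels.
type group struct {
	join      chan string
	leave     chan string
	broadcast chan string
	quit      chan struct{}
	done      chan struct{}
}

func newGroup() *group {
	return &group{
		join:      make(chan string),
		leave:     make(chan string),
		broadcast: make(chan string),
		quit:      make(chan struct{}),
		done:      make(chan struct{}),
	}
}

// run serializes every membership change and every broadcast.
// send is a hypothetical callback standing in for the network write.
func (g *group) run(send func(id, msg string)) {
	clients := make(map[string]bool) // accessed only by this goroutine
	for {
		select {
		case id := <-g.join:
			clients[id] = true
		case id := <-g.leave:
			delete(clients, id)
		case msg := <-g.broadcast:
			for id := range clients {
				send(id, msg)
			}
		case <-g.quit:
			close(g.done)
			return
		}
	}
}

func main() {
	g := newGroup()
	var got []string
	go g.run(func(id, msg string) { got = append(got, id+":"+msg) })

	g.join <- "a"
	g.join <- "b"
	g.leave <- "a"
	g.broadcast <- "hello"

	close(g.quit)
	<-g.done // run has returned, so reading got is safe
	fmt.Println(got) // prints [b:hello]
}
```

With this layout a slow client delays joins and leaves for its own group only. Splitting clients across several groups or shards, as the original server already does, keeps that delay bounded.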

    Also, a few ways to implement concurrent maps:
    http://play.golang.org/p/8pJ7gTuuJR

    When implementing anything concurrent, run it with the race detector:
    http://blog.golang.org/race-detector

    + egon

Discussion Overview
group: golang-nuts
categories: go
posted: Jul 12, '14 at 9:22p
active: Jul 13, '14 at 10:37a
posts: 11
users: 6
website: golang.org
