I have a very large set of text to analyze, more than 1.5 million lines, so
I need to do it with concurrency.

http://play.golang.org/p/e41O6WjXpV

Testing on my machine and on the Playground looks good, but the code somehow
fails when I run it on a virtual machine at travis-ci.org.
So I wonder if the pattern is wrong.

for _, v := range words {
    if len(v) < 2 && v != "I" {
        continue
    }
    total := 0
    doneslice := []interface{}{}
    for i := 0; i < limit; i++ {
        done := make(chan int)
        doneslice = append(doneslice, done)
    }
    // perform with multiprocessor
    for i := 0; i < limit; i++ {
        ch := make(chan int)
        done := doneslice[i].(chan int)
        go func() {
            defer close(done)
            defer close(ch)
            ChannelCount(text[itdx_start[i]:itdx_end[i]], v, ch)
        }()
        r := <-ch // I need to wait for each goroutine to return
        <-done    // I need to wait for each goroutine to return
        total = total + r // accumulate the frequency

    }
    // f := <-ch
    // f := strings.Count(text, v)
    // without concurrency

    result[v] = total
}


What do you think of this?
Do you think the design is correct? Thanks a lot!

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Steve wang at Feb 12, 2014 at 9:46 am
    go func(i int) {
        defer close(done)
        defer close(ch)
        ChannelCount(text[itdx_start[i]:itdx_end[i]], v, ch)
    }(i)
  • Steve wang at Feb 12, 2014 at 9:49 am
    It also seems that you should wait for the goroutines to finish outside
    the second limit loop, not inside it.
  • Egon at Feb 12, 2014 at 9:55 am
    The code is hardly readable, so I would start there. If your code is hard
    to read, its bugs can easily hide in it. Also, this concurrent version
    probably performs worse than a single-threaded version.

    First step: write a nice, easy-to-read single-threaded version.

    Then rewrite the code in a way that doesn't assume you have all of the
    content in memory at the same time. I know it's quite easy to fit the
    1.5M lines in memory, but such a design will make your code better.

    + egon
  • Nico at Feb 12, 2014 at 9:55 am

    On 12/02/14 09:34, Gyu-Ho Lee wrote:
        // perform with multiprocessor
        for i := 0; i < limit; i++ {
            ch := make(chan int)
            done := doneslice[i].(chan int)
            go func() {
                defer close(done)
                defer close(ch)
                ChannelCount(text[itdx_start[i]:itdx_end[i]], v, ch)
            }()
            r := <-ch // I need to wait for each goroutine to return
            <-done // I need to wait for each goroutine to return
    I think this code runs sequentially because of the last two lines.

    I'd suggest using sync.WaitGroup. See http://golang.org/pkg/sync/#WaitGroup

  • Egon at Feb 12, 2014 at 10:12 am
    e.g. http://play.golang.org/p/xno7gSd3at

    I would expect that version to perform better than your concurrent version.
    I'm not quite sure whether it returns exactly the same answer, but that
    should be easy to fix by changing the ScanWords function.
  • Gyu-Ho Lee at Feb 12, 2014 at 10:15 am
    Thanks a lot, this is more readable. I will rewrite them all.

  • Carlos Castillo at Feb 12, 2014 at 6:25 pm
    Don't write a function that on its own determines how many goroutines to
    spawn and what to set GOMAXPROCS to. Let the driving program determine
    these values (often through CLI flags and options), set GOMAXPROCS itself,
    and pass to the function the number of goroutines it wants the function
    to create.

    The way you have it now, your code obnoxiously assumes that it's the only
    thing the program wants to do, that the program is the only thing that
    matters to the user, and that setting GOMAXPROCS=NumCPU() is always the
    best way to gain performance. None of these assumptions may be true.

Discussion Overview
group: golang-nuts
categories: go
posted: Feb 12, '14 at 9:34 am
active: Feb 12, '14 at 6:25 pm
posts: 9
users: 5
website: golang.org
