Hello Go Nuts,

I just hit a big problem with bufio.Reader while trying to read a binary
file that is 2 GB in size.
Because the encoding/binary package is so slow I had to use Reader directly
and then use binary.ByteOrder.Uint32() to convert the bytes I read.
When reading through a bufio-wrapped file, my reads ended up weirdly offset
after a while, but it works when using the file's Read method directly.

I suspect this has something to do with the fact that bufio.Reader
(http://golang.org/src/pkg/bufio/bufio.go?s=830:971#L21) uses an int for r,
while it should use an int64.

Can anyone confirm this suspicion so that I can report this as a bug?

Greetings, Niklas Schnelle

--


  • Jan Mercl at Oct 20, 2012 at 8:43 pm

    On Sat, Oct 20, 2012 at 8:22 PM, wrote:
    > I just hit a big problem with bufio.Reader while trying to read a binary
    > file that is 2 GB in size.
    > Because the encoding.binary package is so slow I had to use Reader directly
    > and then use binary.ByteOrder.Uint32()

    I would bet your (possibly premature) "optimization" actually slows things down.

    > to convert my bytes read,
    > When reading with a bufio wrapped file after a while my reading was weirdly
    > offset but it works when using the file's Read method directly.
    >
    > I suspect this has something to do with the fact that bufio.Reader
    > (http://golang.org/src/pkg/bufio/bufio.go?s=830:971#L21) uses an int for r,
    > while it should use an int64

    No, int is correct there, int64 would be a bug.

    > Can anyone confirm this suspicion so that I can report this as a bug?

    Not a bug AFAICS. Arrays/slices with more than max 'int' elements do
    not exist in Go. In all current implementations that means 2G elements
    maximum.

    -j

    --
  • Niklas Schnelle at Oct 20, 2012 at 8:57 pm
    Actually that's not a premature optimization: I first tried using
    encoding/binary, but it couldn't read in my data even after more than 15 minutes.
    Now I'm at 1:45 minutes, including doing some work with the data.
    Also, I'm not talking about a big slice.
    I'm using a Reader to read chunks of ~24 bytes into struct fields. When I
    use File.Read it works perfectly, but when I wrap the File in bufio.Reader so
    that I don't read such tiny chunks per syscall, it breaks. I will be able to
    post the complete code in a few days when it's actually doing something
    useful, but here is the problematic part. (The bufio line is commented out,
    obviously, and yes, there should be more error handling in case the file is
    bad, but that's not the point.)
    ///////////////////////////////////////////
    file, err := os.OpenFile(fileName, os.O_RDONLY, 0)
    if err != nil {
        fmt.Println(err)
        return nil, err
    }

    defer file.Close()
    //reader := bufio.NewReaderSize(file, 8192)
    reader := file
    var (
        nvertices, nedges int32
        magic, version    uint32
        intbuf            [4]byte
        edgebuf           [4 * 6]byte
        vertexbuf         [4 * 4]byte
    )

    order := binary.BigEndian
    _, err = reader.Read(intbuf[:])
    magic = order.Uint32(intbuf[:])
    if err != nil {
        fmt.Println(err)
        return nil, err
    }
    fmt.Println("Magic:", magic)
    _, err = reader.Read(intbuf[:])
    version = order.Uint32(intbuf[:])
    if err != nil {
        fmt.Println(err)
        return nil, err
    }
    fmt.Println("Version:", version)

    _, err = reader.Read(intbuf[:])
    nvertices = int32(order.Uint32(intbuf[:]))
    if err != nil {
        fmt.Println(err)
        return nil, err
    }
    _, err = reader.Read(intbuf[:])
    nedges = int32(order.Uint32(intbuf[:]))
    if err != nil {
        fmt.Println(err)
        return nil, err
    }

    fmt.Println(nvertices, " Vertices")
    fmt.Println(nedges, " Edges")

    graph := NewGraph(nvertices)
    for i := 0; i < int(nvertices); i++ {
        _, err = reader.Read(vertexbuf[0:4*4])
        graph.Vertices[i].Lat = int32(order.Uint32(vertexbuf[:5]))
        graph.Vertices[i].Lon = int32(order.Uint32(vertexbuf[4:9]))
        graph.Vertices[i].Height = int32(order.Uint32(vertexbuf[8:13]))
        graph.Vertices[i].Level = int32(order.Uint32(vertexbuf[12:]))
    }
    fmt.Println("Read ", len(graph.Vertices), " vertices")

    var tempEdge Edge
    for i := 0; i < int(nedges); i++ {
        _, err = reader.Read(edgebuf[0:4*6])
        tempEdge.Source = int32(order.Uint32(edgebuf[:5]))
        tempEdge.Target = int32(order.Uint32(edgebuf[4:9]))
        tempEdge.Weight = int32(order.Uint32(edgebuf[8:13]))
        tempEdge.Length = int32(order.Uint32(edgebuf[12:17]))
        tempEdge.SkipA = int32(order.Uint32(edgebuf[16:21]))
        tempEdge.SkipB = int32(order.Uint32(edgebuf[20:]))
        graph.AddEdge(tempEdge, i)
    }
    fmt.Println("Read ", nedges, " edges")

    --
  • Rémy Oudompheng at Oct 20, 2012 at 9:03 pm

    On 2012/10/20 Niklas Schnelle wrote:
    > for i := 0; i < int(nvertices); i++ {
    >     _, err = reader.Read(vertexbuf[0:4*4])
    >     graph.Vertices[i].Lat = int32(order.Uint32(vertexbuf[:5]))
    >     graph.Vertices[i].Lon = int32(order.Uint32(vertexbuf[4:9]))
    >     graph.Vertices[i].Height = int32(order.Uint32(vertexbuf[8:13]))
    >     graph.Vertices[i].Level = int32(order.Uint32(vertexbuf[12:]))
    > }

    You are ignoring the error here, and your slice indices are really
    weird (although they are not incorrect).

    Rémy.
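
A minimal sketch of why the odd slice bounds still decode correctly: ByteOrder.Uint32 only looks at the first four bytes of the slice it is given, so a five-byte slice such as buf[:5] yields the same value as buf[0:4]. The helper name firstWord and the sample bytes are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// firstWord decodes a big-endian uint32 from the start of b.
// Uint32 reads b[0] through b[3]; any trailing bytes are ignored.
func firstWord(b []byte) uint32 {
	return binary.BigEndian.Uint32(b)
}

func main() {
	// Sixteen bytes standing in for one vertex record (made-up values).
	buf := [16]byte{0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 4}

	fmt.Println(firstWord(buf[:5]), firstWord(buf[0:4]))   // five-byte vs. exact slice
	fmt.Println(firstWord(buf[4:9]), firstWord(buf[4:8]))  // same comparison for the second field
	fmt.Println(firstWord(buf[12:]), firstWord(buf[12:16])) // last field
}
```

Writing the bounds as exact four-byte windows (0:4, 4:8, 8:12, 12:16) changes nothing at run time but makes the record layout easier to read.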

    --
  • Rémy Oudompheng at Oct 20, 2012 at 9:37 pm

    On 2012/10/20 Niklas Schnelle wrote:
    > Is there some idiomatic way to check for each consecutive error in a less
    > verbose way?

    You are mixing two things in the same question: "idiomatic" and "less
    verbose". What is idiomatic is to check errors. Verbosity is a matter
    of style: you can write more concise code but it will not necessarily
    be more readable.

    Rémy.

    --
  • Niklas Schnelle at Oct 21, 2012 at 1:06 am
    OK, first, thanks again for your help: with io.ReadFull things work nicely. I
    should have thought about Read() not reading all of it at once, but I was so
    confused that bufio acted differently from reading on the file directly. It
    makes sense though: the buffer is so small that a read() syscall can nearly
    always fill it directly, but there is simply no guarantee.
    I'm now also checking the errors. Actually, I had a check in there earlier
    but removed it because I was afraid that I had broken out of the vertex
    loop early and that this might have caused the offset I was seeing.
    Ah, and about verbose vs. idiomatic: most of the time good idiomatic code is
    less verbose than non-idiomatic code of the same quality, right?
    --
  • Maxim Khitrov at Oct 20, 2012 at 9:11 pm
    Replace your "_, err = reader.Read(buf)" calls with "_, err =
    io.ReadFull(reader, buf)". That should fix the problem. A Read call is
    not guaranteed to fill the provided buffer. In particular, bufio will
    not issue a new Read call to the underlying io.Reader until the buffer
    is empty. If there is just 1 byte available, it will return that byte
    immediately without waiting for additional data.

    - Max
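
The short reads described above can be reproduced on a small scale. The sketch below assumes a 24-byte in-memory source behind a 16-byte bufio buffer and reads it in 6-byte chunks, once with Read and once with io.ReadFull; the function name readSizes and all sizes are chosen for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSizes returns the byte count of every successful 6-byte read,
// using io.ReadFull when full is true and plain Read otherwise.
func readSizes(full bool) []int {
	// 24 bytes of input behind a 16-byte bufio buffer.
	src := strings.NewReader(strings.Repeat("x", 24))
	r := bufio.NewReaderSize(src, 16)

	p := make([]byte, 6)
	var sizes []int
	for {
		var n int
		var err error
		if full {
			n, err = io.ReadFull(r, p)
		} else {
			n, err = r.Read(p)
		}
		if err != nil {
			return sizes // stop at end of input
		}
		sizes = append(sizes, n)
	}
}

func main() {
	fmt.Println("Read:    ", readSizes(false))
	fmt.Println("ReadFull:", readSizes(true))
}
```

With plain Read, some calls return fewer than 6 bytes because only the tail of the buffer is left; every record decoded after such a call is shifted, which matches the "weirdly offset" symptom. io.ReadFull keeps calling Read until the chunk is complete or the input ends.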

    --
  • Jan Mercl at Oct 20, 2012 at 9:11 pm

    On Sat, Oct 20, 2012 at 10:57 PM, Niklas Schnelle wrote:
    > Actually that's not a premature optimization as I first tried using
    > encoding/binary but it couldn't read in my data within >15 minute.
    > Now I'm at 1:45 minutes, including doing some work with the data.

    Must be rooted elsewhere. encoding/binary uses reflection and
    interface{} typed args to communicate. That is a slowdown compared to
    any other way which avoids those two things.

    > Also I'm not talking about a big slice.

    Then it's my fault. Got that impression mistakenly from "trying to
    read a binary file that is 2 GB in size."

    -j

    --
  • Evan Shaw at Oct 20, 2012 at 8:51 pm

    On Sun, Oct 21, 2012 at 7:22 AM, wrote:
    > Hello Go Nuts,
    >
    > I just hit a big problem with bufio.Reader while trying to read a binary
    > file that is 2 GB in size.
    > Because the encoding.binary package is so slow I had to use Reader directly
    > and then use binary.ByteOrder.Uint32() to convert my bytes read,
    > When reading with a bufio wrapped file after a while my reading was weirdly
    > offset but it works when using the file's Read method directly.
    >
    > I suspect this has something to do with the fact that bufio.Reader
    > (http://golang.org/src/pkg/bufio/bufio.go?s=830:971#L21) uses an int for r,
    > while it should use an int64
    >
    > Can anyone confirm this suspicion so that I can report this as a bug?

    Please share code that reproduces the possible bug. Then it will be
    easier to give an answer.

    - Evan

    --
