Actually, that's not a premature optimization: I first tried encoding/binary, but it couldn't read in my data even after 15 minutes.
Now I'm at 1:45, and that includes doing some work with the data.
Also, I'm not talking about a big slice.
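For reference, the encoding/binary version amounted to roughly this (just a sketch, not my exact code; it assumes the vertex struct holds nothing but the four int32 fields):

for i := 0; i < int(nvertices); i++ {
	// One reflection-based decode per 16-byte record; this is where the time went.
	if err := binary.Read(reader, binary.BigEndian, &graph.Vertices[i]); err != nil {
		return nil, err
	}
}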
I'm using a Reader to Read chunks of ~24 bytes into struct fields. When I
use File.Read directly it works perfectly, but when I wrap the File in a
bufio.Reader, so that I don't issue a syscall for every tiny chunk, it
breaks. I'll be able to post the complete code in a few days, once it
actually does something useful, but here is the problematic part. (The
bufio line is commented out, obviously, and yes, there should be more error
handling in case the file is bad, but that's not the point.)
///////////////////////////////////////////
file, err := os.OpenFile(fileName, os.O_RDONLY, 0)
if err != nil {
	fmt.Println(err)
	return nil, err
}
defer file.Close()

//reader := bufio.NewReaderSize(file, 8192)
reader := file

var (
	nvertices, nedges int32
	magic, version    uint32
	intbuf            [4]byte
	edgebuf           [4 * 6]byte
	vertexbuf         [4 * 4]byte
)
order := binary.BigEndian

_, err = reader.Read(intbuf[:])
magic = order.Uint32(intbuf[:])
if err != nil {
	fmt.Println(err)
	return nil, err
}
fmt.Println("Magic:", magic)

_, err = reader.Read(intbuf[:])
version = order.Uint32(intbuf[:])
if err != nil {
	fmt.Println(err)
	return nil, err
}
fmt.Println("Version:", version)

_, err = reader.Read(intbuf[:])
nvertices = int32(order.Uint32(intbuf[:]))
if err != nil {
	fmt.Println(err)
	return nil, err
}

_, err = reader.Read(intbuf[:])
nedges = int32(order.Uint32(intbuf[:]))
if err != nil {
	fmt.Println(err)
	return nil, err
}
fmt.Println(nvertices, " Vertices")
fmt.Println(nedges, " Edges")

graph := NewGraph(nvertices)
for i := 0; i < int(nvertices); i++ {
	_, err = reader.Read(vertexbuf[0 : 4*4])
	graph.Vertices[i].Lat = int32(order.Uint32(vertexbuf[:5]))
	graph.Vertices[i].Lon = int32(order.Uint32(vertexbuf[4:9]))
	graph.Vertices[i].Height = int32(order.Uint32(vertexbuf[8:13]))
	graph.Vertices[i].Level = int32(order.Uint32(vertexbuf[12:]))
}
fmt.Println("Read ", len(graph.Vertices), " vertices")

var tempEdge Edge
for i := 0; i < int(nedges); i++ {
	_, err = reader.Read(edgebuf[0 : 4*6])
	tempEdge.Source = int32(order.Uint32(edgebuf[:5]))
	tempEdge.Target = int32(order.Uint32(edgebuf[4:9]))
	tempEdge.Weight = int32(order.Uint32(edgebuf[8:13]))
	tempEdge.Length = int32(order.Uint32(edgebuf[12:17]))
	tempEdge.SkipA = int32(order.Uint32(edgebuf[16:21]))
	tempEdge.SkipB = int32(order.Uint32(edgebuf[20:]))
	graph.AddEdge(tempEdge, i)
}
fmt.Println("Read ", nedges, " edges")
On Saturday, October 20, 2012 10:43:51 PM UTC+2, Jan Mercl wrote:
On Sat, Oct 20, 2012 at 8:22 PM, <niklas....@gmail.com> wrote:
> I just hit a big problem with bufio.Reader while trying to read a binary
> file that is 2 GB in size.
> Because the encoding/binary package is so slow I had to use Reader directly
> and then use binary.ByteOrder.Uint32()

I would bet your (possibly premature) "optimization" actually slows things
down.

> to convert my bytes read,
> When reading with a bufio-wrapped file, after a while my reading was
> weirdly offset, but it works when using the file's Read method directly.
> I suspect this has something to do with the fact that bufio.Reader
> (http://golang.org/src/pkg/bufio/bufio.go?s=830:971#L21) uses an int for r,
> while it should use an int64

No, int is correct there, int64 would be a bug.

> Can anyone confirm this suspicion so that I can report this as a bug?

Not a bug AFAICS. Arrays/slices with more than max 'int' elements do not
exist in Go. In all current implementations that means a maximum of 2G
elements.
-j
--