Thanks for taking the time for a thoughtful reply. Comments below.
On Wed, Jan 21, 2015 at 10:10 AM, Ian Lance Taylor wrote:
It's going to be impossible to do this in general. The Go language
has an unsafe.Offsetof function, similar to the C offsetof macro.
Reflection information has a StructField type, and that type has an
Offset field. So it is possible, albeit awkward, to write Go code
that takes a pointer to a struct and accesses the fields using pointer
offsets. We can't break that code. The same issues arise in C, of
I could introduce this as either a new type (other than struct) or as some
sort of option or hint about the struct so that existing code uses the
existing struct layout.
Thanks for that link. That paper describes what I do as "peeling".
However, this statement in the paper is wrong:
"Structure peeling however is not always possible.
For example, for data structures interconnected
by pointers, like link lists and trees, connecting
pointers cannot be separated from the
rest of the data in the struct. In this case the
structure can still be decomposed using additional
pointers to keep its original unity, as described
For peeling to work on data structures with pointers, we have to globally
replace the pointers to the peeled structures with integer indexes, and
replace all star operators with inline function accessor methods (which are
just as fast). This saves space if the index is 32 bits while pointers are
64, and generally runs faster due to cache performance. We also have to
allocate all these fields together in arrays while C structures tend to be
allocated sequentially out of larger blocks of memory that can be scattered
throughout the heap, rather than fully sequential. If Go currently just
allocates structs in a single heap where structs of different types get
intermixed, then this change will also speed it up, even when using the
regular struct layout, since cache performance generally improves when
like-type data is contiguous in memory.
However, that work did not really pan out in GCC. At least for C/C++
code, it turns into a benchmark game: it's applicable to so few real
world cases that you wind up optimizing for benchmarks rather than
actual programs. The result is an optimization that is relatively
expensive in terms of compile time that never actually helps.
They only peeled structures that no one points to. It is a pretty uncommon
for such structures to dominate memory and thrash the cache. Replacing
pointers with integer indexes globally is going to be quite challenging in
a C compiler. In a code generator, it is comparatively easy.
It's never going to make sense to apply this to all structs. It only
makes sense when programs contain loops over large slices of structs
where the loop bodies only access a subset of the fields in the
struct. That is a small subset of real programs.
Programs where speed matters tend to fall into two categories: CPU bound
and cache bound. CPU bound programs are not typically going to be effected
by this change. Cache bound programs, which are very common now days,
generally benefit a great deal from full structure peeling.
I can prove this through benchmarks, and not just a chosen few. Most large
Go programs that uses a lot of memory will do. At my previous job, we
auto-placed and routed large chips. I could switch from regular C
structures and pointers to fully peeled structs and integer indexes with a
compiler flag. Our place and route code on modern Intel hardware ran 60%
faster with fully peeled data. This is not some hand picked benchmark, but
a full 600K line EDA tool flow compiled with gcc.
Given the constraints on when the optimization applies, and the
constraints imposed by type reflection and Offsetof on struct layout
in Go, I don't think this will work as a compiler change.
I agree that this change would break some existing reflection code. Some
sort of addition in syntax, pragmas, flags, or some such is needed to keep
from breaking this code.
My feeling about this kind of thing has long been that it is much more
effective to not try to do this as a compiler optimization, but
instead to do this as a tool that analyzes a program and suggests
areas where it can be improved. In this case, the tool would look for
loop bodies over slices of structs where only a subset of the fields
are accessed. Then it can report a potential optimization: break up
the struct into smaller structs and use slices of those. The
programmer can see this advice and, if appropriate, apply it to the
program. My experience is that it is approximately ten thousand times
easier for the programmer to transform the program safely and
efficiently than it is for the compiler to do so.
Rather than a complex compiler optimization, this style of peeling is a
change in the code generator to use a different data layout. It does not
add much complexity in my experience. It is nothing like what was
described in that paper, where we need to formally prove that struct layout
information has not escaped. This is a simple memory layout, allocation,
and accessor change. It likely would effect many lines of code throughout
the code base, but in a uniform way. Hopefully some awk/sed/grep scripts
can do the job.
My don't I go ahead and hack up a version of Go implementing this change,
so we can test the impact on real-world benchmarks? Which would be a
better compiler to hack for this change, the GCC Go front end or the gc
compiler? I'm such a noob, I don't know where to look for documentation on
these compilers. Is there a wiki I should read, or should I just go read
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to email@example.com.
For more options, visit https://groups.google.com/d/optout.