across ~200,000 files (adding up to ~4GB at the moment). Currently this is
written as a shell script using git grep + perl + xargs + lockfile.
The source is here: https://github.com/daaku/pie
The Go implementation has a simple model:
- spawn runtime.NumCPU() worker goroutines, all listening on a work channel
- walk the file tree, sending each file path on that channel
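Roughly, the structure looks like this (a minimal sketch of that pattern; the
names and the processFile stub are illustrative, not the actual pie code):

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "runtime"
        "sync"
    )

    // processFile stands in for the real work: running the regexp
    // search/replace passes over one file.
    func processFile(path string) {
        fmt.Println(path)
    }

    func main() {
        work := make(chan string, 1024)
        var wg sync.WaitGroup

        // one worker per CPU, all pulling paths off the same channel
        for i := 0; i < runtime.NumCPU(); i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for path := range work {
                    processFile(path)
                }
            }()
        }

        // walk the tree, handing each regular file to the workers
        root := "."
        if len(os.Args) > 1 {
            root = os.Args[1]
        }
        filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
            if err == nil && info.Mode().IsRegular() {
                work <- path
            }
            return nil
        })

        close(work)
        wg.Wait()
    }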
Additionally, since most files don't match any pattern, I'm optimizing for
that case by mmap-ing each file with gommap.
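The match check is along these lines (a sketch only; I'm using the standard
library's syscall.Mmap here in place of gommap to keep it dependency-free,
and the pattern/main are just placeholders):

    package main

    import (
        "fmt"
        "os"
        "regexp"
        "syscall"
    )

    // matches reports whether any of the patterns appears in the file. The
    // file is mapped into memory so the common non-matching case avoids
    // copying the whole file into the Go heap before deciding there is
    // nothing to rewrite.
    func matches(path string, patterns []*regexp.Regexp) (bool, error) {
        f, err := os.Open(path)
        if err != nil {
            return false, err
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil || fi.Size() == 0 {
            return false, err
        }

        data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
            syscall.PROT_READ, syscall.MAP_PRIVATE)
        if err != nil {
            return false, err
        }
        defer syscall.Munmap(data)

        for _, re := range patterns {
            if re.Match(data) {
                return true, nil
            }
        }
        return false, nil
    }

    func main() {
        pats := []*regexp.Regexp{regexp.MustCompile(`TODO`)} // illustrative pattern
        ok, err := matches(os.Args[1], pats)
        fmt.Println(ok, err)
    }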
I'm running this on a beefy machine (72GB RAM, 24 cores). The process starts
off well, with work spread across all the cores, but after a while it falls
off: it slows down and sits at only 1-2 busy cores for a long stretch.
Eventually it ramps back up to all the cores, and then the cycle repeats.
My first guess was that it's IO bound, but that doesn't seem right: reading
_all_ 4GB of files takes less time than a single one of those 1-2 core
stretches. It also matches my intuition that running thousands of regexps
over each file should end up CPU/memory bound rather than IO bound.
Profiling shows the regexp logic dominating the CPU time. That's what
confuses me: if the work really is regexp-bound, shouldn't the process be
using all the cores most of the time?
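For context, the profile came from the standard runtime/pprof CPU profiler,
wired in roughly like this (a sketch, not the exact pie code):

    package main

    import (
        "os"
        "runtime/pprof"
    )

    func main() {
        // write a CPU profile to inspect later with `go tool pprof`
        f, err := os.Create("cpu.prof")
        if err != nil {
            panic(err)
        }
        defer f.Close()
        if err := pprof.StartCPUProfile(f); err != nil {
            panic(err)
        }
        defer pprof.StopCPUProfile()

        // ... kick off the file walk and regexp workers here ...
    }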
Any insight or pointers would be much appreciated.
Thanks,
--
-Naitik