That method takes an io.RuneReader, which has no way to "push back" runes
that were read but not used. By the time a match is found, the runes up to
(and possibly past) the match have already been consumed. So unless the
underlying source is seekable or fully buffered, the returned index can't
be usefully applied to that same reader.
Also keep in mind that Go doesn't treat regexps as the "magic bullet" that
many other languages do. For example, in interpreted languages like Perl,
Python, or Ruby, it's generally going to be much faster to use regexps
for anything more complex than exact substring searching, whereas in Go,
it's often faster (and sometimes even simpler) to use competently written
custom algorithms, even for tasks that regexps are "good at." Therefore,
aside from cases where a user supplies search patterns at runtime,
regexps in Go are just a "convenience," not a "necessity."
On Thursday, September 13, 2012 6:04:58 AM UTC-6, Toni Cárdenas wrote:
Here is a more illustrative test: http://play.golang.org/p/toGyzf5toG
Examining the output, FindReaderIndex seems to take a certain number of
runes from the buffer and consume them until the pattern is found, but
then it doesn't put the remaining runes it took back onto the buffer. I
don't know whether this behaviour is expected, but if it is, what would
be a better way of parsing a file?
I can remake the buffer on each iteration like this:
http://play.golang.org/p/fYkIYBlPdx, but it certainly doesn't look nice.
On Thursday, September 13, 2012 2:54:45 AM UTC+2, speter wrote:
I can't solve your problem, but it seems to me that the issue is that it
consumes more runes, rather than fewer (otherwise it should give three
matches). (Using backquotes for the regexp doesn't seem to be the problem:
`\n` is legal RE2 syntax, and changing it to double quotes doesn't help
either.)
I put it on the Playground for easier testing:
http://play.golang.org/p/TFcpAVfy-1
On Thu, Sep 13, 2012 at 8:30 AM, Toni Cárdenas wrote:
I'm trying to parse a file through a regexp, iterating over it using
regexp.FindReaderIndex, but I get unexpected behaviour.
source := bufio.NewReader(sourceFile)
re := regexp.MustCompile(`abc\n`)
Now, I would expect that to output [0 4][0 4][0 4], but instead I get
[0 4][1 5] and that's it. It seems that FindReaderIndex doesn't consume
that last \n character from the source stream. Why is that?