FAQ
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golang-dev@googlegroups.com,

I'd like you to review this change to
https://code.google.com/p/go


Description:
bufio: add ReadLines function

API addition proposal, similar to filepath.Walk.

Common enough pattern that I get bored of writing all the
time, reading lines from a file or other input.

Please review this at https://codereview.appspot.com/6870052/

Affected files:
M src/pkg/bufio/bufio.go
M src/pkg/bufio/bufio_test.go


Index: src/pkg/bufio/bufio.go
===================================================================
--- a/src/pkg/bufio/bufio.go
+++ b/src/pkg/bufio/bufio.go
@@ -610,3 +610,29 @@
func NewReadWriter(r *Reader, w *Writer) *ReadWriter {
return &ReadWriter{r, w}
}
+
+// ReadLines reads lines from r and calls fn with each read line.
+// Each line will end in '\n' with the possible of the exception
+// of the final line, which may not end in a newline.
+// The length of line is always at least one.
+// If fn returns a non-nil error, reading ends and the error is
+// returned from ReadLines.
+// When EOF is encountered, ReadLines returns nil.
+func ReadLines(r io.Reader, fn func(line string) error) error {
+ br := NewReader(r)
+ for {
+ line, err := br.ReadString('\n')
+ if err != nil && err != io.EOF {
+ return err
+ }
+ if len(line) > 0 {
+ if err := fn(line); err != nil {
+ return err
+ }
+ }
+ if err == io.EOF {
+ break
+ }
+ }
+ return nil
+}
Index: src/pkg/bufio/bufio_test.go
===================================================================
--- a/src/pkg/bufio/bufio_test.go
+++ b/src/pkg/bufio/bufio_test.go
@@ -7,6 +7,7 @@
import (
. "bufio"
"bytes"
+ "errors"
"fmt"
"io"
"io/ioutil"
@@ -930,6 +931,53 @@
}
}

+type readLinesTest struct {
+ in string
+ want string
+ wanterr string
+}
+
+var readLinesTests = []readLinesTest{
+ {
+ in: "foo\nbar\n",
+ want: "1[foo\n]2[bar\n]",
+ },
+ {
+ in: "foo\nbar\npartial",
+ want: "1[foo\n]2[bar\n]3[partial]",
+ },
+ {
+ in: "foo\nERR:myerr\nlast\n",
+ want: "1[foo\n]2[ERR:myerr\n]",
+ wanterr: "myerr",
+ },
+}
+
+func TestReadLines(t *testing.T) {
+ for _, tt := range readLinesTests {
+ var buf bytes.Buffer
+ var n = 0
+ err := ReadLines(strings.NewReader(tt.in), func(line string) error {
+ n++
+ fmt.Fprintf(&buf, "%d[%s]", n, line)
+ if strings.HasPrefix(line, "ERR:") {
+ return errors.New(strings.TrimSpace(line[len("ERR:"):]))
+ }
+ return nil
+ })
+ if got := buf.String(); got != tt.want {
+ t.Errorf("for %q got %q; want %q", tt.in, got, tt.want)
+ }
+ var errstr string
+ if err != nil {
+ errstr = err.Error()
+ }
+ if errstr != tt.wanterr {
+ t.Errorf("for %q got error %q; want error %q", tt.in, errstr,
tt.wanterr)
+ }
+ }
+}
+
// A writeCountingDiscard is like ioutil.Discard and counts the number of
times
// Write is called on it.
type writeCountingDiscard int

Search Discussions

  • Dave at Dec 3, 2012 at 9:43 pm

    On 2012/12/03 20:08:42, bradfitz wrote:
    Hello mailto:golang-dev@googlegroups.com,
    I'd like you to review this change to
    https://code.google.com/p/go
    This looks reasonable to me. Are you stuck on the name ? Maybe EachLine
    or WithLines might be a better name as this isn't reading anything, in
    the way an io.Reader reads things.

    https://codereview.appspot.com/6870052/
  • Brad Fitzpatrick at Dec 3, 2012 at 9:45 pm

    On Mon, Dec 3, 2012 at 1:43 PM, wrote:
    On 2012/12/03 20:08:42, bradfitz wrote:

    Hello mailto:golang-dev@**googlegroups.com <golang-dev@googlegroups.com>,
    I'd like you to review this change to
    https://code.google.com/p/go
    This looks reasonable to me. Are you stuck on the name ? Maybe EachLine
    or WithLines might be a better name as this isn't reading anything, in
    the way an io.Reader reads things

    I have no preference on naming.
  • Nigeltao at Dec 3, 2012 at 10:38 pm
    FWIW, I like the general idea.


    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode621
    src/pkg/bufio/bufio.go:621: func ReadLines(r io.Reader, fn func(line
    string) error) error {
    s/string/[]byte/ ??

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode629
    src/pkg/bufio/bufio.go:629: if err := fn(line); err != nil {
    Maybe s/err/err1/ to be clear that it's a different variable.

    https://codereview.appspot.com/6870052/
  • Bradfitz at Dec 3, 2012 at 10:49 pm
    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode621
    src/pkg/bufio/bufio.go:621: func ReadLines(r io.Reader, fn func(line
    string) error) error {
    On 2012/12/03 22:38:00, nigeltao wrote:
    s/string/[]byte/ ??
    I started with []byte but then once I saw that ReadBytes() was already
    doing copies, I stopped caring.

    I thought ReadBytes' common case was to return internal memory like
    ReadSlice and ReadLine, and only allocate when a line was too long.
    Once I saw that it was always allocating, I figured returning a string
    was fine, and required less docs.

    I'd be fine making it []byte, but I'd want to document it to be more
    like ReadSlice: just a view of internal data. And then make that be the
    common case, but only allocate when lines are too long.

    Thoughts?

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode629
    src/pkg/bufio/bufio.go:629: if err := fn(line); err != nil {
    On 2012/12/03 22:38:00, nigeltao wrote:
    Maybe s/err/err1/ to be clear that it's a different variable.
    Sure, will do if others like this proposal.

    https://codereview.appspot.com/6870052/
  • Nigeltao at Dec 3, 2012 at 10:52 pm
    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode621
    src/pkg/bufio/bufio.go:621: func ReadLines(r io.Reader, fn func(line
    string) error) error {
    On 2012/12/03 22:49:32, bradfitz wrote:
    I'd be fine making it []byte, but I'd want to document it to be more like
    ReadSlice: just a view of internal data. And then make that be the
    common case,
    but only allocate when lines are too long.
    Thoughts?
    SGTM.

    https://codereview.appspot.com/6870052/
  • Adg at Dec 3, 2012 at 11:22 pm
    I don't personally care about this API. No objections, though.


    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode628
    src/pkg/bufio/bufio.go:628: if len(line) > 0 {
    Why not report blank lines? some users might care.

    https://codereview.appspot.com/6870052/
  • Bradfitz at Dec 3, 2012 at 11:25 pm
    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode628
    src/pkg/bufio/bufio.go:628: if len(line) > 0 {
    On 2012/12/03 23:22:27, adg wrote:
    Why not report blank lines? some users might care.
    That's not what this is about. "\n" is returned, but not a "" in ("",
    io.EOF).

    https://codereview.appspot.com/6870052/
  • Andrew Gerrand at Dec 4, 2012 at 12:37 am
    Ah, I see. Thanks.

    On 4 December 2012 10:25, wrote:


    https://codereview.appspot.**com/6870052/diff/2001/src/pkg/**
    bufio/bufio.go<https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go>
    File src/pkg/bufio/bufio.go (right):

    https://codereview.appspot.**com/6870052/diff/2001/src/pkg/**
    bufio/bufio.go#newcode628<https://codereview.appspot.com/6870052/diff/2001/src/pkg/bufio/bufio.go#newcode628>
    src/pkg/bufio/bufio.go:628: if len(line) > 0 {
    On 2012/12/03 23:22:27, adg wrote:

    Why not report blank lines? some users might care.
    That's not what this is about. "\n" is returned, but not a "" in ("",
    io.EOF).

    https://codereview.appspot.**com/6870052/<https://codereview.appspot.com/6870052/>
  • Rogpeppe at Dec 4, 2012 at 11:43 am

    On 2012/12/03 20:08:42, bradfitz wrote:
    Hello mailto:golang-dev@googlegroups.com,
    I'd like you to review this change to
    https://code.google.com/p/go
    I am not keen on this.

    - ReadLines sounds like it might use ReadLine to get its lines (and
    hence understand about \r too) but it doesn't.

    - You almost always want to strip off the final newline, and this
    doesn't do that for you.

    - As with any iterator function, breaking out of the loop early is
    awkward - you can't just call "break" or "return".


    The main reason for this to exist is, as far as I can tell, the fact
    that ReadLine is so awkward to use.

    If reading lines was as simple as:

    for {
    line, err := b.ReadLine()
    if err != nil {
    break
    }
    doSomething(line)
    }

    then would we consider ReadLines worth it?

    Here are some aspects of the problem that the various methods in
    bufio.Reader try to address:

    1) Efficiency: can we read lines without incurring an allocation and
    copy for each line?
    2) Correctness: do we know if the final line is correctly terminated?
    Can we deal with very long lines? Do we interpret \r\n terminators
    correctly?
    3) Ease of use: how hard do we have to think when writing a loop that
    reads lines?
    4) Robustness: can a file with a very long line exhaust our memory?

    For instance, ReadSlice does well on 1) and 4), but fails on 2) (it
    can't deal with \r\n terminated lines) and 3) (it's awkward to use
    correctly).

    I suggest that we can't have all these things together. I believe one
    particular awkwardness comes from trying to deal with an unterminated
    final newline correctly. ReadLine doesn't do so (there's no way of
    telling that the final newline is unterminated) and I think that's a
    reasonable approach.

    The other awkwardness comes from desiring efficiency in the usual case,
    but not falling over on very long lines. We've got ReadSlice (and
    ReadLine) but if we want to deal with long lines, we need to make the
    buffer at least as big as the longest line we want to be able to read,
    and that's unnecessary overhead in the common case. But if we use
    ReadString or ReadBytes, we're vulnerable to maliciousness via large
    files with no line terminators.

    Thinking about it a bit more, perhaps ReadLines *is* the way forward.
    Because the line is an argument to the function, there is no expectation
    that it should be ok to use after the function returns, so passing it a
    []byte returned from ReadSlice or ReadLine seems just fine.

    How about something like this?

    // ReadLines reads lines from r and calls fn with each read line, not
    // including the line terminator. If a line exceeds the given maximum
    // size, it will be truncated and the rest of the line discarded. If fn
    // returns a non-nil error, reading ends and the error is returned from
    // ReadLines. When EOF is encountered, ReadLines returns nil.
    func ReadLines(r io.Reader, maxSize int, fn func(line []byte) error)
    error

    http://play.golang.org/p/Y65r-cKEDX

    It seems to tick most of the above boxes (you can't tell if you've got
    an unterminated newline, but that fits with ReadLine's behaviour).

    Open question: should a too-long line be discarded, or trigger an error?


    https://codereview.appspot.com/6870052/
  • Brad Fitzpatrick at Dec 4, 2012 at 2:55 pm
    It's been requested that we revisit this issue (too awkward to read lines)
    in a couple months. Let's focus on Go 1.1 bugs first...
    On Tue, Dec 4, 2012 at 3:43 AM, wrote:
    On 2012/12/03 20:08:42, bradfitz wrote:

    Hello mailto:golang-dev@**googlegroups.com <golang-dev@googlegroups.com>,
    I'd like you to review this change to
    https://code.google.com/p/go
    I am not keen on this.

    - ReadLines sounds like it might use ReadLine to get its lines (and
    hence understand about \r too) but it doesn't.

    - You almost always want to strip off the final newline, and this
    doesn't do that for you.

    - As with any iterator function, breaking out of the loop early is
    awkward - you can't just call "break" or "return".


    The main reason for this to exist is, as far as I can tell, the fact
    that ReadLine is so awkward to use.

    If reading lines was as simple as:

    for {
    line, err := b.ReadLine()
    if err != nil {
    break
    }
    doSomething(line)
    }

    then would we consider ReadLines worth it?

    Here are some aspects of the problem that the various methods in
    bufio.Reader try to address:

    1) Efficiency: can we read lines without incurring an allocation and
    copy for each line?
    2) Correctness: do we know if the final line is correctly terminated?
    Can we deal with very long lines? Do we interpret \r\n terminators
    correctly?
    3) Ease of use: how hard do we have to think when writing a loop that
    reads lines?
    4) Robustness: can a file with a very long line exhaust our memory?

    For instance, ReadSlice does well on 1) and 4), but fails on 2) (it
    can't deal with \r\n terminated lines) and 3) (it's awkward to use
    correctly).

    I suggest that we can't have all these things together. I believe one
    particular awkwardness comes from trying to deal with an unterminated
    final newline correctly. ReadLine doesn't do so (there's no way of
    telling that the final newline is unterminated) and I think that's a
    reasonable approach.

    The other awkwardness comes from desiring efficiency in the usual case,
    but not falling over on very long lines. We've got ReadSlice (and
    ReadLine) but if we want to deal with long lines, we need to make the
    buffer at least as big as the longest line we want to be able to read,
    and that's unnecessary overhead in the common case. But if we use
    ReadString or ReadBytes, we're vulnerable to maliciousness via large
    files with no line terminators.

    Thinking about it a bit more, perhaps ReadLines *is* the way forward.
    Because the line is an argument to the function, there is no expectation
    that it should be ok to use after the function returns, so passing it a
    []byte returned from ReadSlice or ReadLine seems just fine.

    How about something like this?

    // ReadLines reads lines from r and calls fn with each read line, not
    // including the line terminator. If a line exceeds the given maximum
    // size, it will be truncated and the rest of the line discarded. If fn
    // returns a non-nil error, reading ends and the error is returned from
    // ReadLines. When EOF is encountered, ReadLines returns nil.
    func ReadLines(r io.Reader, maxSize int, fn func(line []byte) error)
    error

    http://play.golang.org/p/Y65r-**cKEDX<http://play.golang.org/p/Y65r-cKEDX>

    It seems to tick most of the above boxes (you can't tell if you've got
    an unterminated newline, but that fits with ReadLine's behaviour).

    Open question: should a too-long line be discarded, or trigger an error?


    https://codereview.appspot.**com/6870052/<https://codereview.appspot.com/6870052/>

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedDec 3, '12 at 8:08p
activeDec 4, '12 at 2:55p
posts11
users5
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase