Hello
Today I wrote this code:

     if wantPretty {
         // Advantage: output is pretty (human readable) thanks to indentation.
         // Drawback: the whole data transits through a byte buffer.
         buffer, err := json.MarshalIndent(myObjects, "", " ")
         if err != nil {
             return err
         }
         _, err = w.Write(buffer)
     } else {
         // Advantage: encodes (potentially voluminous) data "on the fly".
         // Drawback: output is ugly.
         encoder := json.NewEncoder(w)
         err = encoder.Encode(myObjects)
     }
     ...



As I understand it, *json.MarshalIndent* works on a byte buffer, not on a
stream.
Is there a way to have "the best of both worlds", i.e. print indented JSON
to the *io.Writer* without needing to store the whole output in a temporary
buffer?
If not in the standard library, maybe in a third-party one?
If not, is there a strong theoretical reason making indenting "on the fly"
difficult or impossible?

My intuition is that it is possible to handle indentation while streaming,
but I may be wrong.
Thanks for advice!
Valentin

(I know that in my contrived snippet, myObjects are already stored in
memory, but that's not the point and I address this separately).

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

  • James Bardin at Jan 16, 2015 at 5:31 pm
    The Encoder marshals what you give it all at once. The "streaming" part is
    that multiple calls to Encode will write to the same io.Writer (with
    newlines in between).

        // Advantage: encodes (potentially voluminous) data "on-the-fly"

    Nope. (At least for now.) The overhead is the same as calling
    Marshal[Indent] yourself.
    http://golang.org/src/encoding/json/stream.go?s=3688:3735#L145

  • Klaus Post at Jan 16, 2015 at 5:47 pm
    Hi!

    Mostly - the encoded content for a single Encode() call is put in memory,
    and using Indent creates a copy of that, so you have both the un-indented
    and the indented versions in memory at the same time.

    Using the "streaming indenter" I posted, you only need the un-indented
    version in memory; since the indentation output is streamed, it will not
    take more than 4 KB - the default buffer size of a bufio.Writer.

    So at least that cuts memory use roughly in half.

    /Klaus

  • Klaus Post at Jan 16, 2015 at 8:26 pm
    For even more fun, I created an all-streaming version of the encoder:

    https://github.com/klauspost/json/tree/all-stream

    This enables streaming encoding to any writer; the content will not be
    kept in memory while it is being encoded (unless of course you use
    Marshal, which returns a buffer).
    All tests pass, except one I had to disable because it used internal state,
    but that is as far as I have gone with the testing.

    Internally it now writes to a bufio.Writer instead of a byte buffer. A
    file, a memory buffer, or whatever else can sit "behind" that to receive
    the data. The indenter runs in a separate goroutine and uses an io.Pipe to
    get its data from the encoder, so they don't block each other.


    /Klaus
  • Val at Jan 16, 2015 at 8:57 pm
    Klaus, thank you for your fantastic reply.
    I was indeed not aware that the data was buffered in all cases, as James
    said.
    As soon as I have a couple of hours, out of curiosity I will measure the
    memory consumption of these various encoders.

    To go even further on this topic, I must explain that my code is intended
    to dump my whole database as JSON.
    Very concise, as it is a one-liner:

        json.NewEncoder(w).Encode(allMyObjects)
    This is fine for now because the whole data is like 200KB, but it wouldn't
    be practical for, say, 1GB.
    How would you advise me to stream a very large number of objects, not all
    residing in memory at the same time?

    a) Obviously I could chunk myself* by encoding small parts in a loop. Just
    having to make sure to produce valid JSON syntax like [ *chunk1*, *chunk2*,
    ... ]
    b) Or I could use klauspost all-stream
    <https://github.com/klauspost/json/tree/all-stream> for a single encoding
    call, but for objects already in memory (because channels are not supported
    <https://github.com/klauspost/json/blob/master/encode.go#L125>)
    c) Or we could add the channel support to all-stream
    <https://github.com/klauspost/json/tree/all-stream>, as it would be
    basically the same as iterating on a slice?

    * "chunk myself": not sure about the English phrase. Oh, well.
  • Klaus Post at Jan 18, 2015 at 7:38 pm
    Hi!
    How would you advise me to stream a very large number of objects, not all
    residing in memory at the same time?

    You would have to do some work manually. Say if you have a Mongo database
    (use mgo), it would be something like this:

    Your output would be like:

    {
      "object type": [ ... array of objects ... ],
      ...
    }

    For each type, you would need some manual work. It should be fairly easy to
    do:

    func WriteType(name string, iter *mgo.Iter, data interface{}, w io.Writer) error {
        // Manually write the key in quotes, then open the array.
        if _, err := io.WriteString(w, `"`+name+`": [`); err != nil {
            return err
        }
        // Create the encoder to use.
        e := json.NewEncoder(w)

        first := true

        // Iter is a database iterator that returns results one at a time.
        // Calling Next fills data with the next result.
        for iter.Next(data) {
            if !first {
                if _, err := io.WriteString(w, ","); err != nil {
                    return err
                }
            }
            if err := e.Encode(data); err != nil {
                return err
            }
            first = false
        }
        // Close the array; the caller handles inserting the comma between types.
        _, err := io.WriteString(w, "]")
        return err
    }

    Thinking about it, it should be possible to make a library that wrapped
    most of this work.

    An important point is that no single solution will be the best for all use
    cases either way. For 200 KB, a pure in-memory approach will be much
    faster and easier than in a 1 GB case. However, with streaming you lower
    memory use by at least a factor of 3-4, since you only need your source
    objects in memory, not their JSON representation; and if you split into
    much smaller chunks, as described above, you should not be at any risk
    whatsoever. However:

    * It will be slower.
    * It will likely require writes to disk.
    * It takes much more effort to set up.

    I don't see channels contributing in any significant way - if you write a
    library that receives on a channel, you still have to send individual
    items, which you might as well do with a function call.

    If I get some more time, it could be fun to look at a streaming JSON
    library - at least for convenience.


    /Klaus

  • Klaus Post at Jan 16, 2015 at 5:37 pm
    Hi!

    As far as I can tell, indentation can only be done on a byte buffer, so
    there doesn't seem to be any built-in function for this.

    Just because it seemed fun, I made a fork of the official golang json
    library:

    https://github.com/klauspost/json

    This lets you do:

    err = encoder.EncodeIndent(myObjects, "", " ")

    Extremely untested, so use at your own risk, but I hope it helps you.

    For the people interested, the changes are at :

    https://github.com/klauspost/json/blob/master/indent.go#L158

    https://github.com/klauspost/json/blob/master/stream.go#L185


    /Klaus


  • Larry Clapp at Jan 16, 2015 at 10:11 pm
    Just by the way, note that Indent() (which MarshalIndent() calls) erases
    its output from the buffer if it encounters a parse error. A streaming
    indenter can't do that.

    One would assume that Marshal() would not produce invalid output, so as
    long as you're streaming output from Marshal, you're safe. Streaming
    arbitrary JSON would be dangerous.

    -- L


Discussion Overview
group: golang-nuts
categories: go
posted: Jan 16, 2015 at 3:41pm
active: Jan 18, 2015 at 7:38pm
posts: 8
users: 4
website: golang.org