Hi,

I get a large JSON array data dump for one of my projects and am trying to
process it in Go. Due to the size of the file, I was hoping to not load the
entire doc into memory.

[
   {"k": "v"},
   {"k": "v"},

   ...
]

Is there a good way to read in these objects one by one? I was hoping to
use json.Decoder; I tried following the example but get errors since my
dataset is preceded by [ and delimited by ,.

Is there a stdlib option I'm missing to do this, or is my best bet to
strip the [ and the commas that follow each } and then parse it with a
json.Decoder?

Nate


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Henry Bubert at Aug 3, 2014 at 12:48 am
    if the JSON data is correctly formatted (i.e. the last comma before the `]`
    is left out) you should be fine with `encoding/json`. To avoid reading all
    the data into RAM, use `json.NewDecoder` and pass it the file/network
    connection you are getting the JSON from.

    this might help to get started http://play.golang.org/p/J4IiG4MYfQ


    kind regards,

    Henry
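    A minimal sketch of what Henry describes; the sample input and the
    map element type are assumptions based on the question. Note this
    decodes the whole array in one call, which the follow-ups point out:

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"strings"
    )

    func main() {
    	// strings.NewReader stands in for the file/network connection.
    	r := strings.NewReader(`[{"k": "v1"}, {"k": "v2"}]`)

    	// Decode the whole array in one call. This avoids buffering the
    	// file yourself, but the decoded slice still holds every element.
    	var items []map[string]string
    	if err := json.NewDecoder(r).Decode(&items); err != nil {
    		panic(err)
    	}
    	fmt.Println(len(items), items[0]["k"]) // 2 v1
    }
    ```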

  • Nate Brennand at Aug 3, 2014 at 3:32 am
    This solution still seems to create a long array containing every
    object in the file/gob.
    I'm hoping to find a solution where I process them one at a time, so that
    neither the whole file nor the whole dataset is ever held in memory at one
    point.

  • Dan Kortschak at Aug 3, 2014 at 3:50 am
    Try making the elements satisfy json.Unmarshaler, then do the processing you need in that UnmarshalJSON method and discard the contents.
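    A sketch of Dan's suggestion, assuming the {"k": "v"} elements from
    the question; the per-item work shown is a stand-in:

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    )

    // element does its processing inside UnmarshalJSON and keeps nothing,
    // so the decoded slice stays tiny even for a huge array. (The raw
    // bytes of the whole array are still read before decoding begins.)
    type element struct{}

    func (e *element) UnmarshalJSON(data []byte) error {
    	var obj map[string]string // assumes the {"k": "v"} shape from the question
    	if err := json.Unmarshal(data, &obj); err != nil {
    		return err
    	}
    	fmt.Println("processed:", obj["k"]) // per-item work goes here
    	return nil
    }

    func main() {
    	input := `[{"k": "v1"}, {"k": "v2"}]`
    	var items []element
    	if err := json.Unmarshal([]byte(input), &items); err != nil {
    		panic(err)
    	}
    }
    ```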

  • Matt Harden at Aug 3, 2014 at 8:33 pm
    The json package will always decode an entire JSON value at once. That
    means that even if you use json.Decoder or implement json.Unmarshaler, it
    will still read the entire array and then process it. So the only way to
    make the reading and processing of the array elements concurrent is to
    implement the parsing of the JSON array yourself. Here is an example.
    http://play.golang.org/p/mOmFZ-YLqE
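    Matt's playground example predates it, but the stdlib later gained
    Decoder.Token and Decoder.More (in Go 1.5), which make exactly this
    streaming pattern straightforward. A sketch, with the sample input
    assumed from the question:

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"strings"
    )

    func main() {
    	r := strings.NewReader(`[{"k": "v1"}, {"k": "v2"}]`)
    	dec := json.NewDecoder(r)

    	// Consume the opening '[' delimiter.
    	if _, err := dec.Token(); err != nil {
    		panic(err)
    	}
    	// More reports whether another array element follows, so each
    	// object is decoded and can be discarded before the next is read.
    	for dec.More() {
    		var obj map[string]string
    		if err := dec.Decode(&obj); err != nil {
    			panic(err)
    		}
    		fmt.Println(obj["k"])
    	}
    	// Consume the closing ']' delimiter.
    	if _, err := dec.Token(); err != nil {
    		panic(err)
    	}
    }
    ```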

  • Nate Brennand at Aug 3, 2014 at 10:03 pm
    Thanks Matt. That's a clever way to read the extra byte to get the comma,
    and it works just as desired. I should've examined Decoder's methods more
    closely.

    What is the reasoning behind combining the decoder's buffer and the input
    reader with MultiReader? It seems to work as desired by just passing the
    decoder into ReadByteSkippingString().


  • Nate Brennand at Aug 4, 2014 at 12:55 am
    I tried implementing your proposed solution but ran into issues where
    something (I assume readByteSkippingSpace()?) seems to be advancing the
    buffer past the start of the json. It’s consistent where it occurs, but it
    only occurs if the list is long enough.

    I stripped down the code as much as I could to this:
    http://play.golang.org/p/YxvxMZi1tc


  • Frits van Bommel at Aug 4, 2014 at 5:40 am

    Replace the "rd" in the MultiReader call with "r":
    http://play.golang.org/p/_5gKhhs4LS
    This makes sure it keeps working even if the decoder doesn't exhaust the
    buffer of the previous decoder.
    However, this allocates a chain of MultiReaders, so it may not be very
    memory-efficient when decoding a large stream.
    If that turns out to be a problem, the best option may be to copy the
    io.MultiReader code and supporting functions, and change them to recognize
    when one of the arguments is the result of a MultiReader call, so that the
    remaining readers from that call can be used instead of nesting the
    MultiReaders.

  • Nate Brennand at Aug 4, 2014 at 7:10 am
    I don't know how I missed that; thank you for the help.

    I implemented a small wrapper around the decoder's buffer and the source's
    io.Reader to replicate io.MultiReader's Read method. Here's the code if
    anyone's curious: http://play.golang.org/p/qpJx6Lp5jl.
    The performance difference is minimal on the test set, but it reduces the
    time for my 16k dataset from 6.99 seconds to 0.99. Thanks again for the
    help, and for the tip on where to improve it further.
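    A sketch of the kind of wrapper Nate describes; the type name and the
    demo input are illustrative, not taken from his playground link:

    ```go
    package main

    import (
    	"fmt"
    	"io"
    	"strings"
    )

    // pairReader drains buf (e.g. a decoder's Buffered() data) and then
    // reads from src, without building up a chain of io.MultiReaders.
    type pairReader struct {
    	buf io.Reader
    	src io.Reader
    }

    func (p *pairReader) Read(b []byte) (int, error) {
    	if p.buf != nil {
    		n, err := p.buf.Read(b)
    		if err != io.EOF {
    			return n, err
    		}
    		p.buf = nil // buffer exhausted; fall through to src
    		if n > 0 {
    			return n, nil
    		}
    	}
    	return p.src.Read(b)
    }

    func main() {
    	r := &pairReader{
    		buf: strings.NewReader("buffered, "),
    		src: strings.NewReader("then the rest"),
    	}
    	b, err := io.ReadAll(r)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(string(b)) // buffered, then the rest
    }
    ```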


  • Matt Harden at Aug 5, 2014 at 2:04 pm
    This implies that a decoder reading item #n will sometimes read
    farther from the underlying reader than a decoder reading item #n+1.
    Intuitively that surprises me.


Discussion Overview
group: golang-nuts
categories: go
posted: Aug 2, '14 at 11:10p
active: Aug 5, '14 at 2:04p
posts: 10
users: 5
website: golang.org
