FAQ
I have to parse and process a large JSON file, 60 MB or more in size. What is
the fastest way to process it?
In the sample I am parsing a small JSON document from memory, while in reality
it will be file I/O.

http://play.golang.org/p/MzIaAR8Z-P

There is a function ProcessMap that calls itself if the value is a map. How can
I invoke the recursive call concurrently when there is a map inside a map? I
have used a *WaitGroup*, however it doesn't work and the function just exits.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • C Banning at Apr 24, 2014 at 10:44 am
    http://play.golang.org/p/uHcn_w24Dx - but you aren't guaranteed sequential
    processing.

    First, however, you need to handle the correct value types in the switch -
    see code behind http://godoc.org/github.com/clbanning/mxj#Map.StringIndent.

    If your file is a collection of JSON strings, perhaps something like
    http://godoc.org/github.com/clbanning/mxj#HandleJsonReader would work,
    where you can hand each JSON string off to a goroutine.
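A minimal sketch of the two points above (the value types in the switch, and concurrent recursion with a WaitGroup). It is not the code behind either playground link; the names `walk` and `CountLeaves` are made up for illustration. Two common reasons a function "just exits" are copying the WaitGroup instead of passing a pointer, and calling Add inside the new goroutine instead of before starting it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
)

// walk visits one decoded JSON container. Nested maps and arrays are handed
// to their own goroutines; scalar values are counted in place.
// The WaitGroup is passed as a pointer so every goroutine shares one counter.
func walk(v interface{}, wg *sync.WaitGroup, leaves *int64) {
	defer wg.Done()
	visit := func(child interface{}) {
		switch child.(type) {
		case map[string]interface{}, []interface{}:
			wg.Add(1) // Add before the goroutine starts, never inside it.
			go walk(child, wg, leaves)
		default: // string, float64, bool or nil
			atomic.AddInt64(leaves, 1)
		}
	}
	switch t := v.(type) {
	case map[string]interface{}:
		for _, child := range t {
			visit(child)
		}
	case []interface{}:
		for _, child := range t {
			visit(child)
		}
	default:
		atomic.AddInt64(leaves, 1)
	}
}

// CountLeaves (hypothetical helper) parses data and counts its scalar values.
func CountLeaves(data []byte) (int64, error) {
	var root interface{}
	if err := json.Unmarshal(data, &root); err != nil {
		return 0, err
	}
	var wg sync.WaitGroup
	var leaves int64
	wg.Add(1)
	go walk(root, &wg, &leaves)
	wg.Wait() // Returns only after every nested goroutine has called Done.
	return leaves, nil
}

func main() {
	doc := []byte(`{"section1":{"details":{"F1":"a"},"data":[{"F1":"x"},{"F1":"y"}]}}`)
	n, err := CountLeaves(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("scalar values:", n)
}
```

As noted above, the order in which values are visited is not guaranteed. Starting a goroutine per container only pays off when the per-value work is expensive.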
  • Kyle Wolfe at Apr 24, 2014 at 2:20 pm
    Is this file going to contain one large JSON document or many? My question
    on top of his would be: can you have the I/O spool each document in rather
    than parse after the whole file is in memory?

    From this sample data, I'd think the flow would be:

    Read file into memory (or spool?)
    Read the headers
    Go func to spool each data element into channel
    Go create n workers to read from channel
    wait on group
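The flow above can be sketched roughly as follows. This is only an illustration of the channel-plus-workers pattern: `processAll` and the `handle` callback are hypothetical names, and the "data elements" are plain strings standing in for whatever each section decodes to.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll sends every item to a pool of n workers over a channel,
// waits for the workers to finish, and sums what they return.
func processAll(items []string, n int, handle func(string) int) int {
	jobs := make(chan string)
	results := make(chan int)

	// Start n workers that read from the jobs channel until it is closed.
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range jobs {
				results <- handle(item)
			}
		}()
	}

	// Producer: spool each element into the channel, then signal the end.
	go func() {
		for _, item := range items {
			jobs <- item
		}
		close(jobs)
	}()

	// Close results once all workers are done so the loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	sections := []string{`{"F1":1}`, `{"F1":22}`, `{"F1":333}`}
	total := processAll(sections, 2, func(s string) int { return len(s) })
	fmt.Println("bytes handled:", total)
}
```

The worker count is a tuning knob; the right value depends on how much work each element needs.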
  • Abhijit Kadam at Apr 24, 2014 at 2:49 pm
    It contains many huge documents. In the sample format, "section1" and
    "section2" can be thought of as documents. However, the format is not
    something I can change or decide. With the sample format as is, is it
    possible to spool or buffer it concurrently?

  • Kyle Wolfe at Apr 24, 2014 at 3:17 pm
    I think this is a documented example of it against a string:
    http://golang.org/pkg/encoding/json/#example_Decoder

    So I'd say: start main, fire off a goroutine that reads the stream into
    a channel, then create x workers to read from said channel. Beyond
    that, I'm not sure what you're doing with each document, so I can't say how
    to make it more concurrent.
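A rough sketch of the "read the stream into a channel" step, using json.Decoder from the standard library. It assumes the input is a sequence of separate JSON objects; `decodeStream` and `f1Values` are hypothetical names, and the field "F1" is borrowed from the sample data.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// decodeStream reads consecutive JSON objects from r and sends each one to
// out. It closes out when the input ends or a decode error occurs.
func decodeStream(r io.Reader, out chan<- map[string]interface{}) error {
	defer close(out)
	dec := json.NewDecoder(r)
	for {
		var doc map[string]interface{}
		err := dec.Decode(&doc)
		if err == io.EOF {
			return nil // clean end of input
		}
		if err != nil {
			return err
		}
		out <- doc
	}
}

// f1Values collects the string field "F1" of every object in input.
func f1Values(input string) ([]string, error) {
	docs := make(chan map[string]interface{})
	errc := make(chan error, 1)
	go func() { errc <- decodeStream(strings.NewReader(input), docs) }()

	var vals []string
	for d := range docs {
		if s, ok := d["F1"].(string); ok {
			vals = append(vals, s)
		}
	}
	return vals, <-errc
}

func main() {
	vals, err := f1Values(`{"F1":"a"} {"F1":"b"} {"F1":"c"}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(vals)
}
```

With a real file, the strings.NewReader call would be replaced by the opened file, and several workers could range over the same channel.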
  • Henrik Johansson at Apr 24, 2014 at 3:18 pm
    What Kyle suggests would probably work just fine for your sections.

    Feed sections one by one into the channel and have the channel readers
    (maybe 12? You'd have to try it, I guess) consume and process them as you
    need to. It could speed things up for you, but it really depends on many
    factors, I guess.
  • Abhijit Kadam at Apr 24, 2014 at 3:29 pm
    I knew that example; however, the objects appear in the form of a stream in
    that document. In my case they are enclosed inside "{ ... }", which is one
    big object, not a stream of objects.
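One possible way to handle a single enclosing object without decoding it all at once, assuming a newer Go release: encoding/json gained Decoder.Token and Decoder.More in Go 1.5, which postdates this thread. The sketch below walks the members of the top-level object and hands each section over still encoded; `eachSection` and `sectionNames` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// eachSection streams the members of one top-level JSON object, calling fn
// with each key and its raw, still-encoded value.
func eachSection(r io.Reader, fn func(key string, raw json.RawMessage) error) error {
	dec := json.NewDecoder(r)
	tok, err := dec.Token() // consume the opening '{'
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return fmt.Errorf("expected a JSON object, got %v", tok)
	}
	for dec.More() {
		keyTok, err := dec.Token() // the member name
		if err != nil {
			return err
		}
		key, ok := keyTok.(string)
		if !ok {
			return fmt.Errorf("expected a string key, got %v", keyTok)
		}
		var raw json.RawMessage // the member value, not yet parsed
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		if err := fn(key, raw); err != nil {
			return err
		}
	}
	_, err = dec.Token() // consume the closing '}'
	return err
}

// sectionNames lists the top-level keys of input in the order they appear.
func sectionNames(input string) ([]string, error) {
	var names []string
	err := eachSection(strings.NewReader(input), func(key string, _ json.RawMessage) error {
		names = append(names, key)
		return nil
	})
	return names, err
}

func main() {
	names, err := sectionNames(`{"section1":{"F1":"a"},"section2":{"F1":"b"}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Each raw section could be sent to a channel and unmarshaled by a worker, so parsing of the sections proceeds concurrently while the file is still being read.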

  • C Banning at Apr 24, 2014 at 3:25 pm
    Something like this might get you there as
    well: http://play.golang.org/p/Li8sOr7o7Q
  • Abhijit Kadam at Apr 24, 2014 at 3:33 pm
    Thanks, I will look into the mxj lib. I hope the files I have follow
    some convention like "section1", "section2".
  • C Banning at Apr 24, 2014 at 3:51 pm
    If you know the key's location, ValuesForPath() can be much faster than
    ValuesForKey().
  • Abhijit Kadam at Apr 24, 2014 at 3:57 pm
    OK. How do these functions work? Do they find and return values from the
    already parsed file's map, or do they parse the file as we ask for values by
    key or path?
  • C Banning at Apr 24, 2014 at 4:20 pm
    OK - here's some commentary on what's going on in the
    example, http://play.golang.org/p/l-juBvIdK_.

        1. A JSON object is read off the io.Reader and unmarshaled into a Map
        (map[string]interface{}).
        2. ValuesForKey() and ValuesForPath() are methods that operate on the
        Map you've created. (You can read the code for details.)
        3. As mentioned the library is a shorthand for working with anonymous
        JSON objects (or XML documents) where you may want to access particular
        key/tag values.
        4. If you know the object structure, you'll get faster processing if you
        declare a structure and use json.Unmarshal() per encoding/json package
        documentation.
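Point 4 might look like the sketch below. The `Section` layout is an assumption pieced together from the paths mentioned in this thread ("section1.details.F1", "section1.data"); the real file's structure is not shown here, so the field names and types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Section is a hypothetical layout for one section of the file.
type Section struct {
	Details struct {
		F1 string `json:"F1"`
	} `json:"details"`
	Data []struct {
		F1 string `json:"F1"`
	} `json:"data"`
}

// parseSections decodes the whole document into typed values, keyed by
// section name ("section1", "section2", ...).
func parseSections(data []byte) (map[string]Section, error) {
	var sections map[string]Section
	if err := json.Unmarshal(data, &sections); err != nil {
		return nil, err
	}
	return sections, nil
}

func main() {
	doc := []byte(`{"section1":{"details":{"F1":"v1"},"data":[{"F1":"x"},{"F1":"y"}]}}`)
	sections, err := parseSections(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	s := sections["section1"]
	fmt.Println(s.Details.F1, len(s.Data))
}
```

Decoding into declared types avoids the type switches needed for map[string]interface{} and gives direct field access such as s.Data[0].F1.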

    Gotta run.
  • Abhijit Kadam at Apr 25, 2014 at 3:20 pm
    Charles,
    I tried your mxj lib with "ValuesForPath" and it is very convenient to use,
    like ValuesForPath("section1.details.F1"), and it gives the desired value.
    Great! However, I have some things to discuss, if you do not mind.

    I used mxj.NewMapJsonReader to read my 50 MB plus file. This function
    hung for a long time. Then I just used Go's ioutil.ReadFile to read the
    contents and used "mxj.NewMapJson" to parse them into a map. Then I used
    functions like ValuesForPath(). I did not get a chance to look further into
    NewMapJsonReader, as right now I am focusing on processing. The file format
    is '{ {sections...1},...{sections ...n} }', not '{section..1}
    {section..2}....{section..n}'.

    Another thing: when using ValuesForPath on an array, is there a way to
    access a single array element?
    data = m.ValuesForPath("section1.data.F1") returns an array, and then using
    data[0] I can get the desired value. However, it fetches the whole array,
    which will not always be efficient.
    Something like data = m.ValuesForPath("section1.data[0]F1") would be useful.
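For comparison, indexed access on an already decoded map can be done with the standard library alone. The sketch below is not mxj's ValuesForPath and does not use its path syntax; `lookup` is a hypothetical helper that takes each path step as a separate argument, a string for a map key or an int for an array index.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup follows path through nested maps (string steps) and arrays
// (int steps). It reports false if any step does not exist.
func lookup(v interface{}, path ...interface{}) (interface{}, bool) {
	for _, p := range path {
		switch key := p.(type) {
		case string:
			m, ok := v.(map[string]interface{})
			if !ok {
				return nil, false
			}
			v, ok = m[key]
			if !ok {
				return nil, false
			}
		case int:
			a, ok := v.([]interface{})
			if !ok || key < 0 || key >= len(a) {
				return nil, false
			}
			v = a[key]
		default:
			return nil, false // unsupported step type
		}
	}
	return v, true
}

func main() {
	var root interface{}
	doc := []byte(`{"section1":{"data":[{"F1":"x"},{"F1":"y"}]}}`)
	if err := json.Unmarshal(doc, &root); err != nil {
		fmt.Println("error:", err)
		return
	}
	val, ok := lookup(root, "section1", "data", 0, "F1")
	fmt.Println(val, ok)
}
```

Only the one element on the path is touched; no intermediate slice of all F1 values is built.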

  • Abhijit Kadam at Apr 25, 2014 at 3:26 pm
    Some clarification. In the second point of the above post, I mean that
    "section1.data" is an array and each element of the array has a field F1.
    However, the return value is another array with the F1 values from data[0]
    and data[1]. If it is a copy, then it may be more efficient to access an
    element directly like that.

  • C Banning at Apr 28, 2014 at 2:37 pm
    Abhijit, thanks for your suggestion. It's now
    available: http://godoc.org/github.com/clbanning/mxj#Map.ValuesForPath
  • C Banning at Apr 24, 2014 at 3:53 pm
    I routinely use it to process large data sets containing multiple 8-10 MB
    documents (XML). If the individual JSON objects are large and you know the
    key's location, ValuesForPath() can be much faster than ValuesForKey().
  • Kevin Gillette at Apr 28, 2014 at 3:32 pm
    Your data would have to be very deeply nested to reach any recursion
    limits. If that does happen, you can design your system to simulate
    recursion using a trampoline. Essentially, that means having a for loop that
    calls functions which themselves return functions (which are often
    closures). This flattens the call stack.
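A small sketch of a trampoline applied to walking decoded JSON. The names `step`, `run`, `walk` and `countLeaves` are hypothetical. Each step does a little work and returns the next step, so the Go call stack stays shallow no matter how deeply the data is nested.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// step is one unit of work; it returns the next step, or nil when done.
type step func() step

// run is the trampoline: a plain loop that keeps calling the returned step.
func run(s step) {
	for s != nil {
		s = s()
	}
}

// walk returns a step that processes v and then continues with next.
// Calling walk only builds a closure; it never recurses on the call stack.
func walk(v interface{}, count *int, next step) step {
	return func() step {
		k := next
		switch t := v.(type) {
		case map[string]interface{}:
			for _, child := range t {
				k = walk(child, count, k) // chain the children in front of next
			}
		case []interface{}:
			for _, child := range t {
				k = walk(child, count, k)
			}
		default: // string, float64, bool or nil
			*count++
		}
		return k
	}
}

// countLeaves counts the scalar values in a decoded JSON value.
func countLeaves(root interface{}) int {
	count := 0
	run(walk(root, &count, nil))
	return count
}

func main() {
	var root interface{}
	doc := []byte(`{"section1":{"data":[{"F1":"x"},{"F1":"y"}]},"ok":true}`)
	if err := json.Unmarshal(doc, &root); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("scalar values:", countLeaves(root))
}
```

The pending work lives in heap-allocated closures rather than stack frames, which is the trade the trampoline makes.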

Discussion Overview
group: golang-nuts
categories: go
posted: Apr 24, '14 at 6:52a
active: Apr 28, '14 at 3:32p
posts: 17
users: 5
website: golang.org
