Hi!!
I have a lot of files containing data from numerical simulations.
They consist of many (~tens of thousands of) lines containing records with
both numbers in different formats (int, float, exponential notation, positive
and negative) and strings or empty fields.

NOTE: the format in which a float is written is not guaranteed.

Some examples:

* 0, 1, --, 0.514488965035739598, 0.194670802686332967,
-0.222068317660608999, 0.5932199880910123, 0.204271715528226705,
0.105948237834844639, -0.437641911596323208, , , , 5.23946321410425484e-05

* 010, 30, Z010n030idsa4599b672, 53, 9.550759, 4599|672, S, bh|bh,
38.370809082, 16.635107274, 0.157101, 0.78333703491, 0.94843

* 010 024 Z010n024idsa100980b980 0 0.0 980|100980 H ns++|--
23.202346940299996 10.0876388294 2.43164e-05 1.9389955506e-06 0.910007

* c16n24a100980b980,16,24,0.1,1,0.05,5,no,0,0,980|100980,H,ns++|--,23.202346940299996,10.0876388294,2.43164e-05,1.9389955506e-06,0.910007,54953.135187521

Each file has its own format (no mixed lines).

In Python I use np.genfromtxt. In Go, the best way I have found so far
to load these lines is to read the files line by line, parse each line with
a regexp into a struct, and append the structs to a slice.
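A minimal sketch of the regexp-to-struct approach described above, assuming the second example format. The `Record` type, its field names (`A`, `B`, `Name`, `C`, `Value`) and `parseRecord` are hypothetical, since the post does not say what the columns mean, and only the first five fields are captured to keep the pattern short.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Record holds the first five fields of a line in the second example format.
// The field names are hypothetical: the post does not say what the columns mean.
type Record struct {
	A, B  int
	Name  string
	C     int
	Value float64
}

// lineRE captures the first five comma-separated fields; the rest of the line is ignored.
var lineRE = regexp.MustCompile(
	`^\s*(\d+),\s*(\d+),\s*(\w+),\s*(\d+),\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)`)

// parseRecord applies lineRE to one line and converts the captured groups.
func parseRecord(line string) (Record, error) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return Record{}, fmt.Errorf("line does not match: %q", line)
	}
	// strconv.Atoi is always base 10, so "010" becomes 10, not octal 8.
	a, errA := strconv.Atoi(m[1])
	b, errB := strconv.Atoi(m[2])
	c, errC := strconv.Atoi(m[4])
	v, errV := strconv.ParseFloat(m[5], 64)
	for _, err := range []error{errA, errB, errC, errV} {
		if err != nil {
			return Record{}, err
		}
	}
	return Record{A: a, B: b, Name: m[3], C: c, Value: v}, nil
}

func main() {
	lines := []string{
		"010, 30, Z010n030idsa4599b672, 53, 9.550759, 4599|672, S, bh|bh, 38.370809082, 16.635107274, 0.157101, 0.78333703491, 0.94843",
	}
	var records []Record // parsed records are appended to a slice of structs
	for _, l := range lines {
		r, err := parseRecord(l)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		records = append(records, r)
	}
	fmt.Printf("%+v\n", records)
}
```

One compiled regexp per file format is enough, since each file uses a single format.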

To read the lines I use
this: https://github.com/brunetto/goutils/blob/master/readfile/readLine.go

Essentially, it is something like:

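var (
  line, ln []byte
  err      error
  isPrefix = true // ReadLine sets it while a long line is only partly returned
)
for isPrefix && err == nil {
  line, isPrefix, err = nReader.ReadLine() // nReader is a *bufio.Reader
  ln = append(ln, line...) // copy: line is only valid until the next read
}
return string(ln), err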



I have two questions:

1 - Is there a better way to load these files?
     "Better" meaning faster, safe enough, ...
     Is it OK to load them into slices of structs?

2 - Do you have any suggestions for a better way to store these data?
      I had a look into:
      * HDF5, but https://github.com/sbinet/go-hdf5 can't handle strings
        and I don't have time at the moment to learn how to fix this
      * databases (SQLite), but I didn't find a way to do this that is easy
        and quick to learn

Thanks,

brunetto
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

  • Egon at Jan 22, 2015 at 11:49 am

    On Thursday, 22 January 2015 13:02:58 UTC+2, brunetto wrote:
    > [...]
    > To read the lines I use this:
    > https://github.com/brunetto/goutils/blob/master/readfile/readLine.go
    > [...]

    There is Scanner http://golang.org/pkg/bufio/#Scanner

    + Egon

  • Brunetto at Jan 22, 2015 at 11:57 am

    On Thursday, January 22, 2015 at 12:49:26 PM UTC+1, Egon wrote:

    > [...]
    > There is Scanner http://golang.org/pkg/bufio/#Scanner

    But I still need to apply the regexp to each line after reading it, don't I?

    brunetto

  • Egon at Jan 22, 2015 at 11:55 am

    On Thursday, 22 January 2015 13:02:58 UTC+2, brunetto wrote:
    > [...]
    > Each file has its own format (no mixed lines).
    Do you know the format before reading the file?
    Do you have a syntax description for the different formats?
    How many different formats do you have?
    Do the formats change often?

    > [...]
    > 1 - is there a better way to load these files?
    > better meaning faster, safer enough, ...
    > is it ok to load them into slices of structs?
    Yes, it should be fine.

    > 2 - do you have any suggestion of a better way to store these data?
    > [...]
    What do you intend to do with the data after you have stored them somewhere?

  • Brunetto at Jan 22, 2015 at 12:01 pm

    On Thursday, January 22, 2015 at 12:55:10 PM UTC+1, Egon wrote:

    > Do you know the format before reading the file?
    Yes.

    > Do you have a syntax description for different formats?
    What do you mean? Usually I use a regexp; for example, a float is
      `-*\d*\.*\d*e*-*\d*`
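A small check, assuming the regexp is applied to whole fields: every part of `-*\d*\.*\d*e*-*\d*` is optional, so it also accepts the empty string, `--` and `1..2`. The stricter pattern below is an illustrative alternative, not something proposed in the thread; `strconv.ParseFloat` is another way to validate a field.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The pattern quoted in the thread, anchored to the whole field.
	loose = regexp.MustCompile(`^-*\d*\.*\d*e*-*\d*$`)
	// An assumed stricter pattern: optional sign, digits with an optional
	// fractional part (or a leading-dot fraction), then an optional exponent.
	strict = regexp.MustCompile(`^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$`)
)

func main() {
	fields := []string{"0.0", "-0.437641911596323208", "5.23946321410425484e-05", "", "--", "1..2"}
	for _, s := range fields {
		fmt.Printf("%-26q loose=%v strict=%v\n", s, loose.MatchString(s), strict.MatchString(s))
	}
}
```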

    > How many different formats do you have?
    Four or five, but I can convert everything once I have a good solution.

    > Do the formats change often?
    Not very often, but it can happen.


    [...]

    >> 2 - do you have any suggestion of a better way to store these data?
    >> [...]
    > What do you intend to do with the data after you have stored them
    > somewhere?
    Read them back and then analyze them (this may happen more than once).

    brunetto
  • Egon at Jan 22, 2015 at 12:22 pm

    On Thursday, 22 January 2015 14:01:17 UTC+2, brunetto wrote:

    > [...]
    >> Do you have a syntax description for different formats?
    > What do you mean? Usually I use a regexp; for example, a float is
    > `-*\d*\.*\d*e*-*\d*`

    I would probably start with something like:
    http://play.golang.org/p/72CWT6-kBU
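The playground link above is not reproduced in the thread, so its contents are unknown. As a separate, hypothetical sketch of the idea behind Egon's question about syntax descriptions, each file format can be described as data (a separator plus a list of column kinds) and parsed by one generic function; `Kind`, `Format` and `Parse` are invented names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Kind says how one column should be parsed (hypothetical type).
type Kind int

const (
	Int Kind = iota
	Float
	String
)

// Format is a hypothetical description of one file layout: the separator
// ("" means any run of whitespace) and the kind of each column.
type Format struct {
	Sep     string
	Columns []Kind
}

// Parse splits line according to f and converts each field to int64, float64 or string.
func (f Format) Parse(line string) ([]interface{}, error) {
	var fields []string
	if f.Sep == "" {
		fields = strings.Fields(line)
	} else {
		fields = strings.Split(line, f.Sep)
	}
	if len(fields) != len(f.Columns) {
		return nil, fmt.Errorf("got %d fields, want %d", len(fields), len(f.Columns))
	}
	out := make([]interface{}, len(fields))
	for i, s := range fields {
		s = strings.TrimSpace(s)
		var err error
		switch f.Columns[i] {
		case Int:
			out[i], err = strconv.ParseInt(s, 10, 64) // base 10: "010" is 10
		case Float:
			out[i], err = strconv.ParseFloat(s, 64)
		default:
			out[i] = s
		}
		if err != nil {
			return nil, fmt.Errorf("column %d: %v", i, err)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical description of the first five columns of the whitespace-separated example.
	f := Format{Sep: "", Columns: []Kind{Int, Int, String, Int, Float}}
	vals, err := f.Parse("010 024 Z010n024idsa100980b980 0 0.0")
	fmt.Println(vals, err)
}
```

The resulting `[]interface{}` values would still need to be copied into a per-format struct; the benefit is that a format change only touches its description.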

    > [...]
    >> What do you intend to do with the data after you have stored them
    >> somewhere?
    > Read them back and then analyze them (this may happen more than once).
    Use gob to save/load the whole array of structs into a file
    http://golang.org/pkg/encoding/gob/

    + Egon

  • Brunetto at Jan 22, 2015 at 1:28 pm

    On Thursday, January 22, 2015 at 1:22:54 PM UTC+1, Egon wrote:
    [...]
    > I would probably start with something like:
    > http://play.golang.org/p/72CWT6-kBU
    Thanks a lot, it looks like a very good solution!!
    [...]

    > Use gob to save/load the whole array of structs into a file
    > http://golang.org/pkg/encoding/gob/
    I thought about it, but I'm looking for something that other people can
    also read with other languages (Python, C++, ...); that's why I tried
    HDF5 and SQLite.

    brunetto

  • Michael Jones at Jan 22, 2015 at 1:33 pm
    Regex is not the solution to all of life's problems.
    --
    Michael T. Jones | Chief Technology Advocate | mtj@google.com | +1 650-335-5765

  • Brunetto at Jan 22, 2015 at 1:38 pm

    On Thursday, January 22, 2015 at 2:33:41 PM UTC+1, Michael Jones wrote:
    > Regex is not the solution to all of life's problems.
    Yes, but sometimes, if you're not a pro, it's the shortest, fastest (to
    implement) and easiest (to handle) one...
    (if "easy" can be an adjective for regex syntax! :P)

    brunetto



Discussion Overview
group: golang-nuts
categories: go
posted: Jan 22, '15 at 11:03a
active: Jan 22, '15 at 1:38p
posts: 9
users: 3
website: golang.org

3 users in discussion

Brunetto: 5 posts, Egon: 3 posts, Michael Jones: 1 post
