Dear list,

I have a question for anyone who has experience with the following situation.
I have a very complex array built from hashes containing arrays that contain
hashes, etc. I would like to store the whole thing on disk, both to save on
memory usage and to make it easier to separate this structure from the
actual code.

The trick is that I would not like to have to read back the entire array in
one go (it is multiple megabytes already and will only grow in the future).
So I would like to be able to access the array elements and read only one
element at a time rather than the entire thing in one go. I would also like
to be able to store the file in an encrypted format, but that is more of a
fun thing than a real need.

Could someone please advise me which module would be best for this, giving
the smallest possible data file while still offering reasonable speed? A
direct tie between the disk and the application's operation is not fast
enough, as a 'root' element in the array gets read once but the data
structure it contains gets used hundreds or thousands of times during the
execution of the program.

So far the code has been using Storable, which does a reasonable job, but it
does not allow me to read only a single element of the array rather than the
whole thing in one go.
I have tried using DBM::Deep, but the optimize() method caused the data
structure to get altered in some inexplicable way, resulting in the program
failing every time I used this option. Without the option, the data file
grows with every single execution of the script.

Looking forward to your suggestions,

Rob

  • Rene Schickbauer at Nov 16, 2009 at 9:09 pm

    Rob Coops wrote:
    > Dear list,
    >
    > I have a question for anyone who has experience with the following situation.
    > I have a very complex array built from hashes containing arrays that contain
    > hashes, etc. I would like to store the whole thing on disk, both to save on
    > memory usage and to make it easier to separate this structure from the
    > actual code.

    First off: Sounds like a good idea ;-)

    > The trick is that I would not like to have to read back the entire array in
    > one go (it is multiple megabytes already and will only grow in the future).
    > So I would like to be able to access the array elements and read only one
    > element at a time rather than the entire thing in one go. I would also like
    > to be able to store the file in an encrypted format, but that is more of a
    > fun thing than a real need.

    Sounds like this is not a one-off script. I know that's not the easiest
    thing to do, but you should look into using a database with a few
    well-defined tables.

    I recommend PostgreSQL (personal favourite), but MySQL or even maybe
    SQLite would do the trick.

    Perl's DBI/DBD modules are very good and very well documented. If you
    choose PostgreSQL (www.postgresql.org), all the documentation is online
    as well.

    With a database, you don't have to handle all that data storage and
    retrieval yourself, and you can run multiple programs on the data
    simultaneously. And best of all: unless you make a really *big* blunder
    in your table definitions, you will probably get better performance than
    a reinvent-the-wheel solution could get you.

    If you choose to implement the data handling on a file basis, you also
    have to worry about broken files if your program crashes during write
    operations (databases already have rollback logging/write-ahead logs
    implemented).

    > Could someone please advise me which module would be best for this, giving
    > the smallest possible data file while still offering reasonable speed? A
    > direct tie between the disk and the application's operation is not fast
    > enough, as a 'root' element in the array gets read once but the data
    > structure it contains gets used hundreds or thousands of times during the
    > execution of the program.

    This *really* depends on your data structure. Can you give us an example?

    BTW, you can also use Storable to store small to medium data structures
    in a database text field by freezing/thawing and base64-encoding them,
    something like this:

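    use Storable qw(freeze thaw);
    use MIME::Base64;

    # Serialize a reference into a single-line base64 string.
    sub dbfreeze {
        my ($data) = @_;

        return encode_base64(freeze($data), "");
    }

    # Decode the base64 string and rebuild the data structure.
    sub dbthaw {
        my ($data) = @_;

        return thaw(decode_base64($data));
    }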
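
    A minimal round-trip sketch of the two helpers above, to show what
    goes into and comes out of the text field. The nested element and its
    keys are made up for illustration and are not from the original
    program.

    ```perl
    use strict;
    use warnings;
    use Storable qw(freeze thaw);
    use MIME::Base64 qw(encode_base64 decode_base64);

    # Serialize a reference into a single-line, text-safe string.
    sub dbfreeze {
        my ($data) = @_;
        return encode_base64(freeze($data), "");
    }

    # Reverse of dbfreeze: decode the text and rebuild the structure.
    sub dbthaw {
        my ($text) = @_;
        return thaw(decode_base64($text));
    }

    # Hypothetical nested element, invented for this example.
    my $element = {
        name     => 'header',
        segments => [ { tag => 'AAA', min => 1 }, { tag => 'BBB', min => 0 } ],
    };

    my $text = dbfreeze($element);   # base64 text without line breaks
    my $copy = dbthaw($text);        # an independent deep copy

    print "second tag: $copy->{segments}[1]{tag}\n";
    ```

    If the frozen data has to move between machines with different
    architectures, Storable's nfreeze (network byte order) is the more
    portable choice; thaw reads both forms.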

    LLAP & LG
    Rene
  • Rob Coops at Nov 18, 2009 at 11:00 am

    --
    To unsubscribe, e-mail: beginners-unsubscribe@perl.org
    For additional commands, e-mail: beginners-help@perl.org
    http://learn.perl.org/


    Thank you Rene,
    A DB certainly crossed my mind. The problem is that the whole thing needs to
    be as portable as possible, for the simple reason that it will have to be
    used by various people in several countries without them having to
    rely on a machine that might not always be working. (It is a tool for a
    development group to easily and quickly test EDI message structures. Since
    none of the development machines have a support team maintaining them, I
    cannot say with certainty that a machine will be available.)

    My solution to this is to provide them all with the script that verifies the
    message format, based on a given data structure. This data structure will be
    the only thing that gets distributed to them, and it will enable them to test
    various versions of the messages, etc.

    I have started using Storable as a way to write it all to disk, but added
    bzip2 compression to it rather than MIME encoding, simply because this
    reduces the data file to a more manageable size and makes it difficult
    enough for them not to try and "fix" the data file themselves.
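
    A rough sketch of what that Storable-plus-bzip2 combination might look
    like, assuming the IO::Compress::Bzip2 and IO::Uncompress::Bunzip2
    modules are used for the compression. The helper names store_bz2 and
    retrieve_bz2 and the file name are hypothetical, not taken from the
    actual tool.

    ```perl
    use strict;
    use warnings;
    use Storable qw(nfreeze thaw);
    use IO::Compress::Bzip2     qw(bzip2   $Bzip2Error);
    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

    # Freeze a structure and write it bzip2-compressed to $file.
    sub store_bz2 {
        my ($data, $file) = @_;
        my $frozen = nfreeze($data);   # network byte order, portable between machines
        bzip2(\$frozen => $file)
            or die "bzip2 failed: $Bzip2Error\n";
        return;
    }

    # Read $file back, decompress it, and rebuild the structure.
    sub retrieve_bz2 {
        my ($file) = @_;
        my $frozen;
        bunzip2($file => \$frozen)
            or die "bunzip2 failed: $Bunzip2Error\n";
        return thaw($frozen);
    }

    my $file = 'structure.sto.bz2';    # hypothetical file name
    store_bz2({ version => 1, elements => [ 'a', 'b', 'c' ] }, $file);
    my $back = retrieve_bz2($file);
    print scalar @{ $back->{elements} }, " elements restored\n";
    unlink $file;
    ```

    Note that this still decompresses and thaws the whole structure in
    one go, so it addresses file size but not element-at-a-time access.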

    Thank you for the advice; I'll certainly be using a DB should this tool ever
    get upgraded to a web-based one.

    Regards,

    Rob
  • Dermot at Nov 18, 2009 at 11:08 am

    2009/11/18 Rob Coops <rcoops@gmail.com>:
    > Thank you for the advice; I'll certainly be using a DB should this tool ever
    > get upgraded to a web-based one.

    You'd be hard pressed to find a *nix machine without SQLite
    pre-installed. It's often used by the package manager. It's bloody
    fast too.

    Just a thought.
    Dp.
  • Jeff Pang at Nov 18, 2009 at 12:53 pm
    For SQLite, just run perl -MCPAN -e 'install DBD::SQLite', then use the
    general DBI interface to create and access the SQLite database.
    You don't even need to install the SQLite binary program.
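
    As a sketch of how this could answer the original question, the
    example below keeps one row per root element, so a single element can
    be fetched without reading the rest. It assumes DBD::SQLite is
    installed from CPAN; the file name elements.db, the table name
    element, and the helpers put_element and get_element are all
    hypothetical. Each value is frozen with Storable and base64-encoded
    so it fits a plain text column.

    ```perl
    use strict;
    use warnings;
    use DBI;
    use Storable qw(nfreeze thaw);
    use MIME::Base64 qw(encode_base64 decode_base64);

    # Hypothetical file and table names, chosen for this example only.
    my $dbh = DBI->connect('dbi:SQLite:dbname=elements.db', '', '',
        { RaiseError => 1, AutoCommit => 1 });

    $dbh->do('CREATE TABLE IF NOT EXISTS element (
                  name TEXT PRIMARY KEY,
                  data TEXT NOT NULL
              )');

    # Store one root element under its name, replacing any older copy.
    sub put_element {
        my ($name, $data) = @_;
        $dbh->do('INSERT OR REPLACE INTO element (name, data) VALUES (?, ?)',
            undef, $name, encode_base64(nfreeze($data), ''));
        return;
    }

    # Fetch and rebuild a single root element; undef if it is not stored.
    sub get_element {
        my ($name) = @_;
        my ($text) = $dbh->selectrow_array(
            'SELECT data FROM element WHERE name = ?', undef, $name);
        return defined $text ? thaw(decode_base64($text)) : undef;
    }

    put_element('first',  { segments => [ { tag => 'AAA' } ] });
    put_element('second', { segments => [ { tag => 'BBB' }, { tag => 'CCC' } ] });

    my $one = get_element('second');   # only this row is read and thawed
    print "tags: @{[ map { $_->{tag} } @{ $one->{segments} } ]}\n";

    $dbh->disconnect;
    ```

    Once an element has been fetched it lives in memory as an ordinary
    Perl structure, so using it hundreds of times costs no further disk
    access.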

  • Rob Coops at Nov 18, 2009 at 3:46 pm

    Now that is a handy one... I never knew that. I might just spend a bit of
    time and move the data to SQLite then, as it certainly would work well in a
    DB. I just need to structure the DB right, as mentioned before, which might
    be the hardest part of the operation because it should be future-proof.
    Looking at how much these messages have changed over the past 15 years of
    their use, I expect the future holds pretty much the same thing, with the
    trend being more complexity at every single change.

Discussion Overview
group: beginners @
categories: perl
posted: Nov 13, '09 at 12:12p
active: Nov 18, '09 at 3:46p
posts: 6
users: 4
website: perl.org
