I'm using Fixtures in my app, and am overall pleased with it. But I
now have a couple thousand fixture items to import, and
that's taking upwards of a minute. Out of curiosity I commented out the
creation of $tmp_fixture_dir and just read from $fixture_dir. That
cut my populate() time roughly in half. What's the reasoning
for the tmp dir? Safety in case anything goes wrong? For me, it's just
a lot of wasted I/O and time. :-) Luke, any objections if I remove
this?

Also, I was able to shave off a few more seconds by
gathering all the row data for a given source and then calling
$rs->populate(\@rows) so the insert_bulk functionality can be
used. castaway++

Here's the combined patch:

svn diff
Index: lib/DBIx/Class/Fixtures.pm
===================================================================
--- lib/DBIx/Class/Fixtures.pm (revision 4288)
+++ lib/DBIx/Class/Fixtures.pm (working copy)
@@ -792,20 +792,12 @@

my $schema = $self->_generate_schema({ ddl => $ddl_file, connection_details => delete $params->{connection_details}, %{$params} });
$self->msg("\nimporting fixtures");
- my $tmp_fixture_dir = dir($fixture_dir, "-~populate~-" . $<);

my $version_file = file($fixture_dir, '_dumper_version');
unless (-e $version_file) {
# return DBIx::Class::Exception->throw('no version file found');
}

- if (-e $tmp_fixture_dir) {
- $self->msg("- deleting existing temp directory $tmp_fixture_dir");
- $tmp_fixture_dir->rmtree;
- }
- $self->msg("- creating temp dir");
- dircopy(dir($fixture_dir, $schema->source($_)->from), dir($tmp_fixture_dir, $schema->source($_)->from)) for grep { -e dir($fixture_dir, $schema->source($_)->from) } $schema->sources;
-
eval { $schema->storage->dbh->do('SET foreign_key_checks=0') };

my $fixup_visitor;
@@ -829,6 +821,7 @@
my $rs = $schema->resultset($source);
my $source_dir = dir($tmp_fixture_dir, lc($rs->result_source->from));
next unless (-e $source_dir);
+ my @rows;
while (my $file = $source_dir->next) {
next unless ($file =~ /\.fix$/);
next if $file->is_dir;
@@ -836,8 +829,9 @@
my $HASH1;
eval($contents);
$HASH1 = $fixup_visitor->visit($HASH1) if $fixup_visitor;
- $rs->create($HASH1);
+ push @rows, $HASH1;
}
+ $rs->populate(\@rows);
}

if ($params->{post_ddl}) {
@@ -851,7 +845,6 @@

$self->msg("- fixtures imported");
$self->msg("- cleaning up");
- $tmp_fixture_dir->rmtree;
eval { $schema->storage->dbh->do('SET foreign_key_checks=1') };

return 1;


Drew
--
----------------------------------------------------------------
Drew Taylor * Web development & consulting
Email: drew@drewtaylor.com * Site implementation & hosting
Web : www.drewtaylor.com * perl/mod_perl/DBI/mysql/postgres
----------------------------------------------------------------
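
The batching half of the patch above can be sketched in plain Perl so that it runs without a database. The helper name load_fixture_rows is hypothetical, and the sketch assumes each .fix file contains Perl source that assigns a hash reference to $HASH1, as the patched loop does. The populate call itself appears only in a comment because it needs a real DBIx::Class resultset.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: read every *.fix file in $dir and return the decoded
# row hashes. Assumes each file holds Perl source of the form
# "$HASH1 = {...};". The contents are eval'd, so only use trusted fixtures.
sub load_fixture_rows {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { /\.fix$/ } readdir($dh);
    closedir($dh);

    my @rows;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        next if -d $path;
        open(my $fh, '<', $path) or die "Cannot read $path: $!";
        my $contents = do { local $/; <$fh> };    # slurp the whole file
        close($fh);

        my $HASH1;
        eval $contents;    # the string eval assigns to the lexical $HASH1
        die "Bad fixture $path: $@" if $@;
        push @rows, $HASH1;
    }
    return \@rows;
}

# Assumed usage with a DBIx::Class resultset $rs (not run here):
#   my $rows = load_fixture_rows($source_dir);
#   $rs->populate($rows) if @$rows;    # one bulk call instead of N create()s
```

As far as I know, the DBIx::Class documentation describes populate in void context as using the storage layer's insert_bulk, which is where the saving over one create() per row comes from.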

  • Drew Taylor at Apr 23, 2008 at 12:20 pm

    Well, ignore my previous patch because I forgot to run make test first
    so I missed a typo. Also, after running the test I get an error doing
    svn diff, so I'm assuming that a directory is being deleted which svn
    expects to be there. If you agree with the concept of removing the tmp
    dir, then I'll figure out the test too.

    Drew
  • Luke saunders at Apr 23, 2008 at 12:43 pm

    On Wed, Apr 23, 2008 at 12:14 PM, Drew Taylor wrote:
    > I'm using Fixtures in my app, and am overall pleased with it. But I
    > now have a couple thousand fixture items which are imported, and
    > that's taking upwards of 1min+. Out of curiosity I commented out the
    > creation of the tmp_fixtures_dir and just read from fixtures_dir. I
    > shaved my populate() time in approximately half. What's the reasoning
    > for the tmp dir? Safety in case anything goes wrong? For me, it's just
    > a lot of wasted IOs and time. :-) Luke, any objections if I remove
    > this?
    The tmp directory is used so that if someone dumps while you are
    populating the populate is unaffected. Since this is an issue for some
    people I'd suggest making the tmp directory thing configurable,
    probably in the call to ->new. Normally if the fixtures are on a local
    drive using a tmp directory doesn't take too long so I'd prefer to
    keep it the default.
    > Also, I was able to shave off a few more seconds by switching to
    > gathering all the row data for a given source and then calling
    > $schema->populate(\@rows) so the insert_bulk functionality can be
    > used. castaway++
    Good idea.
    > Here's the combined patch:
    Thanks, if you make the tmp directory thing configurable and make the
    tests pass I can apply.

    Cheers,
    Luke.
  • Drew Taylor at Apr 23, 2008 at 1:03 pm

    On Wed, Apr 23, 2008 at 12:43 PM, luke saunders wrote:
    > The tmp directory is used so that if someone dumps while you are
    > populating the populate is unaffected. Since this is an issue for some
    > people I'd suggest making the tmp directory thing configurable,
    > probably in the call to ->new. Normally if the fixtures are on a local
    > drive using a tmp directory doesn't take too long so I'd prefer to
    > keep it the default.
    Ahhh, that makes sense. The copy doesn't take _that_ long (I'm
    guessing around 30-60 seconds for my current record set), but when I
    want to do quick, successive test runs it adds up rather quickly.
    > Or better still you can commit it yourself. Do you have a commit bit?
    Yes I do.

    Drew
  • Matt S Trout at Apr 23, 2008 at 2:43 pm

    On Wed, Apr 23, 2008 at 01:03:22PM +0100, Drew Taylor wrote:
    > Ahhh, that makes sense. The copy doesn't take _that_ long (I'm
    > guessing around 30-60 seconds for my current record set), but when I
    > want to do quick, successive test runs it adds up rather quickly.
    What if you used link() instead of a copy?

    Or just make the dump code make the tmp dir, and populate open a file handle
    to everything in advance so you don't have to worry about paths changing
    under you?

    Either should be no more prone to races than the current approach and much
    much faster for everybody :)

    --
    Matt S Trout Need help with your Catalyst or DBIx::Class project?
    Technical Director http://www.shadowcat.co.uk/catalyst/
    Shadowcat Systems Ltd. Want a managed development or deployment platform?
    http://chainsawblues.vox.com/ http://www.shadowcat.co.uk/servers/
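
The link() suggestion above could look roughly like the following sketch. The function name link_or_copy_dir is hypothetical and nothing here comes from DBIx::Class::Fixtures itself; it assumes a flat directory of plain files and falls back to File::Copy when link() fails or is not implemented on the platform.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: mirror the plain files of $src into $dst using hard
# links, falling back to a real copy when link() fails (for example across
# filesystems) or dies because the platform does not implement it.
# Returns the number of files that were linked rather than copied.
sub link_or_copy_dir {
    my ($src, $dst) = @_;
    make_path($dst);
    opendir(my $dh, $src) or die "Cannot open $src: $!";
    my @names = grep { -f File::Spec->catfile($src, $_) } readdir($dh);
    closedir($dh);

    my $linked = 0;
    for my $name (@names) {
        my $from = File::Spec->catfile($src, $name);
        my $to   = File::Spec->catfile($dst, $name);
        if (eval { link($from, $to) }) {
            $linked++;    # no data copied, just a second directory entry
        }
        else {
            copy($from, $to) or die "Cannot copy $from to $to: $!";
        }
    }
    return $linked;
}
```

One caveat with this approach: a hard link shares its data with the original, so it only protects the populate if a concurrent dump replaces files (unlink or rename followed by a fresh write) rather than overwriting them in place.
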
  • Drew Taylor at Apr 23, 2008 at 4:37 pm

    On Wed, Apr 23, 2008 at 2:43 PM, Matt S Trout wrote:
    > What if you used link() instead of a copy?
    >
    > Or just make the dump code make the tmp dir, and populate open a file handle
    > to everything in advance so you don't have to worry about paths changing
    > under you?
    As I understand your suggestion, dumping to the tmp dir wouldn't work
    for me because I make a dump only every so often, but populate from it
    many times per day as I run my tests. I just want to zip through the
    existing files w/o any IO other than reading/slurping the .fix files.

    Drew
  • Matt S Trout at Apr 23, 2008 at 8:42 pm

    On Wed, Apr 23, 2008 at 04:37:54PM +0100, Drew Taylor wrote:
    >> What if you used link() instead of a copy?
    >>
    >> Or just make the dump code make the tmp dir, and populate open a file handle
    >> to everything in advance so you don't have to worry about paths changing
    >> under you?
    > As I understand your suggestion, dumping to the tmp dir wouldn't work
    > for me because I make a dump only every so often, but populate from it
    > many times per day as I run my tests. I just want to zip through the
    > existing files w/o any IO other than reading/slurping the .fix files.
    If dumping uses a temp dir, and populate just opens the filehandles,
    that's exactly what you'd get.

    --
    Matt S Trout Need help with your Catalyst or DBIx::Class project?
    Technical Director http://www.shadowcat.co.uk/catalyst/
    Shadowcat Systems Ltd. Want a managed development or deployment platform?
    http://chainsawblues.vox.com/ http://www.shadowcat.co.uk/servers/
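
The "populate just opens the filehandles" idea might be sketched as two steps: open everything first, then read. Both function names below are hypothetical, and the sketch assumes a flat directory of .fix files; it is not how DBIx::Class::Fixtures is implemented.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical step 1: open every *.fix file in $dir up front and return
# a list of [name, handle] pairs.
sub open_fixture_handles {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { /\.fix$/ } readdir($dh);
    closedir($dh);

    my @handles;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        open(my $fh, '<', $path) or die "Cannot open $path: $!";
        push @handles, [ $name, $fh ];
    }
    return \@handles;
}

# Hypothetical step 2: slurp each handle. Paths are no longer consulted
# here, so later changes to the directory do not affect the data read.
sub slurp_handles {
    my ($handles) = @_;
    my %contents;
    for my $pair (@$handles) {
        my ($name, $fh) = @$pair;
        local $/;                     # slurp mode for this iteration
        $contents{$name} = <$fh>;
        close($fh);
    }
    return \%contents;
}
```

A design caveat: holding one handle per fixture means a couple thousand fixtures could run into the per-process open file limit on some systems, so a real implementation would probably need to work in batches or raise the limit.
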
  • Luke saunders at Apr 24, 2008 at 11:54 am

    On Wed, Apr 23, 2008 at 8:42 PM, Matt S Trout wrote:
    > If dumping uses a temp dir, and populate just opens the filehandles,
    > that's exactly what you'd get.
    Seems like a reasonable idea. My understanding here is that if a
    filehandle is opened and that file is then removed, the filehandle can
    still be read from. However I'm not clear on how that works,
    presumably Perl doesn't read the file into memory when you first open
    the handle, so what happens?

    Also I don't think link() works on all platforms.
  • Jason Kohles at Apr 25, 2008 at 5:00 am

    On Apr 24, 2008, at 6:54 AM, luke saunders wrote:

    > Seems like a reasonable idea. My understanding here is that if a
    > filehandle is opened and that file is then removed, the filehandle can
    > still be read from. However I'm not clear on how that works,
    > presumably Perl doesn't read the file into memory when you first open
    > the handle, so what happens?
    What happens (on reasonable OSes anyway) is that the original file
    still exists, even if there are no references to it in the directory
    structure. In Perl terms, you can think of both the directory entry
    and the filehandle as references, pointing to the same underlying
    data. The file doesn't actually go away until its refcount drops to
    0, which means that all the references to it in the directory
    structure are gone, and any filehandles that reference it are closed.
    If you delete the original directory entry that pointed to it, and
    create a new file in its place, that reference doesn't point at the
    original data, it's a new reference pointing to a new chunk of data on
    the disk.
    > Also I don't think link() works on all platforms.
    This isn't entirely portable either, but I don't have any recent
    enough Windows experience to say if it will explode on versions of
    Windows released in this century or not...
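
The behaviour described above can be checked with a small sketch, assuming a POSIX-style filesystem (on Windows the unlink of an open file may simply fail). The function name read_after_replace and the file name row.fix are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical demo: open a file, replace it on disk, then compare what the
# old handle sees with what the path now points at.
# Returns (text read via the old handle, text read via the path afterwards).
sub read_after_replace {
    my $dir  = tempdir(CLEANUP => 1);
    my $path = File::Spec->catfile($dir, 'row.fix');

    open(my $out, '>', $path) or die "write: $!";
    print {$out} "old contents\n";
    close($out);

    open(my $in, '<', $path) or die "open: $!";    # handle opened in advance

    unlink($path) or die "unlink: $!";             # directory entry removed
    open($out, '>', $path) or die "rewrite: $!";   # new file under the same name
    print {$out} "new contents\n";
    close($out);

    my $via_handle = <$in>;                        # reads the original data
    close($in);

    open(my $again, '<', $path) or die "reopen: $!";
    my $via_path = <$again>;                       # reads the replacement
    close($again);

    return ($via_handle, $via_path);
}
```

On such a system the first value is the old text and the second is the new text, which matches the refcount picture: the old handle and the new directory entry point at different data.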

    --
    Jason Kohles, RHCA RHCDS RHCE
    email@jasonkohles.com - http://www.jasonkohles.com/
    "A witty saying proves nothing." -- Voltaire
  • Luke saunders at Apr 25, 2008 at 6:05 pm

    On Fri, Apr 25, 2008 at 5:00 AM, Jason Kohles wrote:
    >> Also I don't think link() works on all platforms.
    > This isn't entirely portable either, but I don't have any recent enough
    > Windows experience to say if it will explode on versions of Windows released
    > in this century or not...
    Okay, thanks. Given this we should leave it for this release but
    implement it or some other solution for the next release.

    Drew, I have applied the insert_bulk part of your patch and will
    release 1.001000 (http://xrl.us/bjtjq) over the weekend unless people
    shout.

    Cheers,
    Luke.
  • Drew Taylor at Apr 30, 2008 at 3:05 pm

    On Fri, Apr 25, 2008 at 6:05 PM, luke saunders wrote:

    > Drew, I have applied the insert_bulk part of your patch and will
    > release 1.001000 (http://xrl.us/bjtjq) over the weekend unless people
    > shout.
    Sorry for disappearing from the thread (we were visiting the Scottish
    highlands this weekend - beautiful!) but that sounds fine. I'll try to
    find the time in the next week or so to implement the other things
    discussed.

    Drew

Discussion Overview
group: dbix-class
categories: perl, catalyst
posted: Apr 23, '08 at 12:14p
active: Apr 30, '08 at 3:05p
posts: 11
users: 4
website: dbix-class.org
irc: #dbix-class
