FAQ

PRD parser expontentially slower with larger files

Jan Sundberg
Nov 30, 2006 at 11:11 am
Hello.

I have a simple grammar (used in a POE Filter) like:

================
dictionary : '{' key_value(s /\|/) '}'
key_value : key '=' value
key : /[A-Za-z0-9_ ]+/
value : dictionary | string
string : /[^|\}]*/
================

with a valid test string like:

{reply_to={message_type=portfolio_get|private=QQQ}|portfolio_name=P|ordervalidate=OK}

which dumps like:

{
'ordervalidate' => 'OK',
'portfolio_name' => 'P',
'reply_to' => {
'message_type' => 'portfolio_get',
'private' => 'QQQ'
}
}



And a parser:

================
#!/usr/bin/perl

use strict;
use warnings;

use Parse::RecDescent;

$::RD_HINT = 1;
$::RD_ERRORS = 1;
$::RD_WARN = 1;

my $parser = new Parse::RecDescent(q{
dictionary : '{' key_value(s /\|/) '}'
{ $return = { map { $_->[0] => $_->[1] } @{$item[2]} } }
key_value : key '=' value
{ [ @item{qw(key value)} ] }
key : /[A-Za-z0-9_ ]+/
value : dictionary | string
string : /[^|\}]*/
});

while (<>) {
my $msg = $parser->dictionary($_);
# print Dumper($msg);
}
==============





However, it gets very slow with larger files. If I generate data files with adjustable size with:

==============
#!/usr/bin/perl

use strict;
use warnings;

print "{reply_to={message_type=portfolio_get|private=PP}|portfolio_name=PP|ordervalidate=OK|portfolio_positions={";

print join "|", map "portfolio_position$_={instrument_id={instrument_tag=$_|market=MMMM|feedcode=FFFF|isincode=IIII|underlying=UUUU|kind=Spot|currency=CUR|multiplier=2|assettype=Equities}|volume=$_|invested=$_|accrued=$_|fee=$_|commission=$_|change_in_volume=$_|change_in_invested=$_|change_in_accrued=$_|change_in_fee=$_|change_in_commission=$_|currency=CUR}", 1 .. shift;

print "}}|error=0}";
==============



I get execution times like

size filesiz time
10 3398 .25
20 6788 .42
50 16958 1.17
100 33920 3.29
200 69020 10.9
300 104120 24.1
400 139220 42.0
500 174320 63.8
1000 349832 264
1500 531332 605

and these are kernel user time, not wallclock - it's eating all CPU a for a long time!
And the files aren't *that* big.

How can I get it faster ? It seems already simple enough. Please help!


Sincerely.

/ Jan Sundberg
reply

Search Discussions

2 responses

  • Sean O'Rourke at Nov 30, 2006 at 5:21 pm
    Inlining some rules into their parents and tossing in a (probably
    unnecessary) <commit> yields something about 1/3 faster:

    $parser2 = new Parse::RecDescent(<<'EOS');
    dictionary : '{' key_value(s /\|/) '}'
    { $return = { map { $_->[0] => $_->[1] } @{$item[2]} } }
    dictionary2 : key_value(s /\|/) '}'
    { $return = { map { $_->[0] => $_->[1] } @{$item[1]} } }
    key_value : /[A-Za-z0-9_ ]+/ '=' value
    { $return = [ @item[1,3] ] }
    value : '{' <commit> dictionary2 { $return = $item[3] }
    /[^|\}]*/
    EOS

    Calling new rules is slow, as is backtracking, but P::RD can't be
    speedy. You might try Perl6::Rules, which compiles to a regex.
    If you're lucky, it won't tickle any of the bugs in /(??{ ... })/
    for your particular grammar, and should be faster.

    /s
  • Randal L. Schwartz at Dec 2, 2006 at 12:46 pm
    "Jan" == Jan Sundberg writes:
    Jan> However, it gets very slow with larger files. If I generate data files with adjustable size with:

    It's a known problem, unfixable without breaking something, unless
    you gave a pragma to stop supporting some of the features.

    The problem is that PRD uses "nibbling" to scan the string, which looks good
    on paper, but doesn't scale when you start getting larger data. The proper
    approach ("inchworming") wasn't available when Damian started writing PRD
    (perl 5.003), so we're stuck with the current implementation.

    See <http://www.stonehenge.com/merlyn/UnixReview/col55.html> for
    a better description of the terms.

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Related Discussions

Discussion Navigation
viewthread | post