Hello.
I have a simple grammar (used in a POE Filter) like:
================
dictionary : '{' key_value(s /\|/) '}'
key_value : key '=' value
key : /[A-Za-z0-9_ ]+/
value : dictionary | string
string : /[^|\}]*/
================
with a valid test string like:
{reply_to={message_type=portfolio_get|private=QQQ}|portfolio_name=P|ordervalidate=OK}
which dumps like:
{
'ordervalidate' => 'OK',
'portfolio_name' => 'P',
'reply_to' => {
'message_type' => 'portfolio_get',
'private' => 'QQQ'
}
}
And a parser:
================
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
$::RD_HINT = 1;
$::RD_ERRORS = 1;
$::RD_WARN = 1;
my $parser = new Parse::RecDescent(q{
dictionary : '{' key_value(s /\|/) '}'
{ $return = { map { $_->[0] => $_->[1] } @{$item[2]} } }
key_value : key '=' value
{ [ @item{qw(key value)} ] }
key : /[A-Za-z0-9_ ]+/
value : dictionary | string
string : /[^|\}]*/
});
while (<>) {
my $msg = $parser->dictionary($_);
# print Dumper($msg);
}
==============
However, it gets very slow with larger files. If I generate data files with adjustable size with:
==============
#!/usr/bin/perl
use strict;
use warnings;
print "{reply_to={message_type=portfolio_get|private=PP}|portfolio_name=PP|ordervalidate=OK|portfolio_positions={";
print join "|", map "portfolio_position$_={instrument_id={instrument_tag=$_|market=MMMM|feedcode=FFFF|isincode=IIII|underlying=UUUU|kind=Spot|currency=CUR|multiplier=2|assettype=Equities}|volume=$_|invested=$_|accrued=$_|fee=$_|commission=$_|change_in_volume=$_|change_in_invested=$_|change_in_accrued=$_|change_in_fee=$_|change_in_commission=$_|currency=CUR}", 1 .. shift;
print "}}|error=0}";
==============
I get execution times like
size filesiz time
10 3398 .25
20 6788 .42
50 16958 1.17
100 33920 3.29
200 69020 10.9
300 104120 24.1
400 139220 42.0
500 174320 63.8
1000 349832 264
1500 531332 605
and these are kernel user time, not wallclock - it's eating all CPU a for a long time!
And the files aren't *that* big.
How can I get it faster ? It seems already simple enough. Please help!
Sincerely.
/ Jan Sundberg
PRD parser expontentially slower with larger files
| Tweet |
|
Search Discussions
-
Sean O'Rourke at Nov 30, 2006 at 5:21 pm ⇧
Inlining some rules into their parents and tossing in a (probably
unnecessary) <commit> yields something about 1/3 faster:
$parser2 = new Parse::RecDescent(<<'EOS');
dictionary : '{' key_value(s /\|/) '}'
{ $return = { map { $_->[0] => $_->[1] } @{$item[2]} } }
dictionary2 : key_value(s /\|/) '}'
{ $return = { map { $_->[0] => $_->[1] } @{$item[1]} } }
key_value : /[A-Za-z0-9_ ]+/ '=' value
{ $return = [ @item[1,3] ] }
value : '{' <commit> dictionary2 { $return = $item[3] }/[^|\}]*/EOS
Calling new rules is slow, as is backtracking, but P::RD can't be
speedy. You might try Perl6::Rules, which compiles to a regex.
If you're lucky, it won't tickle any of the bugs in /(??{ ... })/
for your particular grammar, and should be faster.
/s
-
Randal L. Schwartz at Dec 2, 2006 at 12:46 pm ⇧
Jan> However, it gets very slow with larger files. If I generate data files with adjustable size with:"Jan" == Jan Sundberg writes:
It's a known problem, unfixable without breaking something, unless
you gave a pragma to stop supporting some of the features.
The problem is that PRD uses "nibbling" to scan the string, which looks good
on paper, but doesn't scale when you start getting larger data. The proper
approach ("inchworming") wasn't available when Damian started writing PRD
(perl 5.003), so we're stuck with the current implementation.
See <http://www.stonehenge.com/merlyn/UnixReview/col55.html> for
a better description of the terms.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Related Discussions
Discussion Navigation
| view | thread | post |
Discussion Overview
| group | recdescent
|
| categories | perl |
| posted | Nov 30, '06 at 11:11a |
| active | Dec 2, '06 at 12:46p |
| posts | 3 |
| users | 3 |
| website | metacpan.org... |
