Speed issue w/ LARGE parsed file
Hey all,
I'm a recdescent newbie, so please cut me some slack ;)

I've got a ~1.5 MB file that I'm parsing. The grammar is pretty well
established, in that it comes from a formal paper with an EBNF
description. I've gone through the EBNF and done my best to simplify it.
For example, where the EBNF says a number should be in the range
0-65535, I just specify /\d{1,5}/ to simplify & speed up the
processing.
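
One thing to keep in mind with that shortcut: /\d{1,5}/ also accepts values such as 99999, so the range has to be enforced somewhere else if it matters. A minimal sketch of that check in plain Perl follows; the helper name is_uint16 is made up for illustration and is not part of the grammar in this post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original grammar): accept 1-5 digits
# with the cheap regex, then enforce the 0-65535 range numerically.
sub is_uint16 {
    my ($token) = @_;
    return $token =~ /\A\d{1,5}\z/ && $token <= 65535 ? 1 : 0;
}

# 0 and 65535 pass; 65536 and 99999 match the regex but fail the range
# check; 123456 fails the regex itself.
print "$_ => ", is_uint16($_), "\n" for qw(0 65535 65536 99999 123456);
```

The same comparison could presumably live in a rule action, so the regex stays fast and the range check runs only on tokens that already matched.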

The first working grammar (tested on a subset of the file) has about
85 separate rules. I tried running it on the "full" file, but it just
took too damn long.

So I went about creating a much simpler (even dumber) parser to do
some pre-parsing and speed things up.

The file looks like:

(foo bar)
(foo (bar baz))
(foo "bar")
(foo (bar "baz"))

The data can be nested several levels deep, e.g.:

(foo (bar baz)
(baz baz)
(baz (baz (baz(baz "bar")))))
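
Given a format like the one above, one possible way to do the pre-parsing is to cut the input into top-level parenthesized chunks with a plain depth counter and hand each chunk to the parser separately. The sketch below assumes that parse time grows faster than linearly with input size, that double-quoted strings contain no escaped quotes (the same assumption the grammar's /[^\"]*/ makes), and that split_top_level is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into top-level "( ... )" chunks by
# tracking nesting depth. Parentheses inside double-quoted strings are
# ignored; escaped quotes are NOT handled (assumption, see lead-in).
sub split_top_level {
    my ($text) = @_;
    my @chunks;
    my ( $depth, $start, $in_quote ) = ( 0, undef, 0 );

    for my $pos ( 0 .. length($text) - 1 ) {
        my $ch = substr( $text, $pos, 1 );
        if ($in_quote) {
            $in_quote = 0 if $ch eq '"';
            next;
        }
        if ( $ch eq '"' ) {
            $in_quote = 1;
        }
        elsif ( $ch eq '(' ) {
            $start = $pos if $depth == 0;    # remember where a chunk opens
            $depth++;
        }
        elsif ( $ch eq ')' ) {
            die "Unbalanced ')' at offset $pos\n" if $depth == 0;
            $depth--;
            push @chunks, substr( $text, $start, $pos - $start + 1 )
                if $depth == 0;              # chunk closed at top level
        }
    }
    die "Unbalanced '(': input ended at depth $depth\n" if $depth;
    return @chunks;
}

my $sample = <<'END';
(foo bar)
(foo (bar baz))
(foo "bar)")
END

my @chunks = split_top_level($sample);
print scalar(@chunks), " chunks\n";
print "$_\n" for @chunks;
```

Each chunk could then be passed to a rule that matches a single item. If one top-level form holds most of the file, the same counter could cut at depth 1 instead of depth 0.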

So I dumbed down my grammar (see below), but it still takes longer
than I have patience for (> 10 minutes) to parse.

Am I SOL parsing this file using RecDescent, or is something glaringly
bad w/ the syntax below?

TIA

--dw

############################################################
# The main file has a header, and one or more object models
File : Header Model(s)

# Define what the header is
Header:
      "(" /Header[\s]v[\d]+\.[\d]+\.[\d]+\.[\d]+/ ")"
    | <error: Invalid Header>

# Define what the object model is
Model:
      "(Model"
      Item(s)
      ")"
    | <error: Invalid parse of the ObjectModel>

Item:
      "(" /\b[^\s]+\b/ /[^\(\)]*/ Item(s?) ")" # Simply two tokens
    | "(" /\b[^\s]+\b/ "\"" /[^\"]*/ "\"" Item(s?) ")"
    | <error>

# These items left in for clarity's sake. Functionally equivalent
# to Item above, but hopefully faster
OldItem:
      "(" Label Data Item(s?) ")" # Simply two tokens
    | "(" Label QuotedData Item(s?) ")"
    | <error>

Label:
      /\b[^\s]+\b/

Data:
      /[^\(\)]*/

QuotedData:
      "\"" /[^\"]*/ "\""
############################################################


  • David Weber at Jul 17, 2006 at 7:50 pm
    FYI, here's the perl file:

    use strict;
    use warnings;
    use Parse::RecDescent;

    $::RD_ERRORS = 1; # unless undefined, report fatal errors
    $::RD_WARN = 1; # unless undefined, also report non-fatal problems
    $::RD_HINT = 1; # if defined, also suggest remedies
    #$::RD_TRACE = 1; # if defined, also trace parsers' behaviour
    #$::RD_AUTOSTUB = 1; # if defined, generates "stubs" for undefined rules
    $::RD_AUTOACTION = q{print "."}; # if defined, appends specified action to productions

    # Load up the grammar from the file
    open( my $grammarFile, '<', "QuickGrammar.txt" )
        or die "Could not open grammar file: $!\n";
    my @grammar = <$grammarFile>;
    close($grammarFile);

    # Check the grammar
    my $parser = Parse::RecDescent->new(join '', @grammar) or die "Bad Grammar";

    # Open and save the file contents
    open( my $parsedFile, '<', "bigfile.txt" )
        or die "Could not open input file: $!\n";
    my @data = <$parsedFile>;
    close($parsedFile);

    # Parse the file contents, joining all of the lines into a single one.
    # Note: the method called here must match the grammar's start rule;
    # the grammar as posted names it File, not OMDFile.
    my $retValue = $parser->OMDFile(join '', @data);
    -----Original Message-----
    From: david.weber@l-3com.com
    Sent: Monday, July 17, 2006 2:47 PM
    To: recdescent@perl.org
    Subject: Speed issue w/ LARGE parsed file

  • Colin Kuskie at Jul 17, 2006 at 7:54 pm

    On Mon, Jul 17, 2006 at 02:47:09PM -0500, david.weber@l-3com.com wrote:
    > Am I SOL with parsing this file use RecDescent or is something glaringly
    > bad w/ the below syntax?

    I haven't looked at your file, but if you're using Perl 5.8.5 or greater,
    the speed issue may be caused by perl itself:

    http://rt.perl.org/rt3//Public/Bug/Display.html?id=34925

    The fix for this bug causes a huge slowdown inside of Text::Balanced,
    or any other module that reblesses objects with overloading.

    Colin Kuskie
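
    To get a rough idea of where the time goes, it may help to print the perl version and time parser construction separately from the parse itself. This is only a sketch: timed is a made-up helper, and the tiny grammar and input are stand-ins for the real QuickGrammar.txt and bigfile.txt.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Parse::RecDescent;

# Hypothetical helper: run a code ref and report elapsed wall-clock time.
sub timed {
    my ( $label, $code ) = @_;
    my $t0     = time();
    my $result = $code->();
    printf "%-14s %.3f s\n", $label, time() - $t0;
    return $result;
}

printf "perl version   %vd\n", $^V;

# Stand-in grammar and input (assumptions); substitute the real ones.
my $grammar = q{
    File : Item(s) /\z/
    Item : "(" /\w+/ /\w+/ ")"
};
my $input = "(foo bar)\n" x 200;

# Time grammar compilation and the actual parse as two separate steps.
my $parser = timed( 'build parser', sub { Parse::RecDescent->new($grammar) } )
    or die "Bad grammar\n";
my $ok = timed( 'parse input', sub { $parser->File($input) } );
print defined $ok ? "parse succeeded\n" : "parse failed\n";
```

    Running the same script with inputs of increasing size should show whether parse time grows roughly in proportion to the input or much faster than that.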
