FAQ
The following grammar (piece) works fine:

geoloc: geoloc_ and(?) geoloc_ | geoloc_
geoloc_: city | state | country | area

but per
http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules

I want distinct geoloc values, so I did this:

geoloc: geoloc1 and(?) geoloc2 | geoloc_
geeloc1: geoloc_
geeloc2: geoloc_
geoloc_: city | state | country | area


but then the autotrace immediately fails:

2|tok_of_ty| |" bahrain kuwait"
2|tok_of_ty|Trying subrule: [geoloc] |
3| geoloc |Trying rule: [geoloc] |
3| geoloc |Trying production: [geoloc1 and |
geoloc2] |
3| geoloc |Trying subrule: [geoloc1] |
3| geoloc |<<Didn't match subrule: [geoloc1]>> |

--
Terrence Brannon - SID W049945
614-213-2475 (office)
614-213-3426 (fax)
818-359-0893 (cell)



-----------------------------------------
This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to UK legal entities.


  • Ted Zlatanov at Oct 23, 2007 at 8:40 pm
    On Tue, 23 Oct 2007 14:31:03 -0400 terrence.x.brannon@jpmchase.com wrote:

    txb> The following grammar (piece) works fine:
    txb> geoloc: geoloc_ and(?) geoloc_ | geoloc_
    txb> geoloc_: city | state | country | area

    txb> but per
    txb> http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#Subrules

    txb> I want distinct geoloc values, so I did this:

    txb> geoloc: geoloc1 and(?) geoloc2 | geoloc_
    txb> geeloc1: geoloc_
    txb> geeloc2: geoloc_
    txb> geoloc_: city | state | country | area


    txb> but then the autotrace immediately fails:

    txb> 2|tok_of_ty| |" bahrain kuwait"
    txb> 2|tok_of_ty|Trying subrule: [geoloc] |
    txb> 3| geoloc |Trying rule: [geoloc] |
    txb> 3| geoloc |Trying production: [geoloc1 and |
    txb> | |geoloc2] |
    txb> 3| geoloc |Trying subrule: [geoloc1] |
    txb> 3| geoloc |<<Didn't match subrule: [geoloc1]>> |

    Any chance you can post a complete example with sample input that fails?

    Thanks
    Ted
  • Terrence Brannon at Oct 24, 2007 at 9:17 am

    On 10/23/07, Ted Zlatanov wrote:
    > Any chance you can post a complete example with sample input that fails?

    What was I thinking? My goodness, par for the course when submitting a
    bug. Anyway, my reduced test case works just fine, which hopefully
    means that it's on my end of things. So I will just do some more
    head-scratching on my end for now. Here is my test case that worked
    flawlessly:

    use strict;
    use warnings;

    use Data::Dumper;

    use Parse::RecDescent;

    # Generate a parser from the specification in $grammar:

    my $grammar = << 'EOGRAMMAR';
    store: name geoloc

    name: "trader joe's" | "whole foods"

    geoloc: geoloc1 and(?) geoloc2 | geoloc_
    geoloc1: geoloc_
    geoloc2: geoloc_
    geoloc_: city | state | country | area

    and: 'and'

    city: 'los angeles' | 'new york'
    state: 'california' | 'new york'
    country: 'united states'
    area: 'north' | 'south' | 'east' | 'west'

    EOGRAMMAR

    $::RD_AUTOACTION = q { [\%item] } ;

    my $parser = new Parse::RecDescent ($grammar);


    my $r = $parser->store("trader joe's los angeles california");

    warn Dumper $r;
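
One possible explanation, offered as a guess rather than a confirmed diagnosis: the grammar in the original post defines rules named geeloc1 and geeloc2, while its first production refers to geoloc1 and geoloc2. The test case above spells those names consistently. The sketch below uses a hypothetical helper, check_rule_names (not part of Parse::RecDescent), to list names that are used as subrules but never defined. It assumes one rule per line and no embedded actions, so it is only a rough lint.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::RecDescent.
# Assumes each rule fits on one line and contains no { ... } actions.
sub check_rule_names {
    my ($grammar) = @_;
    my (%defined, %used);
    for my $line (split /\n/, $grammar) {
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*)$/;
        my ($name, $body) = ($1, $2);
        $defined{$name} = 1;
        $body =~ s{'[^']*'|"[^"]*"|/[^/]*/}{ }g;   # drop literals and regexes
        $body =~ s{\([^)]*\)}{ }g;                 # drop repetitions such as (?) or (s)
        $used{$_} = 1 for $body =~ /([A-Za-z_]\w*)/g;
    }
    my @undefined = sort grep { !$defined{$_} } keys %used;
    my @unused    = sort grep { !$used{$_} }    keys %defined;
    return (\@undefined, \@unused);
}

# The test case grammar, but with the rule names spelled as in the original post.
my $grammar = join "\n",
    q{store: name geoloc},
    q{name: "trader joe's" | "whole foods"},
    q{geoloc: geoloc1 and(?) geoloc2 | geoloc_},
    q{geeloc1: geoloc_},
    q{geeloc2: geoloc_},
    q{geoloc_: city | state | country | area},
    q{and: 'and'},
    q{city: 'los angeles' | 'new york'},
    q{state: 'california' | 'new york'},
    q{country: 'united states'},
    q{area: 'north' | 'south' | 'east' | 'west'};

my ($undefined, $unused) = check_rule_names($grammar);
print "used but not defined: @$undefined\n";
print "defined but not used: @$unused\n";
```

The start rule (store here) always shows up as unused, so only the first list points at a likely mistake.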
  • Ted Zlatanov at Oct 24, 2007 at 7:08 pm
    On Wed, 24 Oct 2007 05:17:11 -0400 "Terrence Brannon" wrote:

    TB> On 10/23/07, Ted Zlatanov wrote:
    TB> > Any chance you can post a complete example with sample input that fails?
    TB> my $grammar = << 'EOGRAMMAR';
    TB> store: name geoloc

    TB> name: "trader joe's" | "whole foods"

    TB> geoloc: geoloc1 and(?) geoloc2 | geoloc_
    TB> geoloc1: geoloc_
    TB> geoloc2: geoloc_
    TB> geoloc_: city | state | country | area

    TB> and: 'and'

    TB> city: 'los angeles' | 'new york'
    TB> state: 'california' | 'new york'
    TB> country: 'united states'
    TB> area: 'north' | 'south' | 'east' | 'west'

    TB> EOGRAMMAR

    I think actions may be the answer for your original problem, which was
    to distinguish the two positions (so you created the geoloc1 and geoloc2
    rules). An action like this:

    { $return = { item1 => $item[1], item2 => $item[3] }; }

    would give you back a hash with the entries for your matched items named
    appropriately. I don't know why you had subrule problems, sorry.
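
As a minimal sketch of how such an action could sit in the test-case grammar, the script below drops the autoaction and attaches the action to the geoloc rule directly. The key names item1 and item2 come from the suggestion above. Having store return only the geoloc hash, and the helper name parse_store, are assumptions made to keep the output small.

```perl
use strict;
use warnings;
use Data::Dumper;
use Parse::RecDescent;

# Grammar from the test case earlier in the thread, without $::RD_AUTOACTION.
# $item[1] and $item[3] are the two geoloc_ matches; $item[2] is the
# (possibly empty) list produced by and(?).
my $grammar = q{
    store: name geoloc { $return = $item{geoloc} }

    name: "trader joe's" | "whole foods"

    geoloc: geoloc_ and(?) geoloc_
              { $return = { item1 => $item[1], item2 => $item[3] } }
          | geoloc_
              { $return = { item1 => $item[1] } }

    geoloc_: city | state | country | area

    and: 'and'

    city: 'los angeles' | 'new york'
    state: 'california' | 'new york'
    country: 'united states'
    area: 'north' | 'south' | 'east' | 'west'
};

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";

# Hypothetical convenience wrapper around the generated start rule.
sub parse_store { return $parser->store($_[0]) }

print Dumper(parse_store("trader joe's los angeles california"));
```

With the action in place the two positions are told apart by hash key, so the separate geoloc1 and geoloc2 rules are no longer needed.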

    You could also consider the <leftop> directive, which could set up
    "A and B and C" parsing for you, unlike your current rules, which only
    accommodate one "and". You can use actions again to return the right
    things by name.

    Hope this helps.
    Ted
  • Terrence X Brannon at Oct 30, 2007 at 5:42 pm
    Note: I'm sorry for the long disclaimer in these emails. I have put in
    requests to GMANE and Nabble to archive this list, so hopefully my future
    posts (while at $dayjob) will be easier for others to trim and reply to.
    With that said, I continue...

    Ok, I'm wondering what would be the best way to test for a match against a
    grammar where there may be up to 25 "junk" characters preceding the match.
    We know that PRD does not do backtracking:

    http://search.cpan.org/dist/Parse-RecDescent-FAQ/FAQ.pm#Answer_by_Randal_L._Schwartz

    And we know that greediness is difficult to control:

    http://search.cpan.org/~dconway/Parse-RecDescent-v1.95.1/lib/Parse/RecDescent.pm#ON-GOING_ISSUES_AND_FUTURE_DIRECTIONS

    But given all that, and the toy grammar below, which matches "sir george",
    how could we modify it to match "io;ajwer;i324 sir george"?

    One possible strategy is to simply feed 25 successive substrings, chopping
    off one character at a time.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Data::Dumper;

    use lib '.' ;
    use Parse::RecDescent;

    {
    #last;
    #$::RD_WARN++;
    #$::RD_HINT++;
    $::RD_TRACE++;
    }

    $::RD_AUTOACTION =
    q { [@item] } ;

    my $G = << 'EOGRAMMAR' ;

    name: name_types eofile { $return = $item[1] }
    eofile: /^\Z/

    name_types: royal

    royal: title firstname of(?)

    title: 'sir' | 'his holiness'

    firstname: 'george' | 'john'

    of: 'of' place

    place: 'kent'


    EOGRAMMAR

    my $p = Parse::RecDescent->new($G) ;

    my $string = "sir george";

    my $parser = 'name' ;

    my $r = $p -> $parser ( $string ) ;

    warn Dumper $r;
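
The substring strategy mentioned above could be sketched roughly as follows. The helper match_after_junk is hypothetical, and the regex-based $stand_in only imitates the toy grammar so that the sketch runs without Parse::RecDescent. With the real parser the callback would be something like sub { $p->name($_[0]) }, relying on a failed rule returning undef.

```perl
use strict;
use warnings;

# Hypothetical helper: call $match on successive suffixes of $text, dropping
# one more leading character per attempt, up to $max_junk characters.
# Returns the number of characters skipped and the match result, or an
# empty list if no suffix matched.
sub match_after_junk {
    my ($match, $text, $max_junk) = @_;
    my $limit = $max_junk < length($text) ? $max_junk : length($text);
    for my $skip (0 .. $limit) {
        my $result = $match->(substr($text, $skip));
        return ($skip, $result) if defined $result;
    }
    return;
}

# Stand-in for the parser call (an assumption, not the real grammar):
# title, first name, end of input, with optional leading whitespace.
my $stand_in = sub {
    return $_[0] =~ /^\s*(sir|his holiness)\s+(george|john)\s*\z/
        ? [$1, $2]
        : undef;
};

my ($skipped, $result) =
    match_after_junk($stand_in, "io;ajwer;i324 sir george", 25);

if (defined $skipped) {
    print "matched after skipping $skipped characters: @$result\n";
}
else {
    print "no match within 25 characters\n";
}
```

This costs up to 26 parse attempts per input, which may be acceptable for short strings but scales poorly if the junk limit grows.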


  • Ted Zlatanov at Oct 31, 2007 at 12:07 pm
    On Tue, 30 Oct 2007 13:41:22 -0400 terrence.x.brannon@jpmchase.com wrote:

    txb> But given all that, and the toy grammar below, which matches "sir george",
    txb> how could we modify it to match "io;ajwer;i324 sir george"

    Here's my solution:

    1) treat the whole input as a line made of items
    2) each item can be a word or a name (we try to match a 'name' first)
    3) a word is any number of non-space characters

    The key is to walk through the input, looking for names. I tried a few
    other inputs and they seemed to work the way you want.

    Ted

    #!/usr/bin/perl

    use warnings;
    use strict;

    use Data::Dumper;
    use Parse::RecDescent;

    {
    #last;
    #$::RD_WARN++;
    #$::RD_HINT++;
    $::RD_TRACE++;
    }

    $::RD_AUTOACTION =
    q { [@item] } ;

    my $G = << 'EOGRAMMAR' ;

    line: item(s)
    item: name | word
    name: name_types eofile { $return = $item[1] }
    eofile: /^\Z/

    word: /\S+/

    name_types: royal

    royal: title firstname of(?)

    title: 'sir' | 'his holiness'

    firstname: 'george' | 'john'

    of: 'of' place

    place: 'kent'


    EOGRAMMAR

    my $p = Parse::RecDescent->new($G) ;

    my @strings = ("hello there sir george",
    "io;ajwer;i324 sir george",
    "welcome his holiness john of kent");

    my $parser = 'line' ;

    foreach my $string (@strings)
    {
    my $r = $p -> $parser ( $string ) ;
    warn Dumper $r;
    }
  • Terrence X Brannon at Oct 31, 2007 at 5:25 pm
    I will be adding this to the FAQ shortly, but thought I would mention
    how excellent Data::Match

    http://search.cpan.org/~kstephens/Data-Match-0.06/Match.pm

    has been for my work.


    So, I have these deeply nested parse trees thanks to my use of an
    Autoaction:
    $::RD_AUTOACTION = q { [@item] } ;

    I first inquired about various tools for spelunking in such trees:
    http://perlmonks.org/?node_id=646560

    And am very happy so far with Data::Match. Here is a sample parse tree
    from my parse:


    $VAR1 = [
      'simple_house',
      [
        'pre_simple',
        [
          'geoloc',
          [
            'geoloc_',
            [
              'place',
              [
                'city',
                'taipei'
              ]
            ]
          ]
        ]
      ],
      [
        'house',
        'house'
      ],
      []
    ];

    Now, my goal is to get at the inner-most array in $VAR1->[1]:

    [
    'city',
    'taipei'
    ]


    And with Data::Match, I can do so in a very definitional fashion:

    my $match = match(
        # The parse tree
        $self->{parse_result}[1],

        # The Data::Match pattern match template
        FIND(
            COLLECT(
                'x',
                [
                    EXPR(q{! ref}),
                    EXPR(q{! ref})
                ]
            )
        )
    );

    The pattern is basically saying: "match an array ref consisting of 2
    elements where each element is not a reference of any sort". Since
    the strings 'city' and 'taipei' both fulfill that criterion, that is
    what matches.
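
For comparison, the same search can be sketched in plain Perl without Data::Match. The helper find_leaf_pairs is hypothetical and is not how Data::Match works internally. It simply walks nested array refs and collects every two-element array whose elements are both non-references.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Match: recursively walk nested
# array refs and return every two-element array made of plain scalars.
sub find_leaf_pairs {
    my ($node) = @_;
    return () unless ref($node) eq 'ARRAY';
    my @found;
    push @found, $node if @$node == 2 && !grep { ref } @$node;
    push @found, map { find_leaf_pairs($_) } @$node;
    return @found;
}

# The $VAR1->[1] subtree from the parse tree shown above.
my $tree = [
    'pre_simple',
    [ 'geoloc', [ 'geoloc_', [ 'place', [ 'city', 'taipei' ] ] ] ],
];

print join(' => ', @$_), "\n" for find_leaf_pairs($tree);
```

Run against the whole $VAR1 instead of $VAR1->[1], this would also pick up the [ 'house', 'house' ] pair, which is one reason to narrow the search to a subtree first.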

  • Terrence X Brannon at Nov 2, 2007 at 7:24 pm
    Large parts of my current grammar consist of alternation lists which are
    easier entered into a file as follows:

    +the(?) /bahamas?/ #bahamas
    chile
    cuba
    /d(a|e)nmark/

    The items preceded with a plus are non-terminals and are entered literally.
    The items preceded with a slash are regexps and are entered literally.
    The other items get single quote marks put around them.

    And all items are thrown into an alternation, such as

    country: the(?) /bahamas?/ # bahamas
    | 'chile'
    | 'cuba'
    | /d(a|e)nmark/

    This saves me from typing the quote marks.
    And sorting the file makes it easy to check for duplicate entries, which
    are very possible when entering as many records as I am.
    And of course it is nice to have the main grammar much smaller.
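
The conversion described above might look something like this sketch. The function name rule_from_lines is hypothetical, and it assumes that plain entries contain no single quote and that every alternative after the first is joined with a leading "|".

```perl
use strict;
use warnings;

# Hypothetical helper: build one Parse::RecDescent rule from the lines of a
# data file, following the conventions described above.
sub rule_from_lines {
    my ($rule, @lines) = @_;
    my @alts;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        next if $line eq '';
        if ($line =~ s/^\+//) {            # "+" prefix: grammar text, used as-is
            push @alts, $line;
        }
        elsif ($line =~ m{^/}) {           # leading slash: regex, used as-is
            push @alts, $line;
        }
        else {                             # anything else: quoted literal
            push @alts, "'$line'";
        }
    }
    return "$rule: " . join("\n    | ", @alts) . "\n";
}

print rule_from_lines(
    'country',
    '+the(?) /bahamas?/ #bahamas',
    'chile',
    'cuba',
    '/d(a|e)nmark/',
);
```

Putting each alternative on its own line keeps a trailing "#" comment from swallowing the next alternative.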

    I'm starting to like Class::Base for all my OOP work, so I would probably
    have an API like:

    my $o = Parse::RecDescent::Slurp->new(base => 'path/to/data/files/');
    for my $rule (qw(city country state)) {
        # rule and data file name are the same unless a 2nd arg gives the
        # rule an explicit name
        $grammar = sprintf "%s\n%s\n", $grammar, $o->slurp($rule);
    }

    my $p = Parse::RecDescent->new($grammar);

    or maybe I should provide the grammar to the constructor and have slurp
    automatically tack the rules on at the end...

    At any rate, I don't like how Class::Base takes named parms for the
    constructor but positional parms for the methods... I think I will see
    what perlmonks like for their OO work.

    Just brainstorming for now anyway...


Discussion Overview
group: recdescent @
categories: perl
posted: Oct 23, '07 at 6:31p
active: Nov 2, '07 at 7:24p
posts: 8
users: 3
website: metacpan.org...
