Topaz and Regular Expressions

Ed Peschko
Oct 15, 1999 at 5:12 pm
I just listened to the topaz talk ( http://www.perl.com/pub/1999/09/topaz.html),
and found it accurate in two respects: 1) that a good rewrite
would lower the barriers to entry for coders (like me) quite a bit, and 2)
that the 'hard issues' are going to be what causes its success or failure.

So I'm surprised that regular expressions weren't mentioned - exactly how
entwined is the current implementation around perl5, and how easy will it be
to engineer their inclusion in topaz?

Ed

9 responses

  • Chip Salzenberg at Oct 18, 1999 at 6:14 pm

    According to Ed Peschko:
    I just listened to the topaz talk ( http://www.perl.com/pub/1999/09/topaz.html),
    and found it accurate in two respects: 1) that a good rewrite
    would lower the barriers to entry for coders (like me) quite a bit, and 2)
    that the 'hard issues' are going to be what causes its success or failure.
    Thanks for the feedback. Many have disputed one point or another. :-)
    So I'm surprised that regular expressions weren't mentioned - exactly how
    entwined is the current implementation around perl5, and how easy will it be
    to engineer their inclusion in topaz?
    I expect very few problems with the regex engine. It's a separate
    language unto itself. Its connections with Perl's guts are limited to
    a few specific points, as I recall: (1) parameter calculations, (2)
    returned values, and (3) callouts via (?{}). No biggie.

    The lexer will be a lot more complicated....
    --
    Chip Salzenberg - a.k.a. - <chip@valinux.com>
    "I am the Lemon Zester of Destruction!" //MST3K
  • Ed Peschko at Oct 19, 1999 at 11:20 pm

    I expect very few problems with the regex engine. It's a separate
    language unto itself. Its connections with Perl's guts are limited to
    a few specific points, as I recall: (1) parameter calculations, (2)
    returned values, and (3) callouts via (?{}). No biggie.
    Ok, I guess that leads into my next questions:

    1) is the regular expression engine itself going to get a rewrite/revise?
    2) how 'clean' is the API going to be? I mean right now with wPerl, you can
    embed perl in C++ but it comes at the expense of embedding an interpreter.
    Whereas some of us would like to say:

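    #include <iostream>
    #include <Scalar.h>   // wished-for header: a Perl scalar usable without an interpreter

    int main()
    {
        Scalar a;                 // value object, no 'new' needed in C++
        a += "HELLO HERE";

        // global match with /s and /g semantics, one capture per iteration
        while (a.regmatch("H(.*?)(?=H|$)", "sg"))
        {
            std::cout << "MATCHED ON " << a.dollar(1) << "\n";
        }
    }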

    without paying the interpreter overhead. Will this be available
    'out of the box' or will a API layer be needed?

    Ed
  • Chip Salzenberg at Oct 19, 1999 at 11:43 pm

    According to Ed Peschko:
    1) is the regular expression engine itself going to get a rewrite/revise?
    I think Ilya's handling that side of the house quite well. I'd like
    to minimize the changes required and let him keep working on it.
    2) how 'clean' is the API going to be?
    while (a.regmatch("H(.*?)(?=H|$)", "sg"))
    That'd be very nice to have, and I'd love to see it. I can't make any
    promises, though. It'll take a great deal of care to make sure that
    all connections back from the regex engine to the Perl runtime can be
    severed cleanly on demand.
    --
    Chip Salzenberg - a.k.a. - <chip@valinux.com>
    "I am the Lemon Zester of Destruction!" //MST3K
  • Ken Fox at Oct 20, 1999 at 1:37 am

    Ed Peschko writes:
    embed perl in C++ but it comes at the expense of embedding an interpreter.
    I wish there were a direct interface into regex matching (and other ops
    for that matter), but I don't see a problem with embedding itself. I
    think it's an incredibly difficult challenge to optimize ops without
    assuming that a perl interpreter is available. For example, would you
    want the regex engine to abstract the concept of evaluating a
    replacement string? That might hurt performance/maintainability quite a
    bit. Is it worth it?
    Whereas some of us would like to say: ...
    while (a.regmatch("H(.*?)(?=H|$)", "sg"))
    This interface looks pretty, but it is a total pain to optimize. It
    is possible to special case for (const char *) and lookup pre-built
    regexps from a cache -- but that leads to surprises when a programmer
    changes from a constant to a variable and suddenly the performance
    drops by an order of magnitude.

    I'd much rather encourage user-visible regexp objects. IMHO C++ can be
    used to hide a great deal of complexity, but when things that look fast
    run extremely slowly, the hiding gets in the way. (This is one of the
    reasons Linux developers oppose C++ -- it's hard to "get a feel" for
    the way code will run by just looking at it.)

    Just my $.02 of course. (libperl++ allows (char *) in place of
    regexp objects BTW... ;)
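
To make the contrast concrete, here is a minimal sketch of the two interface styles. It uses the C++ standard library's std::regex (ECMAScript grammar) purely as a stand-in; it is not Perl's regex engine and not any Topaz API, and the function names first_groups and first_groups_from_string are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Style 1: the caller builds a regex object and hands it in.
// The compile cost is paid once, in plain sight, at the point of construction.
std::vector<std::string> first_groups(const std::string& text,
                                      const std::regex& re) {
    assert(re.mark_count() >= 1);  // the pattern must have a capture group
    std::vector<std::string> out;
    // Walk over every non-overlapping match, like a /g loop.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back((*it)[1].str());  // text of capture group 1
    }
    return out;
}

// Style 2: the caller passes a pattern string.
// Looks just as cheap at the call site, but recompiles on every call
// unless a cache is added behind the scenes.
std::vector<std::string> first_groups_from_string(const std::string& text,
                                                  const std::string& pattern) {
    std::regex re(pattern);  // hidden compile, repeated on each call
    return first_groups(text, re);
}
```

With a std::regex built once from "H(.*?)(?=H|$)", first_groups("HELLO HERE", re) collects "ELLO " and "ERE". The string-taking variant gives the same answer, but in a loop it pays for compilation each time, which is the surprise being warned about.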
    without paying the interpreter overhead. Will this be available
    'out of the box' or will a API layer be needed?
    I hope that we will be able to run ops directly without bouncing
    through the trampoline code. Is that what you mean? I rather think
    that an API layer is a very good idea -- I just don't want the API
    to be the current perl_call_sv() interface...

    - Ken

    --
    Ken Fox, kfox@ford.com, (313)59-44794
    ------------------------------------------------------------------------
    Ford Motor Company, Powertrain | "Is this some sort of trick
    Analytical Powertrain Methods Department | question or what?" -- Calvin
    C3P Implementation Section |
  • John Porter at Oct 20, 1999 at 2:15 pm

    Ken Fox wrote:

    IMHO C++ can be
    used to hide a great deal of complexity, but when things that look fast
    run extremely slowly, the hiding gets in the way. (This is one of the
    reasons Linux developers oppose C++ -- it's hard to "get a feel" for
    the way code will run by just looking at it.)
    s/Linux developers/all systems programmers/.

    I sure would like to know how this meme got started,
    that Linux developers are the first and only systems
    programmers.

    --
    John Porter
  • Chip Salzenberg at Oct 20, 1999 at 8:11 pm

    According to John Porter:
    I sure would like to know how this meme got started,
    that Linux developers are the first and only systems
    programmers.
    I hadn't noticed that meme, really. C++ was actually a subject of
    heated debate on linux-kernel within the last year, and I don't know
    if the same is true of other systems programming groups.
    --
    Chip Salzenberg - a.k.a. - <chip@valinux.com>
    "I am the Lemon Zester of Destruction!" //MST3K
  • Ken Fox at Oct 20, 1999 at 9:34 pm

    John Porter writes:
    Ken Fox wrote:
    This is one of the reasons Linux developers oppose C++ -- it's hard
    to "get a feel" for the way code will run by just looking at it.
    s/Linux developers/all systems programmers/.
    Well, that's not a true statement -- I happen to know a system programmer
    who advocates C++. ;)
    I sure would like to know how this meme got started,
    that Linux developers are the first and only systems
    programmers.
    I never said that and certainly I don't believe it. Perhaps the meme
    exists only in the popular press? Or maybe you just have a healthy
    skepticism towards the popular? ;)

    - Ken

    --
    Ken Fox, kfox@ford.com, (313)59-44794
    ------------------------------------------------------------------------
    Ford Motor Company, Powertrain | "Is this some sort of trick
    Analytical Powertrain Methods Department | question or what?" -- Calvin
    C3P Implementation Section |
  • John Porter at Oct 21, 1999 at 2:37 pm

    Ken Fox wrote:
    John Porter writes:
    Ken Fox wrote:
    This is one of the reasons Linux developers oppose C++ -- it's hard
    to "get a feel" for the way code will run by just looking at it.
    s/Linux developers/all systems programmers/.
    Well, that's not a true statement -- I happen to know a system programmer
    who advocates C++. ;)
    Well, o.k., I should have said "systems programmers in general".

    I sure would like to know how this meme got started,
    that Linux developers are the first and only systems
    programmers.
    I never said that and certainly I don't believe it.
    (Not to make too big a deal of it... but) I inferred from your
    statement a lack of awareness of the existence -- and, dare
    I say, predominance -- of non-Linux systems programmers.
    More likely, I suppose, is that you didn't want to
    *over*-generalize, for which I credit you.

    Or maybe you just have a healthy
    skepticism towards the popular? ;)
    Yes, there's that. :-)

    cheers, hand,
    John Porter
  • Ed Peschko at Oct 20, 1999 at 5:22 pm

    On Tue, Oct 19, 1999 at 09:37:28PM -0400, Ken Fox wrote:
    Ed Peschko writes:
    embed perl in C++ but it comes at the expense of embedding an interpreter.
    I wish there were a direct interface into regex matching (and other ops
    for that matter), but I don't see a problem with embedding itself. I
    think it's an incredibly difficult challenge to optimize ops without
    assuming that a perl interpreter is available. For example, would you
    want the regex engine to abstract the concept of evaluating a
    replacement string? That might hurt performance/maintainability quite a
    bit. Is it worth it?
    I don't quite understand - when you say 'evaluation', are you referring to
    something like:

    $line =~ s"(.*)" $1 x 2"ge;

    or

    $parenmatch =
    qr{ \(
    (?:
    (?>$major_re|[^()])+
    (?p{ $parenmatch })
    )*
    \)
    }x;


    where there are 'perl bits' embedded in the regular expression? If so, I'd say
    give the opportunity to disable them - by #ifdef if necessary - and provide the
    direct interface with the missing functionality.
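
A rough sketch of what such a switch could look like from the C++ side. Everything here is hypothetical: the macro NO_PERL_CALLOUTS and the functions has_code_callout and pattern_allowed are invented for illustration, and the scan is deliberately naive (it honors backslash escapes but does not understand character classes).

```cpp
#include <cstddef>
#include <string>

// Naive scan for embedded-code constructs: "(?{", "(??{" and "(?p{".
// Assumption: a backslash escapes the next character; character classes
// such as "[(?{]" are not handled and would be reported as callouts.
bool has_code_callout(const std::string& pattern) {
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '\\') {
            ++i;  // skip the escaped character
            continue;
        }
        if (pattern.compare(i, 3, "(?{") == 0 ||
            pattern.compare(i, 4, "(??{") == 0 ||
            pattern.compare(i, 4, "(?p{") == 0) {
            return true;
        }
    }
    return false;
}

// In a build without an interpreter, patterns with embedded code are refused;
// in a full build everything is allowed.
bool pattern_allowed(const std::string& pattern) {
#ifdef NO_PERL_CALLOUTS
    return !has_code_callout(pattern);
#else
    (void)pattern;
    return true;
#endif
}
```

A standalone library would be compiled with NO_PERL_CALLOUTS defined, so that a pattern like the $parenmatch one above is rejected up front rather than failing somewhere inside the matcher.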

    The importance of all this? Well, I've been thinking a bit about the whole
    'conquer the world' thing. I agree with Chip - there are basically two types
    of languages : systems languages and applications languages. Perl is
    relatively good as an application language but it isn't much as a system
    language.

    So if we want perl to grow into the system arena, making perl a viable API would
    help quite a bit. The perl API probably would not make its way into
    time-sensitive applications like kernels, but into large mission-critical systems
    (like business systems software) definitely. We do it already at our company,
    but in limited scope because of the interpreter issue and thread safety.
    Whereas some of us would like to say: ....
    while (a.regmatch("H(.*?)(?=H|$)", "sg"))
    This interface looks pretty, but it is a total pain to optimize. It
    is possible to special case for (const char *) and lookup pre-built
    regexps from a cache -- but that leads to surprises when a programmer
    changes from a constant to a variable and suddenly the performance
    drops by an order of magnitude.
    I'm not sure what you mean here, either. Are you talking about interpolation,
    about stuff like:

    a.regmatch("H(.*)$variable", "sg")

    If so, I think that the C++ regmatch should handle char strings and *only*
    char strings, not do any interpolation whatsoever (well, except for internal
    interpolation for things like $1). Instead I'd think that interpolation could
    be handled by something like:

    Scalar a("HELLO WORLD");

    Scalar b("This is $a");
    b.Interpolate("a", a); // This is HELLO WORLD

    i.e., the interpolation is done explicitly by the user.
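
For illustration, a minimal sketch of explicit interpolation on plain std::string. The free function interpolate is hypothetical and only mimics the Interpolate call above; it handles "$name" and nothing else (no "${name}", no arrays or hashes).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Replace each occurrence of "$name" in templ with value.
// A hit that is only the prefix of a longer identifier (e.g. "$ab" when
// name is "a") is left untouched.
std::string interpolate(const std::string& templ,
                        const std::string& name,
                        const std::string& value) {
    assert(!name.empty());
    const std::string needle = "$" + name;
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t hit = templ.find(needle, pos);
        if (hit == std::string::npos) {
            out.append(templ, pos, std::string::npos);  // copy the tail
            break;
        }
        const std::size_t after = hit + needle.size();
        const bool part_of_longer_name =
            after < templ.size() &&
            (std::isalnum(static_cast<unsigned char>(templ[after])) ||
             templ[after] == '_');
        out.append(templ, pos, hit - pos);  // text before the hit
        out.append(part_of_longer_name ? needle : value);
        pos = after;
    }
    return out;
}
```

Calling interpolate("This is $a", "a", "HELLO WORLD") yields "This is HELLO WORLD". Because the substitution is an ordinary function call, its cost is visible at the call site and the regex matcher never has to look at variables.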
    I'd much rather encourage user-visible regexp objects. IMHO C++ can be
    used to hide a great deal of complexity, but when things that look fast
    run extremely slowly, the hiding gets in the way. (This is one of the
    reasons Linux developers oppose C++ -- it's hard to "get a feel" for
    the way code will run by just looking at it.)
    Again, 'user visible' is a bit of an abstraction... Could you provide a bit
    more in the way of syntax or example?

    In any case, what I'm proposing is pretty much the same as what MS is trying
    to do with 'cool' (horrid name) - provide a useful layer of abstraction for
    datatypes whilst still maintaining an 'in place' language (i.e., C++). The theory
    then being that you can migrate to the new API with a lot less pain than if
    you were writing to a new language.
    I hope that we will be able to run ops directly without bouncing
    through the trampoline code. Is that what you mean? I rather think
    that an API layer is a very good idea -- I just don't want the API
    to be the current perl_call_sv() interface...
    Well, I think that a good API would come out of Topaz if, in implementing it,
    the conscious intent to make a good API - sans interpreter - was there and the
    API was banged against during development. So perhaps 'we' (the collective we of
    C++ users) should be thrashing against it to see how useful it is currently for
    development, with the lexer in the back of our minds.

    Ed
