I was optimizing a log-parsing script today and found
DateTime::Format::Strptime to be the bottleneck. Out of curiosity I wrote
a simple benchmark. The code is here:
     http://pastebin.com/PU8nXGPW

The results are interesting, as they show fairly noticeable differences:
     http://pastebin.com/S4rt6bYd
(run on Ubuntu 12.04 with perl 5.14.2 and the Ubuntu-packaged DateTime
modules)

It is, well, natural that parsers which try many approaches, such as
Natural or Flexible, are slow (still, DateParse does much better), but
I find it confusing that Strptime is so slow. The ISO8601 parser also
has a fairly strict syntax to handle, so I expected it to perform better...

Any insights? And which parsing method would you recommend for
optimal performance?

~~~~ Sidenote ~~~~~

DateTime::Format::Builder does not give any easy way to treat the
"below-second" part with float semantics (treat .12 as 120 milliseconds,
treat .1347 as 134700 microseconds, etc.), even though this is the
most natural and ... the only sensible semantics.

As you can see from my code and its results, DateTime::Format::Flexible
falls into this trap (it treats those digits as nanoseconds even if there
are fewer than 9 of them), while my hand-made builders require
postprocessing to clean this field up.

Am I missing something? Is there a way to handle this better?
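
One possible way to get the float semantics described in the sidenote, sketched here as an assumption and not as something DateTime::Format::Builder offers out of the box: capture the fractional digits as a string and right-pad them with zeros to nine digits before treating them as nanoseconds. The helper name frac_to_nanoseconds is hypothetical and uses only core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any DateTime module): convert the digits
# that follow the decimal point into nanoseconds, using float semantics.
sub frac_to_nanoseconds {
    my ($digits) = @_;
    return 0 unless defined $digits && length $digits;
    die "not a digit string: $digits" unless $digits =~ /\A[0-9]+\z/;

    # Right-pad with zeros to 9 digits; anything finer than a nanosecond
    # is truncated.
    my $padded = substr($digits . '0' x 9, 0, 9);
    return $padded + 0;
}

for my $frac (qw(12 1347 123456789)) {
    printf ".%s -> %d ns\n", $frac, frac_to_nanoseconds($frac);
}
# Prints:
# .12 -> 120000000 ns
# .1347 -> 134700000 ns
# .123456789 -> 123456789 ns
```

The resulting number could then be applied in the postprocessing step mentioned above, for example as the nanosecond argument of DateTime->new.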


  • Rick Measham at Jan 30, 2014 at 9:44 pm
    First note is that Strptime is designed to be flexible. Speed wasn't the goal. That said, it also wasn't designed to be slow!

    Second, and most importantly, your benchmark should fare MUCH better if you don't create a new parser every time. Create the parser outside the benchmark sub and it should be muuuuuuch quicker. And that is how your production code should work.

    - Rick
    📱
  • Marcin Kasperski at Jan 30, 2014 at 10:38 pm

    > Second, and most importantly, your benchmark should fare MUCH better
    > if you don't create a new parser every time.

    I don’t.

    http://perldoc.perl.org/functions/state.html
  • Olivier Mengué at Jan 30, 2014 at 11:26 pm
    2014-01-30 Rick Measham <rick@measham.id.au>:
    > Second, and most importantly, your benchmark should fare MUCH better if
    > you don't create a new parser every time. Create the parser outside the
    > benchmark sub and it should be muuuuuuch quicker. And that is how your
    > production code should work.

    The code takes care to create the parser only once, in a 'state' variable.
    But this state variable is initialized on the first run, not at
    compile time, so it still affects the performance of the first run.
    That first run is not the one timed by cmpthese, though, because there is
    already a first run at line 112. So I don't see an issue in that benchmark.


    Anyway, I would still suggest writing the code differently, to always avoid
    the 'state initialization issue' in benchmarks: use a closure instead of a
    'state' variable.

    Original code:
        'strptime' => sub {
            state $parser = DateTime::Format::Strptime->new(
                pattern   => '%F %T.%N',
                locale    => 'C',
                time_zone => 'Europe/Warsaw',
                on_error  => 'croak',
            );
            return $parser->parse_datetime(@_);
        },

    Modified code:
        'strptime' => do {
            my $parser = DateTime::Format::Strptime->new(
                pattern   => '%F %T.%N',
                locale    => 'C',
                time_zone => 'Europe/Warsaw',
                on_error  => 'croak',
            );
            sub { $parser->parse_datetime(@_) }
        },
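
The timing difference between the two variants can be made visible with a small sketch in core Perl. It assumes a stand-in constructor, make_parser (a hypothetical name, not a DateTime API), that merely records when it is called; the real DateTime::Format::Strptime constructor is left out so the example runs without CPAN modules.

```perl
use strict;
use warnings;
use feature 'state';

# Stand-in for an expensive constructor such as a parser's new();
# it only records when it was called.
my @log;
sub make_parser {
    my ($name) = @_;
    push @log, "build $name";
    return sub { "parsed by $name: $_[0]" };
}

my %parsers = (
    # 'state': the constructor runs during the first call of the sub.
    with_state => sub {
        state $parser = make_parser('state');
        return $parser->(@_);
    },
    # Closure: the constructor runs here, while the hash is being built.
    with_closure => do {
        my $parser = make_parser('closure');
        sub { $parser->(@_) };
    },
);

push @log, 'hash ready';
$parsers{with_state}->('a')   for 1 .. 3;
$parsers{with_closure}->('a') for 1 .. 3;

print "$_\n" for @log;
# Prints:
# build closure
# hash ready
# build state
```

Both constructors run exactly once; the closure variant simply pays that cost before any timed call, which is why it sidesteps the first-run effect in a benchmark.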

    Olivier.

Discussion Overview
group: datetime @
categories: perl
posted: Jan 30, '14 at 7:08p
active: Jan 30, '14 at 11:26p
posts: 4
users: 3
website: metacpan.org...
