Process large files quickly...how to?
Hello all,

I have a few files that are on average 30MB each and need to be processed through
a Perl script. The script ends up taking almost an hour. Thing is, the
script cannot run for more than an hour, because another script is kicked off
(the same script with different data) every hour.

What is the best way to read the data in and process it? Read each line of
the file into an array and then process each line, or just process each line
directly as it is read?

Anyone have any scripts that process large files quickly? I'd love to see
examples of how you did it.

Thanks,
Kevin


  • Timothy Johnson at Mar 12, 2002 at 3:13 am
    If you gave some code with your question, I would have a better idea of what is
    taking so long. I will venture a guess, only because I know what happened
    when I first started working with large files. The first thing to check is
    whether you have any code that looks like this:
    ---------------------------
    open(INFILE, "bigfile.log") or die "Can't open bigfile.log: $!";
    my @infile = <INFILE>;        # slurps the entire file into memory at once
    foreach (@infile) {
        # do something with $_ ...
    }
    close(INFILE);
    ---------------------------

    If you do, you'll want to change it to something like this:

    ---------------------------
    open(INFILE, "bigfile.log") or die "Can't open bigfile.log: $!";
    while (<INFILE>) {            # reads one line at a time into $_
        # do something with $_ ...
    }
    close(INFILE);
    ---------------------------

    The difference may seem like a small one when you are working with smaller
    files, but if you are working with large files, you will soon find that
    loading 30MB into RAM before even starting to parse the file is very costly
    in terms of resources, and can make your script take much longer than it
    really has to. This might not even be your problem, though. The best
    way to get a good answer is to ask a good question, and if you want to know
    a better way to code something, you should provide some of your code as a
    reference.
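
    If you want to see the difference for yourself, the core Benchmark module can
    time the two approaches side by side on one of your own files. This is just a
    rough sketch (point $file at a real file before running it):

    ---------------------------
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $file = "bigfile.log";    # change this to one of your real files

    cmpthese(3, {
        slurp_whole_file => sub {
            open(my $fh, "<", $file) or die "Can't open $file: $!";
            my @lines = <$fh>;        # the whole file is held in memory here
            my $count = @lines;
            close $fh;
        },
        line_by_line => sub {
            open(my $fh, "<", $file) or die "Can't open $file: $!";
            my $count = 0;
            $count++ while <$fh>;     # only one line in memory at a time
            close $fh;
        },
    });
    ---------------------------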

  • Russ Foster at Mar 12, 2002 at 4:45 pm
    I suppose it depends on what kind of processing you are doing. Can you give
    us examples?

    For that size of data, I would make sure you are reading the file in line by
    line (as opposed to reading it all into an array).

    Also, anything that prints to the screen will slow things down considerably.
    I know it's nice to see the line count continuing to increase so you know
    something is happening, but printing even a single character adds a lot of
    overhead.

    Save most of the printing until the end to dump your results.

    If I want a status, I usually print a single char (like a "-") for every
    1,000 lines processed, and a "+" at each 10,000-line mark.
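
    Something like this rough sketch, for example (process_line() here is just a
    stand-in for whatever you actually do with each line):

    ---------------------------
    sub process_line { }    # placeholder for your real per-line work

    open(my $fh, "<", "bigfile.log") or die "Can't open bigfile.log: $!";
    $| = 1;                 # unbuffer STDOUT so the marks show up right away
    while (my $line = <$fh>) {
        process_line($line);
        if    ($. % 10_000 == 0) { print "+" }   # $. is the current input line number
        elsif ($. % 1_000  == 0) { print "-" }
    }
    print "\n";
    close $fh;
    ---------------------------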

    Or...depending on the data...I'll check the file size ahead of time and then
    create an estimated finish time. I know, based on historical runs, that I can
    process about $X MB/minute. If the file is $Y MB, it should take $Y/$X
    minutes to complete. Print this info up front, then a status line every so
    often...
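
    A rough sketch of that estimate (the 2.5 MB/minute figure is just a placeholder;
    plug in your own rate from past runs):

    ---------------------------
    my $file       = "bigfile.log";
    my $mb_per_min = 2.5;                          # your historical processing rate
    my $size_mb    = (-s $file) / (1024 * 1024);   # file size in MB
    my $est_min    = $size_mb / $mb_per_min;

    printf "%.1f MB to process, roughly %.0f minutes (done around %s)\n",
        $size_mb, $est_min, scalar localtime(time + $est_min * 60);
    ---------------------------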

    -rjf


Discussion Overview
group: beginners
categories: perl
posted: Mar 12, '02 at 1:40a
active: Mar 12, '02 at 4:45p
posts: 4
users: 4
website: perl.org
