Need to improve throughput - Any thoughts
I have a text file with around 1 million lines, and I need to do a search
and replace on over 9,000 words. I am currently reading one line at a time
and running a hash table of search words against it; whenever a key
matches, the word is replaced in the string. It is running very slowly.
Any thoughts on how to improve it?

Daniel Gladstone ([email protected])
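
For reference, the approach described above presumably looks something like
the following minimal sketch (the file names and the sample %replace entries
are invented for illustration, not the poster's actual data):

use strict;
use warnings;

# %replace maps each of the ~9,000 search words to its replacement
# (these two entries and the file names are placeholders).
my %replace = ( 'colour' => 'color', 'centre' => 'center' );

open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

while ( my $line = <$in> ) {
    # Every key is tried against every line: roughly 9,000 substitution
    # attempts per line, which is where the time goes.
    for my $word ( keys %replace ) {
        $line =~ s/\Q$word\E/$replace{$word}/g;
    }
    print {$out} $line;
}
close $out;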



  • Timothy Johnson at Feb 27, 2006 at 10:30 pm
    Since you haven't provided any code, it's hard to say exactly where the
    time is going. Here are some thoughts:

    * Make sure you're reading the file with "while (<INFILE>) {" instead of
    slurping the whole thing into memory with "@array = <INFILE>".
    * Try precompiling your regular expressions
    (see qr// in perlop under "Quote and Quote-like Operators").
    Both points are illustrated in the sketch below.
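
    A minimal sketch of both suggestions, with made-up file names and
    replacement entries (an illustration, not the original poster's code):

    use strict;
    use warnings;

    my %replace = ( 'colour' => 'color', 'centre' => 'center' );  # placeholder entries

    # Precompile one qr// pattern per search word once, up front,
    # instead of rebuilding the regex on every line.
    my %compiled = map { $_ => qr/\Q$_\E/ } keys %replace;

    open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
    open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

    # Read one line at a time rather than slurping the file into an array.
    while ( my $line = <$in> ) {
        while ( my ( $word, $re ) = each %compiled ) {
            $line =~ s/$re/$replace{$word}/g;
        }
        print {$out} $line;
    }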


  • Zentara at Feb 28, 2006 at 2:15 pm

    On Mon, 27 Feb 2006 10:49:35 -0600, [email protected] ("Gladstone Daniel - dglads") wrote:

    I have a text file with around 1 million lines, and I need to do a search
    and replace on over 9,000 words. I am currently reading one line at a time
    and running a hash table of search words against it; whenever a key
    matches, the word is replaced in the string. It is running very slowly.
    Any thoughts on how to improve it?

    Look on CPAN for Regexp::Optimizer.

    It will help optimize the regexp built from your list. If you think about
    it, 9,000 words will have a lot in common, so you should be able to find
    shared patterns in those sets of words and build a regexp that covers
    them. There is no need to run a separate match for each word.

    You might also experiment with breaking your 9,000-word list into smaller
    lists, running the optimizer on each of those, and matching each resulting
    pattern against each line separately.
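
    A minimal sketch of the single-pattern idea, with made-up words and file
    names; the Regexp::Optimizer call shown in the comment follows that
    module's documented synopsis:

    use strict;
    use warnings;

    my %replace = ( 'colour' => 'color', 'centre' => 'center' );  # placeholder entries

    # Fold all search words into one alternation, longest first so that
    # longer words are not clipped by shorter ones sharing a prefix.
    my $words = join '|',
        map { quotemeta } sort { length $b <=> length $a } keys %replace;
    my $pattern = qr/\b($words)\b/;

    # Optionally let Regexp::Optimizer factor out common prefixes:
    #   use Regexp::Optimizer;
    #   $pattern = Regexp::Optimizer->new->optimize($pattern);

    open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
    open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

    while ( my $line = <$in> ) {
        # One substitution pass per line instead of one per word per line.
        $line =~ s/$pattern/$replace{$1}/g;
        print {$out} $line;
    }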

    --
    I'm not really a human, but I play one on earth.
    http://zentara.net/japh.html

Discussion Overview
group: beginners @ perl.org
categories: perl
posted: Feb 27, 2006 at 4:49 pm
active: Feb 28, 2006 at 2:15 pm
posts: 3
users: 3
