Grokbase Groups Pig user January 2010
Hi,

Every time I run a Pig script I get a number of Job jars left in the /tmp
directory of my client, one per MR job, it seems. The file names look like
/tmp/Job875278192.jar.

I have scripts that run every five minutes and fire 10 MR jobs each, so the
amount of space used by these jars grows rapidly. Is there a way to tell Pig
to clean up after itself and remove these jars, or do I just need to write
my own clean-up script?

thanks,
Bill
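
At the rate described (a run every five minutes, 10 MR jobs per run), that is 12 runs × 10 = 120 jars an hour, or 2,880 a day. A quick way to see how many have actually piled up, and how much disk they take, is sketched below as a shell function. The name pig_jar_usage is hypothetical, and it assumes the Job<digits>.jar naming seen in the post; it is a diagnostic aid, not part of Pig.

```shell
# Hypothetical helper: report how many leftover Pig job jars sit directly
# under a directory and how much disk they use. Assumes the Job<digits>.jar
# naming from the post; the directory argument defaults to /tmp.
pig_jar_usage() {
  dir="${1:-/tmp}"
  # find does the name matching itself, so an empty directory is not an error.
  # du -k prints "<kilobytes><TAB><path>" per jar; awk counts lines and sums sizes.
  find "$dir" -maxdepth 1 -type f -name 'Job[0-9]*.jar' -exec du -k {} + |
    awk -v d="$dir" '{ n++; kb += $1 } END { printf "%d jars, %d KB in %s\n", n, kb, d }'
}
```

Calling `pig_jar_usage /tmp` before and after a few scheduled runs shows whether the count grows at roughly the rate computed above.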

  • Rekha Joshi at Jan 27, 2010 at 4:41 am
    You might want to check PIG-116 and HADOOP-5175. Also, AFAIK there is a JobCleanup task that takes care of cleaning up, unless the job failed.
    Cheers,
    /R

  • Bill Graham at Jan 27, 2010 at 7:33 pm
    Thanks Rekha.

    These issues seem to be related to cleaning up Pig/Hadoop files upon shutdown
    of the VM. I just checked, and when I shut down the VM all files are cleaned
    up as expected.

    My issue is that I have Pig jobs that run in an app server and are
    triggered by Quartz. It might be days or weeks between app server bounces.
    If anyone knows a way to configure or kick off some sort of cleanup process
    without shutting down the VM, please let me know.

    Otherwise, I need to deploy a hacky crontab script like this:

    # delete Pig job jars directly under /tmp that are more than 50 minutes old
    find /tmp -maxdepth 1 -type f -name 'Job[0-9]*.jar' -mmin +50 -exec rm -f {} +
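
A slightly more reusable form of that cleanup, sketched as a shell function. The name clean_pig_jars and its two arguments (directory, age in minutes) are hypothetical; the defaults simply mirror the command above (/tmp, 50 minutes). This is an external workaround, not a Pig or Hadoop setting, and -mmin and -maxdepth are GNU/BSD find extensions rather than POSIX.

```shell
# Hypothetical cron-driven workaround, not a Pig feature.
# Deletes Pig job jars (Job<digits>.jar) directly under DIR whose last
# modification is more than MINUTES minutes ago. Defaults: /tmp and 50.
clean_pig_jars() {
  dir="${1:-/tmp}"
  minutes="${2:-50}"
  # -maxdepth 1: only top-level files, matching the layout seen in /tmp.
  # -name is matched by find itself, so "no jars yet" is not an error and a
  # large backlog cannot overflow the shell's argument list.
  # -exec rm -f {} + batches many files into each rm invocation.
  find "$dir" -maxdepth 1 -type f -name 'Job[0-9]*.jar' -mmin +"$minutes" \
    -exec rm -f {} +
}
```

To schedule it, put the function in a script that ends with a call such as `clean_pig_jars /tmp 50` and point a crontab entry at that script, e.g. `*/10 * * * * /path/to/script` (path and interval are placeholders). Keep the age threshold well above your longest script run: deleting a jar before its job has been submitted would presumably make that job fail.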

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jan 26, '10 at 6:31p
active: Jan 27, '10 at 7:33p
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

Bill Graham: 2 posts
Rekha Joshi: 1 post