Pig user mailing list archive (Grokbase), June 2011
Hello Pig mailing list,

I have around 10 TB of Apache log files (1 TB as .gz-compressed files)
and analyze these files with Pig.
Apache log files compress very well with gzip, so
it would be great if Pig could read the log files in compressed
form.

Is this possible with the CombinedLogLoader from contrib/piggybank, or
is there another way to do this? It is easy with the normal
TextLoader, which automatically detects whether a file is a .gz file.

If there is no way, would RegExLoader be the right class to extend?

Regards
Dirk
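For reference, the kind of parsing a RegExLoader subclass does can be sketched in plain Java: one regular expression per log line, one field per capture group. This is a hypothetical standalone class (`CombinedLogParser` is an illustrative name, not a Piggybank class), it does not use Pig's LoadFunc API, and the pattern is an assumption that covers the standard Apache combined format but not escaped quotes inside the request or user-agent fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Piggybank code: split one Apache "combined" log
 * line into fields with a regular expression, one field per capture group.
 */
public class CombinedLogParser {
    // host, ident, user, [time], "request", status, bytes, "referer", "agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    /** Returns the nine fields in order, or null if the line does not match. */
    public static List<String> parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return null; // assumption: a loader would skip malformed lines
        }
        List<String> fields = new ArrayList<>(m.groupCount());
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08\"";
        System.out.println(parse(line));
    }
}
```

Running `main` prints the nine fields of the sample line. In an actual loader each list would become a Pig tuple, and lines for which `parse` returns `null` would be skipped or counted.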

  • Dmitriy Ryaboy at Jun 18, 2011 at 2:31 am
    Dirk, if you look at the code for PigStorage, you'll see that it checks the file names and picks the right input format based on them. You should just add the same logic to RegExLoader.
    (In reply to "dirk.mst@gmail.com", Jun 17, 2011, at 1:44 AM.)
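A minimal sketch of the file-name check suggested in the reply, written against the plain JDK rather than Pig's or Hadoop's classes so that it runs on its own: the stream is wrapped in `GZIPInputStream` only when the file name ends in `.gz`. `SuffixAwareReader` is a hypothetical name. Inside a real loader the equivalent lookup would more likely go through Hadoop's `CompressionCodecFactory.getCodec(Path)`, which also selects a codec by file suffix; how that would be wired into RegExLoader is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: choose how to decode an input file from its name.
 * Only ".gz" is decompressed; every other suffix is read as plain text.
 */
public class SuffixAwareReader {
    public static BufferedReader open(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        if (file.getFileName().toString().endsWith(".gz")) {
            try {
                in = new GZIPInputStream(in);
            } catch (IOException e) {
                in.close(); // don't leak the raw handle if the gzip header is bad
                throw e;
            }
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Counts lines, transparently decompressing .gz files. */
    public static long countLines(Path file) throws IOException {
        try (BufferedReader r = open(file)) {
            return r.lines().count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path gz = Files.createTempFile("access", ".log.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write("line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(countLines(gz)); // prints 2
        Files.delete(gz);
    }
}
```

Keying on the suffix is cheap and matches how the files in the question are named; checking the gzip magic bytes (`1f 8b`) instead would tolerate misnamed files, at the cost of peeking at the stream before choosing a reader.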

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 17, '11 at 8:45a
active: Jun 18, '11 at 2:31a
posts: 2
users: 2
website: pig.apache.org

2 users in discussion: Dmitriy Ryaboy (1 post), Dirk Mst (1 post)