Grokbase Groups Pig user March 2011
Hello,

I have some LZO files, which I

a) indexed via DistributedLzoIndexer to create index files
b) did not index, so they are just LZO files in a directory.

With both approaches, I tried creating a subclass of LzoBaseRegexLoader
that returns a pattern.
Sadly, not a single line matched. This is not a problem with the regex
(I checked that it works with other strings).
I modified LzoBaseRegexLoader.java to print the incoming strings, and
I'm getting binary data, e.g.

http://pastebin.com/wAveGzDy

I'm using Pig 0.8 and ElephantBird checked out from
https://github.com/gerritjvv/elephant-bird

Any suggestions?

Saptarshi


  • Dmitriy Ryaboy at Mar 21, 2011 at 8:22 pm
    Try the more up-to-date version at
    https://github.com/dvryaboy/elephant-bird/tree/pig-08

    Please send me your class if it still fails, along with the (uncompressed)
    input data needed to reproduce the error.

    D
  • Saptarshi Guha at Mar 21, 2011 at 11:50 pm
    Hi Dmitriy and Gerrit,

    I did the following:

    - confirmed that LzoPigStorage is indeed reading my LZO files (took
    this from Gerrit's GitHub)
    - confirmed that my LzoBaseRegexLoader subclass was getting
    strings (from the pig-08 branch of Dmitriy's GitHub)

    Both work. I was (foolishly) messing around with the wrong extensions.

    Thanks
    Saptarshi
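
The thread does not say which extensions were wrong, so the following is only a minimal sketch of a pre-flight check on file names. It assumes the common convention that LZO data files end in `.lzo` and that index files carry an extra `.index` suffix (for example `part-00000.lzo.index`). The class name `LzoNameCheck`, the `Kind` enum, and the sample file names are hypothetical and not part of ElephantBird or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class LzoNameCheck {
    /** Rough classification of a file by name only (no bytes are read). */
    public enum Kind { LZO_DATA, LZO_INDEX, OTHER }

    /** Assumption: data files end in ".lzo", index files in ".lzo.index". */
    public static Kind classify(String fileName) {
        if (fileName.endsWith(".lzo.index")) {
            return Kind.LZO_INDEX;
        }
        if (fileName.endsWith(".lzo")) {
            return Kind.LZO_DATA;
        }
        return Kind.OTHER;
    }

    /** Returns the names that an extension-based loader would likely not treat as LZO. */
    public static List<String> suspicious(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (classify(name) == Kind.OTHER) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical directory listing.
        List<String> names = new ArrayList<>();
        names.add("part-00000.lzo");
        names.add("part-00000.lzo.index");
        names.add("part-00001.lzop");
        System.out.println(suspicious(names));
    }
}
```

Running a check like this over a directory listing before starting a Pig job can catch misnamed inputs early. It only looks at names, so it cannot tell whether a file really contains LZO data.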



  • Saptarshi Guha at Mar 22, 2011 at 4:29 am
    Did I also say thank you to both of you, and to everyone involved, for ElephantBird?
    It is an extremely useful set of tools (like a wonderful Christmas present).

    Cheers
    Saptarshi


  • Gerrit Jansen van Vuuren at Mar 21, 2011 at 8:26 pm
    Hi,

    Several things you can try:

    1) Try using com.twitter.elephantbird.pig.load.LzoPigStorage() and print out
    a few lines, just to make sure you can read clear text from the LZO files.
    2) You can use this in combination with Pig's built-in function
    REGEX_EXTRACT(String expression, String regex, int matchIndex).
    3) Have you tried LzoRegexLoader(String pattern)?

    Cheers,
    Gerrit
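
To check a regex against sample lines outside Pig, a small standalone Java program can help. The sketch below is only roughly analogous to REGEX_EXTRACT(expression, regex, matchIndex): it returns the requested capture group of the first match, or null when nothing matches. It is not Pig's implementation. The class name `RegexExtractSketch`, the sample log line, and the pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    /**
     * Returns capture group `index` of the first match of `regex` in `line`,
     * or null if the line is null, nothing matches, or the group does not exist.
     */
    public static String regexExtract(String line, String regex, int index) {
        if (line == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.find() && index >= 0 && index <= m.groupCount()) {
            return m.group(index);
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical clear-text line, as a loader would see it after decompression.
        String line = "2011-03-21 GET /index.html 200";
        String regex = "(\\S+) (\\S+) (\\S+) (\\d+)";
        System.out.println(regexExtract(line, regex, 3));

        // A line of still-compressed bytes will typically not match at all.
        System.out.println(regexExtract("\u0000\u0001binary", regex, 1));
    }
}
```

If the pattern matches here but never inside the loader, the loader is probably not receiving clear text, which points at the input files rather than the regex.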



Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 21, '11 at 8:11p
active: Mar 22, '11 at 4:29a
posts: 5
users: 3
website: pig.apache.org
