FAQ
Hello,

After getting all the errors to go away (LZO libraries not being found, missing jar files for elephant-bird), I've run into a new problem when using the elephant-bird branch for Pig 0.7.

The following simple Pig script works as expected:

REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt';
DUMP A;

This just dumps out the contents of the test_input_chars.txt file, which is tab-delimited. The output looks like:

(1,a,a,a,a,a,a)
(2,b,b,b,b,b,b)
(3,c,c,c,c,c,c)
(4,d,d,d,d,d,d)
(5,e,e,e,e,e,e)

I then lzop the test file to get test_input_chars.txt.lzo (I decompressed it with lzop -d to make sure the compression worked and everything looks good).
If I run the exact same script provided above on the .lzo file, it works fine. However, this file is really small and doesn't need indexes, and what I actually want is LZO support that works with indexed files. Based on this I decided to try out the elephant-bird branch for Pig 0.7 located at http://github.com/hirohanin/elephant-bird/ as recommended by Dmitriy.
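
For reference, the workflow I'm aiming for is roughly the following (a sketch only; the hadoop-lzo jar location is whatever your install uses, and the LzoIndexer class comes from the hadoop-lzo project, not from elephant-bird):

# compress locally, copy into HDFS, then build the .index file that lets
# a large .lzo file be split across mappers
lzop test_input_chars.txt
hadoop fs -put test_input_chars.txt.lzo /usr/foo/input/
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer /usr/foo/input/test_input_chars.txt.lzo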

I created the following Pig script that mirrors the one above but should hopefully work on LZO files (including indexed ones):

REGISTER elephant-bird-1.0.jar
REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
DUMP A;

When I run this script, which uses LzoTokenizedLoader, there is no output. The script appears to run without errors, but it reports zero records written and zero bytes written.

Here is the exact output:

grunt> DUMP A;
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimited [ ]
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://master:9000/tmp/temp-2052828736/tmp-1533645117:org.apache.pig.builtin.BinStorage) - 1-4 Operator Key: 1-4
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
[Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[Thread-12] INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader - LzoTokenizedLoader with given delimiter [ ]
[Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009101108_0151
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at http://master:50030/jobdetails.jsp?jobid=job_201009101108_0151
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Succesfully stored result in "hdfs://amb-hadoop-01:9000/tmp/temp-2052828736/tmp-1533645117
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written: 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written: 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process: 1
[main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process: 1
grunt>

I'm not sure if I'm doing something wrong in my use of LzoTokenizedLoader or if there is a problem with the class itself (most likely the problem is with my code). Thank you for any help!

~Ed

  • Dmitriy Ryaboy at Sep 23, 2010 at 6:05 pm
    The 0.7 branch is not tested; it's quite likely it doesn't actually work :).
    Rohan Rai was working on it. Rohan, do you think you can take a look and help Ed out?

    Ed, you may want to check if the same input works when you use Pig 0.6 (and
    the official elephant-bird, on Kevin Weil's github).

    -D
  • Pig at Sep 24, 2010 at 9:08 pm
    Hi Dmitriy,

    I took a look at the source, and it doesn't look like LzoTokenizedLoader is completely implemented for 0.7. I'm really new to Pig, but I'll see what I can do to get it working. In the meantime, for testing purposes, I just split up our single large LZO file into lots of small LZO files.
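
    (Roughly something like the following on the local side before pushing the pieces to HDFS; the input name and line count are just examples:)

    # split the plain-text input, compress each chunk, and copy the chunks up
    split -l 100000 big_input.txt chunk_
    for f in chunk_*; do lzop "$f"; done
    hadoop fs -put chunk_*.lzo /usr/foo/input/
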
    Thanks!

    ~Ed
  • Rohan Rai at Sep 27, 2010 at 5:19 am
    Oh, sorry, I did not see this mail...

    It's not an official patch/release, but here is a fork of elephant-bird which works with Pig 0.7 for normal LZO text loading etc. (not the HBase loader).
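
    For plain LZO text, a load along these lines should work with the fork (a sketch, assuming it carries the same LzoTextLoader as the main elephant-bird project; it emits each line as a single chararray, so fields still have to be split afterwards):

    REGISTER elephant-bird-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    com.twitter.elephantbird.pig.load.LzoTextLoader();
    DUMP A;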

    Regards
    Rohan

  • Rohan Rai at Sep 27, 2010 at 5:20 am
    Oops

    Here is the link

    http://github.com/hirohanin/elephant-bird

    Regards
    Rohan

  • Rohan Rai at Sep 27, 2010 at 5:26 am
    Oh, sorry, I am completely out of sync...

    Can you tell me how you LZO'ed and indexed the file?

    Regards
    Rohan

  • Pig at Sep 27, 2010 at 12:56 pm
    Hi Rohan,

    The test file (test_input_chars.txt.lzo) is not indexed. I created it using the command

    'lzop test_input_chars.txt'

    It's a really small file (only 6 lines), so I didn't think it needed to be indexed. Do all files, regardless of size, need to be indexed for LzoTokenizedLoader to work?

    Thank you!

    ~Ed
  • Rohan Rai at Sep 27, 2010 at 6:00 pm
    Well,

    I haven't tried (rather, I don't remember) compressing via lzop and then putting the file on the cluster, so I can't tell you about that. Here is what works for me:

    I do it by first putting the file on the cluster and then doing stream compression.
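
    (One way to do that kind of on-cluster compression, not necessarily the exact command used here, is an identity Hadoop Streaming job that writes LzopCodec output; the streaming jar path and the hadoop-lzo codec depend on the install:)

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -D mapred.output.compress=true \
        -D mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec \
        -D mapred.reduce.tasks=0 \
        -input /usr/foo/input/test_input_chars.txt \
        -output /usr/foo/input/test_input_chars_lzo \
        -mapper /bin/cat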

    And yes, it need not be indexed (I guess it doesn't matter for a small test file; otherwise skipping the index is unwise, since you lose the benefit of parallelism).

    Regards
    Rohan

  • Dmitriy Ryaboy at Sep 27, 2010 at 8:13 pm
    lzop should work.
    On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai wrote:

    Well,

    I haven't tried (rather, I don't remember) compressing via lzop and then
    putting the file on the cluster, so I can't tell you about that. Here is what
    works for me: I first put the file on the cluster and then do stream
    compression.

    And yes, it need not be indexed (I guess it doesn't matter for a small
    test file; otherwise it is unwise, for one loses the benefit of parallelism).

    Regards
    Rohan


    pig wrote:
    Hi Rohan,

    The test file (test_input_chars.txt.lzo) is not indexed. I created it using
    the command

    'lzop test_input_chars.txt'

    It's a really small file (only 6 lines) so I didn't think it needed to be
    indexed. Do all files, regardless of size, need to be indexed for the
    LzoTokenizedLoader to work?

    Thank you!

    ~Ed

    On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai wrote:


    Oh, sorry, I am completely out of sync...
    Can you tell me how you lzo'ed and indexed the file?


    Regards
    Rohan

    Rohan Rai wrote:


    Oh, sorry, I did not see this mail...
    It's not an official patch/release,

    but here is a fork of elephant-bird which works with Pig 0.7
    for normal LZO text loading etc.

    (not HBaseLoader)

    Regards
    Rohan

    Dmitriy Ryaboy wrote:

    The 0.7 branch is not tested... it's quite likely it doesn't actually
    work :).
    Rohan Rai was working on it. Rohan, think you can take a look and help Ed
    out?

    Ed, you may want to check if the same input works when you use Pig 0.6 (and
    the official elephant-bird, on Kevin Weil's github).

    -D
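
    For reference, the lzop-then-upload workflow discussed in this thread, plus the
    optional indexing step, looks roughly like the shell sketch below. The jar
    location and HDFS paths are illustrative assumptions, not taken from the thread;
    the LzoIndexer step assumes the hadoop-lzo (hadoop-gpl-compression) jar is
    installed on the client.

    # compress locally with lzop, sanity-check the archive, and copy it to HDFS
    lzop test_input_chars.txt                    # produces test_input_chars.txt.lzo
    lzop -dc test_input_chars.txt.lzo | head     # verify the archive decompresses
    hadoop fs -put test_input_chars.txt.lzo /usr/foo/input/

    # optional: build an LZO index so large files can be split across mappers
    # (jar path below is an assumption; point it at your hadoop-lzo jar)
    hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
        com.hadoop.compression.lzo.LzoIndexer /usr/foo/input/test_input_chars.txt.lzo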

  • Rohan Rai at Sep 28, 2010 at 3:52 am
    Just corrected/tested and pushed LzoTokenizedLoader to the personal fork

    Hopefully it works now

    Regards
    Rohan
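
    To pick up a fix like this one, the elephant-bird jar has to be rebuilt from the
    fork and re-registered in the Pig script. A rough sketch; the build step and the
    destination path are assumptions, so check the fork's README for the actual
    build instructions and output location.

    git clone http://github.com/hirohanin/elephant-bird.git
    cd elephant-bird
    # build step and jar output location are assumptions; see the fork's README
    ant
    cp elephant-bird-1.0.jar /usr/lib/elephant-bird/   # overwrite the jar the script REGISTERs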

  • Ed at Sep 28, 2010 at 6:11 pm
    Thank you Rohan, I really appreciate your help! I'll give it a shot and post
    back if it works.

    ~Ed
  • Ed at Sep 29, 2010 at 7:19 pm
    Hello,

    I tested the newest push to the hirohanin elephant-bird branch (for pig 0.7)
    and had an error when trying to use LzoTokenizedLoader with the following
    pig script:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

    The error I get is in the mapper logs and is as follows:

    INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
    INFO com.hadoop.compression.lzo.LzoCodec: Succesfully loaded & initialized native-lzo library
    INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
    INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
    FATAL org.apache.hadoop.mapred.TaskTracker: Error running child :
    java.lang.NoSuchMethodError:
    org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
      at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
      at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
      at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
      at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
      at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
      at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)

    Do you think I'm forgetting some required library?

    Thank you!

    ~Ed
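
    A NoSuchMethodError like the one above usually means the registered elephant-bird
    jar was compiled against a different Pig than the one on the cluster, rather than
    a missing library. One way to check is sketched below; the Pig jar path is an
    assumption for this install, and the class package follows Pig's usual layout
    (it may differ from the hand-copied trace).

    # confirm the Pig jar on the cluster contains the class the loader calls into
    unzip -l /usr/lib/pig/pig-0.7.0-core.jar | grep PigHadoopLogger

    # list the methods that class actually exposes in this Pig build
    javap -classpath /usr/lib/pig/pig-0.7.0-core.jar \
        org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger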
  • Rohan Rai at Sep 30, 2010 at 3:57 am
    Hi

    Which Hadoop/Pig version are you using?

    Regards
    Rohan
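
    Both versions can be read straight from the command line; a quick sketch,
    assuming the hadoop and pig launchers are on the PATH:

    hadoop version     # prints the Hadoop build, e.g. 0.20.2+320 for CDH3b2
    pig -version       # prints the Pig build the client is running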

  • Ed at Sep 30, 2010 at 12:38 pm
    Hello,

    I'm using Cloudera's Hadoop CDH3B2--Hadoop-0.20.2+320 (based on Apache
    Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro).

    Thank you!

    ~Ed

    On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai wrote:

    Hi

    Which Hadoop/Pig version are you using?

    Regards
    Rohan


    ed wrote:
    Hello,

    I tested the newest push to the hirohanin elephant-bird branch (for pig 0.7)
    and had an error when trying to use LzoTokenizedLoader with the following
    pig script:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

    The error I get is in the mapper logs and is as follows:

    INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
    INFO com.hadoop.compression.lzo.LzoCodec: Succesfully loaded & initialized native-lzo library
    INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
    INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
    FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.NoSuchMethodError: org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
        at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC (Unknown Source)
        at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter (Unknown Source)
        at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter (Unknown Source)
        at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext (Unknown Source)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue (PigRecordReader.java:142)
        at org.apache.hadoop.mapred.MapTask$NewTrackignRecordReader.nextKeyValue(MapTask.java:423)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue (MapContent.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run (Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    Do you think I'm forgetting some required library?

    Thank you!

    ~Ed

    On Tue, Sep 28, 2010 at 2:10 PM, ed wrote:


    Thank you Rohan, I really appreciate your help! I'll give it a shot and
    post back if it works.

    ~Ed


    On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:


    Just corrected/tested and pushed LzoTokenizedLoader to the personal fork.
    Hopefully it works now.


    Regards
    Rohan

    Dmitriy Ryaboy wrote:


    lzop should work.
    On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:


    Well

    I haven't tried (rather I don't remember) compressing via lzop and then
    putting it on the cluster, so I can't tell you about that... Here is what
    works for me.

    I do it by first putting the file on the cluster and then doing Stream
    Compression.

    And yes, it need not be indexed (I guess it doesn't matter for a small
    test file; otherwise it is unwise, for one loses the benefit of parallelism).

    Regards
    Rohan


    pig wrote:


    Hi Rohan,

    The test file (test_input_chars.txt.lzo) is not indexed. I created it
    using the command

    'lzop test_input_chars.txt'

    It's a really small file (only 6 lines) so I didn't think it needed to be
    indexed. Do all files, regardless of size, need to be indexed for the
    LzoTokenizedLoader to work?

    Thank you!

    ~Ed

    On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:


    Oh Sorry I am completely out of sync...

    Can you tell me how you lzo'ed and indexed the file?

    Regards
    Rohan

    Rohan Rai wrote:


    Oh Sorry I did not see this mail ...

    It's not an official patch/release, but here is a fork of elephant-bird
    which works with pig 0.7 for normal LZO text loading etc. (not HBaseLoader).

    Regards
    Rohan

    Dmitriy Ryaboy wrote:

    The 0.7 branch is not tested.. it's quite likely it doesn't actually work :).

    Rohan Rai was working on it.. Rohan, think you can take a look and help Ed
    out?

    Ed, you may want to check if the same input works when you use Pig 0.6
    (and the official elephant-bird, on Kevin Weil's github).

    -D
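
    On the indexing question above: with the hadoop-lzo library that
    elephant-bird builds on, an index for an .lzo file already on HDFS can be
    built with the LzoIndexer tool it ships (com.hadoop.compression.lzo.LzoIndexer),
    or programmatically. A rough sketch only, assuming LzoIndex.createIndex(FileSystem, Path)
    is available as in Kevin Weil's hadoop-lzo:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        import com.hadoop.compression.lzo.LzoIndex;

        public class IndexLzoFile {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path lzoFile = new Path("/usr/foo/input/test_input_chars.txt.lzo");

            // Assumed API: writes test_input_chars.txt.lzo.index next to the
            // .lzo file so a large file can be split across mappers; a small
            // test file like this one loads fine without an index.
            LzoIndex.createIndex(FileSystem.get(conf), lzoFile);
          }
        }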

  • Rohan Rai at Sep 30, 2010 at 12:48 pm
    The validation done by me was on Apache Hadoop 0.20.2 and Apache Pig 0.7.

    I haven't tried it with Cloudera's version.

    Can we verify that it doesn't work with them too?

    Regards
    Rohan

    ed wrote:
    Hello,

    I'm using Cloudera's Hadoop CDH3B2--Hadoop-0.20.2+320 (based on Apache
    Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro).

    Thank you!

    ~Ed


  • Ed at Sep 30, 2010 at 12:52 pm
    Hmm,

    It'd be good to find one other person having problems with it on Cloudera's
    distro, as I could just be making a mistake somewhere and it has nothing to
    do with the fact that I'm using Cloudera's distro.

    ~Ed
    On Thu, Sep 30, 2010 at 8:47 AM, Rohan Rai wrote:

    The validation done by me was on
    Apache 0.20.2 and Apache Pig 0.7..

    I haven't tried it with Cloudera's version.

    Can we verify that it doesnt work with them too


    Regards
    Rohan

  • Dmitriy Ryaboy at Sep 30, 2010 at 9:01 pm
    Rohan, there was a regression in Pig 0.7 that made the counters not work;
    I fixed that in Pig 0.8 (there are now explicit methods Pig provides for
    using counters, instead of the hacky things PigCounterHelper was doing in
    elephant-bird). Did you do anything with the counters when migrating to
    0.7?

    -D
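
    For reference, a rough sketch of what the explicit counter API looks like
    from a loader on Pig 0.8, assuming PigStatusReporter exposes
    getCounter(String, String) the way elephant-bird's later PigCounterHelper
    uses it; the group and counter names here are made up:

        import org.apache.hadoop.mapreduce.Counter;
        import org.apache.pig.tools.pigstats.PigStatusReporter;

        public class CounterSketch {
          // Bump a counter from inside a LoadFunc without reaching into
          // PigHadoopLogger internals the way the old PigCounterHelper did.
          public static void incrRecordsRead(long n) {
            PigStatusReporter reporter = PigStatusReporter.getInstance();
            if (reporter != null) {  // reporter is null on the frontend / local runs
              Counter counter = reporter.getCounter("LzoTokenizedLoader", "RecordsRead");  // hypothetical names
              if (counter != null) {
                counter.increment(n);
              }
            }
          }
        }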
    On Thu, Sep 30, 2010 at 5:52 AM, ed wrote:

    Hmm,

    It'd be good to find one other person having problems with it on Cloudera's
    distro as I could just be making a mistake somewhere and it has nothing to
    do with the fact that I'm using cloudera's distro.

    ~Ed
    On Thu, Sep 30, 2010 at 8:47 AM, Rohan Rai wrote:

    The validation done by me was on
    Apache 0.20.2 and Apache Pig 0.7..

    I haven't tried it with Cloudera's version.

    Can we verify that it doesnt work with them too


    Regards
    Rohan

    ed wrote:
    Hello,

    I'm using Cloudera's Hadoop CDH3B2--Hadoop-0.20.2+320 (based on Apache
    Hadoop 20.2) with Pig 0.7 (from Cloudera's distro).

    Thank you!

    ~Ed


    On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai wrote:


    Hi
    Which Hadoop/ PIg version are you using ??

    Regards
    Rohan


    ed wrote:


    Hello,
    I tested the newest push to the hirohanin elephant-bird branch (for
    pig
    0.7)
    and had an error when trying to use LzoTokenizedLoader with the
    following
    pig script:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

    The error I get is in the mapper logs and is as follows:

    INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl
    library
    INFO com.hadoop.compression.lzo.LzoCodec: Succesfully loaded &
    initialized
    native-lzo library
    INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader:
    LzoTokenizedLoader with given delimiter [ ]
    INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking
    to
    split start at pos 0
    FATAL org.apache.hadoop.mapred.TaskTracker: Error running child :
    java.lang.NoSuchMethodError:

    org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
    at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC
    (Unknown
    Source)
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter
    (Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter
    (Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext
    (Unknown Source)
    at

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue
    (PigRecordReader.java:142)
    at

    org.apache.hadoop.mapred.MapTask$NewTrackignRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue
    (MapContent.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run (Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    Do you think I'm forgetting some required library?

    Thank you!

    ~Ed

    On Tue, Sep 28, 2010 at 2:10 PM, ed wrote:


    Thank you Rohan, I really appreciate your help! I'll give it shot
    and
    post back if it works.
    ~Ed


    On Mon, Sep 27, 2010 at 11:51 PM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:


    Just corrected/tested and pushed LzoTokenizedLoader to the personal
    fork

    Hopefully it works now

    Regards
    Rohan

    Dmitriy Ryaboy wrote:


    lzop should work.

    On Mon, Sep 27, 2010 at 10:59 AM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:


    Well

    I haven't tried (rather I don't remember) compressing via lzop and

    then
    putting on cluster...
    So cant tell you about that...Here is what works for me.

    I do it by first putting the file on cluster and then doing Stream
    Compression.

    And yes it need not be indexed (I guess it doesn't matter for
    small
    test file, otherwise it is unwise
    for one loses the benefit of parallelism)

    Regards
    Rohan


    pig wrote:


    Hi Rohan,

    The test file (test_input_chars.txt.lzo) is not indexed. I
    created
    it
    using
    the command

    'lzop test_input_chars.txt'

    It's a really small file (only 6 lines) so I didn't think it
    needed
    to
    be
    index. Do all files regardless of size need to be indexed for
    the
    LzoTokenizedLoader to work?

    Thank you!

    ~Ed

    On Mon, Sep 27, 2010 at 1:25 AM, Rohan Rai <rohan.rai@inmobi.com
    wrote:


    Oh Sorry I am completely out of sync...

    Can you tell how did you lzo'ed and indexed the file

    Regards

    Rohan
    Rohan Rai wrote:


    Oh Sorry I did not see this mail ...

    Its not an official patch/release

    But here is a fork on elephant-bird which works with pig 0.7

    for normal LZOText Loading etc
    (NOt HbaseLoader)

    Regards
    Rohan

    Dmitriy Ryaboy wrote:

    The 0.7 branch is not tested.. it's quite likely it doesn't
    actually
    work

    :).

    Rohan Rai was working on it.. Rohan, think you can take a look
    and

    help

    Ed
    out?

    Ed, you may want to check if the same input works when you use
    Pig
    0.6
    (and
    the official elephant-bird, on Kevin Weil's github).

    -D



  • Marko Musnjak at Oct 25, 2010 at 12:55 pm
    Hi,
The same thing is happening here. Is there a workaround for Cloudera's CDH3b2 (Pig 0.7)?
    Or something that could be done in the hirohanin branch?

    M.
    On Thu, Sep 30, 2010 at 22:59, Dmitriy Ryaboy wrote:

Rohan, there was a regression in Pig 0.7 that made the counters not work; I
fixed that in Pig 0.8 (there are now explicit methods Pig provides for using
counters, instead of the hacky things PigCounterHelper was doing in
elephant-bird). Did you do anything with the counters when migrating to
0.7?

    -D
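[Editor's note: for anyone hitting the NoSuchMethodError quoted below, a minimal
sketch of the API difference Dmitriy describes. The method names
(PigCounterHelper.incrCounter from elephant-bird, and
PigStatusReporter.getInstance()/getCounter(group, name), added around Pig 0.8) are
recalled from memory, and the counter group/name strings are purely illustrative;
check them against the elephant-bird and Pig sources you are actually building against.]

import org.apache.hadoop.mapreduce.Counter;
import org.apache.pig.tools.pigstats.PigStatusReporter;

import com.twitter.elephantbird.pig.util.PigCounterHelper;

public class LoaderCounters {

    // Pig 0.6-era path used inside the elephant-bird loaders: the helper digs
    // into Pig internals to find the task context, which is what fails on
    // Pig 0.7 with the NoSuchMethodError in the stack trace quoted below.
    private final PigCounterHelper counterHelper = new PigCounterHelper();

    void incrementViaHelper() {
        counterHelper.incrCounter("LzoTokenizedLoader", "LinesRead", 1L);
    }

    // Pig 0.8 path: Pig exposes the task reporter directly, so no digging into
    // internals is needed. The reporter or counter can be null outside a task.
    void incrementViaReporter() {
        PigStatusReporter reporter = PigStatusReporter.getInstance();
        if (reporter != null) {
            Counter counter = reporter.getCounter("LzoTokenizedLoader", "LinesRead");
            if (counter != null) {
                counter.increment(1L);
            }
        }
    }
}

[The loaders call incrCounter from getNext() for every record, which is why a broken
helper shows up as a failed map task rather than a parse-time error.]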
    On Thu, Sep 30, 2010 at 5:52 AM, ed wrote:

    Hmm,

It'd be good to find one other person having problems with it on Cloudera's
distro, as I could just be making a mistake somewhere that has nothing to
do with the fact that I'm using Cloudera's distro.

    ~Ed
    On Thu, Sep 30, 2010 at 8:47 AM, Rohan Rai wrote:

The validation I did was on Apache Hadoop 0.20.2 and Apache Pig 0.7.

I haven't tried it with Cloudera's version.

Can we verify that it doesn't work with that too?


    Regards
    Rohan

    ed wrote:
    Hello,

I'm using Cloudera's Hadoop CDH3B2 (Hadoop-0.20.2+320, based on Apache
Hadoop 0.20.2) with Pig 0.7 (from Cloudera's distro).

    Thank you!

    ~Ed


    On Wed, Sep 29, 2010 at 11:56 PM, Rohan Rai <rohan.rai@inmobi.com>
    wrote:

    Hi
Which Hadoop/Pig version are you using?

    Regards
    Rohan


    ed wrote:


    Hello,
I tested the newest push to the hirohanin elephant-bird branch (for Pig 0.7)
and had an error when trying to use LzoTokenizedLoader with the following
Pig script:

    REGISTER elephant-bird-1.0.jar
    REGISTER /usr/lib/elephant-bird/lib/google-collect-1.0.jar
    A = load '/usr/foo/input/test_input_chars.txt.lzo' USING
    com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
    DUMP A;

The error I get is in the mapper logs and is as follows:

INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
INFO com.hadoop.compression.lzo.LzoCodec: Succesfully loaded & initialized native-lzo library
INFO com.twitter.elephantbird.pig.load.LzoTokenizedLoader: LzoTokenizedLoader with given delimiter [ ]
INFO com.twitter.elephantbird.mapreduce.input.LzoRecordReader: Seeking to split start at pos 0
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child :
java.lang.NoSuchMethodError:
org.apache.pig.backend.executionengine.mapReduceLayer.PigHadoopLogger.getTaskIOCContext()Lorg/apache/hadoop/mapreduce/TaskInputOutputContext;
    at com.twitter.elephantbird.pig.util.PigCounterHelper.getTIOC(Unknown Source)
    at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(Unknown Source)
    at com.twitter.elephantbird.pig.load.LzoTokenizedLoader.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    Do you think I'm forgetting some required library?

    Thank you!

    ~Ed





Discussion Overview
group: user @
categories: pig, hadoop
posted: Sep 23, '10 at 1:50p
active: Oct 25, '10 at 12:55p
posts: 18
users: 4
website: pig.apache.org
