Grokbase Groups Pig user March 2011
I've been playing with Pig this week and I'm running into an issue that
seems like it should be trivial. I'm basically reading data from HBase and
performing a count of sessions associated with a cookie.

I'm running Pig 0.8.

My script looks like the following:

raw = LOAD 'hbase://sport_user'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'session:*', '-loadKey true')
    AS (id:bytearray, session_map:map[]);

-- Convert maps to bags
B = FOREACH raw GENERATE id, mapToBag(session_map) AS session_bag;

--dump B;

-- Count the number of sessions
C = FOREACH B GENERATE id,
    COUNT(session_bag) as sess_count;

describe C ;
dump C ;

This works fine. When I dump C, I see the cg cookie and the number of sessions.

For example:
(ANON_Cg+5EUka4wFOAAAAtRg,2)
(ANON_Cg+5EUknSmmLAAAA5CU,1)
(ANON_Cg+5EUlHWwwNAAAALQQ,1)
(ANON_Cg+5EUlSDOIJAAAAygw,1)
(ANON_Cg+5EUlgDESHAAAAWQ0,1)
(ANON_Cg+5EUli1UHBAAAA/xg,4)
(ANON_Cg+5EUmSc3sPAAAAsg4,2)
(ANON_Cg+5EUmo6i8PAAAAwxo,2)
(ANON_Cg+5EUn2X6HOAAAAWSM,1)
(ANON_Cg+5EUn5PmRCAQAA1xA,4)
(ANON_Cg+5EUnUT9+NAAAA0RE,3)
(ANON_Cg+5EUnjSD0BAAAACx0,1)
(ANON_Cg+5EUoJF82PAAAAkgI,1)
(ANON_Cg+5EUoWJW9GAAAAcx4,1)
(ANON_Cg+5EUorklmHAAAAxRk,1)
(ANON_Cg+5EUp1bXGFAAAAPwA,1)
(ANON_Cg+5EUp55I5OAAAAmR4,2)
(ANON_Cg+5EUp9XkHFAAAAYQ8,2)
(ANON_Cg+5EUpK/koEAAAAcRs,3)
(ANON_Cg+5EUpd/aDJAAAABBw,3)

If I then do a descending sort on the alias C, I get an error when I dump it:

D = ORDER C BY sess_count DESC ;
dump D ;

2011-03-10 16:10:59,325 [Thread-57] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/keric/Documents/workspace/_Java/cnwk-hadoop/pigsample_368958259_1299791458629
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:112)
    ... 6 more

Any thoughts?

Thanks,
Keric


  • Thejas M Nair at Mar 11, 2011 at 9:46 pm
    For some reason Pig fails to find the sample files created by the sampling MR job of the ORDER BY.
    You seem to be running in local mode; is this error seen in map-reduce mode as well?
    -Thejas



  • Keric Donnelly at Mar 14, 2011 at 1:17 pm
    I was running in local mode, or so I thought. I did not have the "pig -x
    local" switch set when executing. Once I added the switch, the script ran
    correctly.

    Thanks.

    Keric
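
Editor's sketch: the exchange above turns on how a distributed ORDER BY is usually executed. A first job samples the sort keys and writes out split points, and the sort job's partitioner then reads those split points to decide which range each key belongs to. If the sample output cannot be found, the partitioner cannot be configured, which matches the failure in WeightedRangePartitioner.setConf shown in the trace. The Python below is only a simplified illustration of that idea, not Pig's implementation; the function names, `num_partitions`, and `sample_size` are all hypothetical.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size=100, seed=0):
    """Stand-in for the sampling job: pick split points from a sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Stand-in for the partitioner: index of the key range this key falls in."""
    return bisect.bisect_right(boundaries, key)


def range_partitioned_sort(keys, num_partitions, descending=False):
    """Sort by routing keys to ordered ranges, then sorting each range."""
    boundaries = sample_boundaries(keys, num_partitions)
    buckets = [[] for _ in range(num_partitions)]
    for key in keys:
        buckets[partition_for(key, boundaries)].append(key)
    ordered = []
    for bucket in buckets:  # buckets are already in ascending range order
        ordered.extend(sorted(bucket))
    return ordered[::-1] if descending else ordered


if __name__ == "__main__":
    # Session counts taken from the dump of C in the original post.
    sess_counts = [2, 1, 1, 1, 1, 4, 2, 2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 3]
    print(range_partitioned_sort(sess_counts, num_partitions=3, descending=True))
```

In this sketch the boundaries are passed in memory; in a real job they are handed from the sampling step to the sort step through a file, so both steps have to resolve that path against the same filesystem.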

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 11, '11 at 4:36p
active: Mar 14, '11 at 1:17p
posts: 3
users: 2
website: pig.apache.org
