Grokbase Groups Pig user March 2011
Problems with Join in pig
The following Pig script runs fine without the 2 GB memory setting (the PIG_OPTS export shown below) but fails with it. I am not sure what's happening; it's a simple operation of joining one relation (of 1 row) with another.
Here is what I am trying to do:

1. Group all of SELECT_HIT_TIME_DATA into a single bag with a GROUP ALL.
2. Get the min and max of that set and put them into MIN_HIT_DATA, a relation with a single row.
3. Group SELECT_MAX_VISIT_TIME_DATA by visid.
4. Generate DUMMY_KEY for every row, along with the MAX of the visit start time.
5. Join the single row from step 2 with all the rows generated in step 4 to get a min time and a max time.

Code:
Shell prompt:
## set the MapReduce child-task heap size to 2 GB
PIG_OPTS="$PIG_OPTS -Dmapred.child.java.opts=-Xmx2048m"
export PIG_OPTS

Pig/Grunt

RAW_DATA = LOAD '/omniture_test_qa/cleansed_output_1/2011/01/05/wdgesp360/wdgesp360_2011-01-05*.tsv.gz' USING PigStorage('\t');
FILTER_EXCLUDES_DATA = FILTER RAW_DATA BY $6 <= 0;
-- NOTE: visit_start_time_gmt is referenced below but was never generated in the original post;
-- the column index $1 is an assumption added here so the script compiles.
SELECT_CAST_DATA = FOREACH FILTER_EXCLUDES_DATA GENERATE 'DUMMYKEY' AS DUMMY_KEY, (int)$0 AS hit_time_gmt, (int)$1 AS visit_start_time_gmt, (long)$2 AS visid_high, (long)$3 AS visid_low, (chararray)$5 AS truncated_hit;
SELECT_DATA = FILTER SELECT_CAST_DATA BY truncated_hit == 'N';
-- MIN AND MAX HIT_TIME_GMT FOR THE FILE/SUITE
SELECT_HIT_TIME_DATA = FOREACH SELECT_DATA GENERATE hit_time_gmt;
-- GROUP ALL produces a single group, so only one reducer does real work here despite PARALLEL 100.
GROUPED_ALL_DATA = GROUP SELECT_HIT_TIME_DATA ALL PARALLEL 100;
MIN_HIT_DATA = FOREACH GROUPED_ALL_DATA GENERATE 'DUMMYKEY' AS DUMMY_KEY, MIN(SELECT_HIT_TIME_DATA.hit_time_gmt) AS MIN_HIT_TIME_GMT, MAX(SELECT_HIT_TIME_DATA.hit_time_gmt) AS MAX_HIT_TIME_GMT;
-- MAX VISIT_START_TIME BY VISITOR_ID
SELECT_MAX_VISIT_TIME_DATA = FOREACH SELECT_DATA GENERATE visid_high, visid_low, visit_start_time_gmt;
GROUP_BY_VISID_MAX_VISIT_TIME_DATA = GROUP SELECT_MAX_VISIT_TIME_DATA BY (visid_high, visid_low) PARALLEL 100;
MAX_VISIT_TIME = FOREACH GROUP_BY_VISID_MAX_VISIT_TIME_DATA GENERATE 'DUMMYKEY' AS DUMMY_KEY, FLATTEN(group.visid_high) AS visid_high, FLATTEN(group.visid_low) AS visid_low, MAX(SELECT_MAX_VISIT_TIME_DATA.visit_start_time_gmt) AS MAX_VISIT_START_TIME;
-- Every row carries the same DUMMY_KEY, so this COGROUP routes all records to a single reducer despite PARALLEL 100.
JOINED_MAX_VISIT_TIME_DATA = COGROUP MAX_VISIT_TIME BY DUMMY_KEY OUTER, MIN_HIT_DATA BY DUMMY_KEY OUTER PARALLEL 100;
MIN_MAX_VISIT_HIT_TIME = FOREACH JOINED_MAX_VISIT_TIME_DATA GENERATE FLATTEN(MAX_VISIT_TIME.visid_high), FLATTEN(MAX_VISIT_TIME.visid_low), FLATTEN(MAX_VISIT_TIME.MAX_VISIT_START_TIME), FLATTEN(MIN_HIT_DATA.MIN_HIT_TIME_GMT), FLATTEN(MIN_HIT_DATA.MAX_HIT_TIME_GMT);
DUMP MIN_MAX_VISIT_HIT_TIME;


Can anyone please guide me through this problem?
Thanks
Sri


  • Thejas M Nair at Mar 14, 2011 at 11:19 pm
    What version of Pig are you using? There have been some memory utilization
    fixes in 0.8. For this use case, you can also use the new scalar feature in
    0.8 - http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars .
    That query plan will be more efficient.
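
    For this script, a minimal sketch of the scalar approach could look like the
    following (assuming the relation names from the original post; MIN_HIT_DATA
    has exactly one row, which is what scalar projection requires):

    -- Scalar projection (Pig 0.8+): fields of a single-row relation can be
    -- referenced directly, replacing the COGROUP/FLATTEN step entirely.
    MIN_MAX_VISIT_HIT_TIME = FOREACH MAX_VISIT_TIME GENERATE
        visid_high, visid_low, MAX_VISIT_START_TIME,
        MIN_HIT_DATA.MIN_HIT_TIME_GMT AS MIN_HIT_TIME_GMT,
        MIN_HIT_DATA.MAX_HIT_TIME_GMT AS MAX_HIT_TIME_GMT;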

    You might want to build a new version of Pig from the svn 0.8 branch, because
    there have been some bug fixes after the release -

    svn co http://svn.apache.org/repos/asf/pig/branches/branch-0.8
    cd branch-0.8
    ant

    -Thejas
  • Paltheru, Srikanth at Mar 14, 2011 at 11:28 pm
    I am using Pig 0.5. We don't have plans to upgrade to a newer version. But the problem I have is that the script runs for some files (both larger and smaller than the one mentioned) yet not for this particular one. I get a "GC overhead limit exceeded" error.
    Thanks
    Sri
  • Olga Natkovich at Mar 15, 2011 at 12:24 am
    Hi Sri,

    You guys should consider moving to the new version. That way you would get better-performing and more stable code, as well as better support, since more people would be using the same code as you.

    Olga
  • Thejas M Nair at Mar 15, 2011 at 12:30 am
    Fragment-replicate join will also produce an efficient query plan for this use case - http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Replicated+Joins . It is available in 0.5 as well.
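
    For example, a sketch of that join with the relation names from the original
    script (the single-row MIN_HIT_DATA goes last, because the right-most
    relation is the one replicated into memory; the quoting of 'replicated'
    follows the 0.8 docs and may differ slightly in 0.5):

    -- Fragment-replicate join: a map-side join that replaces the COGROUP +
    -- FLATTEN pair, avoiding the single-reducer bottleneck. The join output
    -- is already flat, so no FLATTEN step is needed afterwards.
    JOINED_MAX_VISIT_TIME_DATA = JOIN MAX_VISIT_TIME BY DUMMY_KEY,
        MIN_HIT_DATA BY DUMMY_KEY USING 'replicated';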
    -Thejas
  • Dmitriy Ryaboy at Mar 15, 2011 at 12:37 am
    If they are on 0.5, that means they have bigger problems. They are on Hadoop 18.

    D
  • Dmitriy Ryaboy at Mar 15, 2011 at 12:39 am
    Uh, no, I am wrong. They are on Hadoop 20; 18 was Pig 0.4.

    Yeah, Srikanth, you guys should just upgrade. 0.5 to 0.6 is relatively
    painless. The jump to 0.7/0.8 is harder, but worth it.

    D
  • Paltheru, Srikanth at Mar 15, 2011 at 12:48 am
    I tried using replicated join in Pig 0.5 and it does not work. The feature I am trying to use is supported in 0.5 as well; it just works for some datasets and doesn't for others.

  • Thejas M Nair at Mar 15, 2011 at 12:55 am
    Replicated join will only work if the right-most relation in the join is small enough to fit in available memory, so it will not work with all data sets. But in this case you have one relation with only one record; that should fit into memory.

    The COGROUP in your query might be running into a memory issue that might have been fixed in recent versions of Pig.

    -Thejas
