Grokbase Groups Pig user April 2011
Hi,

First, I group 2 tables using a key (named sid):

rich_sessions = GROUP sessions BY sid, activities BY sid;

After this operation, all the tuples in the bag "activities" start
with the same "sid" field.
This field is long (64 bytes) and I would like to remove it from all
activity tuples in order to save space before storing this
rich_sessions in a file.

Is there any way to do this?

Thanks for your help,


  • Sven Krasser at Apr 20, 2011 at 4:12 pm
    Sounds like the "Nested Projection" example in
    http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#FOREACH is what you're
    looking for.
    -Sven
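
    A minimal sketch of how that nested projection might look for this
    case. Only sid, sessions, activities and rich_sessions come from the
    thread; the other field names, the file paths and the slim_sessions
    alias are assumptions made up for illustration.

    ```pig
    -- Hypothetical schemas: only the sid field is known from the thread.
    sessions   = LOAD 'sessions.tsv'   AS (sid:chararray, start_time:long, device:chararray);
    activities = LOAD 'activities.tsv' AS (sid:chararray, ts:long, action:chararray);

    -- Group both relations on the shared key, as in the original question.
    rich_sessions = GROUP sessions BY sid, activities BY sid;

    -- Nested projection: the key is kept once (as "group"), and only the
    -- non-key fields are projected out of the activities bag, so sid is
    -- no longer repeated in every activity tuple.
    slim_sessions = FOREACH rich_sessions GENERATE
        group AS sid,
        sessions,
        activities.(ts, action);

    -- Hypothetical output path.
    STORE slim_sessions INTO 'rich_sessions_slim';
    ```

    The same projection could be applied to the sessions bag (for example
    sessions.(start_time, device)) if its copy of sid is not needed either.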
  • Vincent Barat at Apr 20, 2011 at 7:56 pm
    I will try this; it seems to be what I was looking for.
    Thanks!

    --

    *Vincent BARAT, UBIKOD, CTO*


    vbarat@ubikod.com Mob +33 (0)6 15 41 15 18

    UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021
    Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89

    UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel. +33 (0)2
    99 65 69 13


    www.ubikod.com <http://www.ubikod.com/>@ubikod
    <http://twitter.com/ubikod>

    www.capptain.com <http://www.capptain.com/>@capptain_hq
    <http://twitter.com/capptain_hq>


    IMPORTANT NOTICE -- UBIKOD and CAPPTAIN are registered trademarks of
    UBIKOD S.A.R.L., all copyrights are reserved. The contents of this
    email and attachments are confidential and may be subject to legal
    privilege and/or protected by copyright. Copying or communicating
    any part of it to others is prohibited and may be unlawful. If you
    are not the intended recipient you must not use, copy, distribute or
    rely on this email and should please return it immediately or notify
    us by telephone. At present the integrity of email across the
    Internet cannot be guaranteed. Therefore UBIKOD S.A.R.L. will not
    accept liability for any claims arising as a result of the use of
    this medium for transmissions by or to UBIKOD S.A.R.L.. UBIKOD
    S.A.R.L. may exercise any of its rights under relevant law, to
    monitor the content of all electronic communications. You should
    therefore be aware that this communication and any responses might
    have been monitored, and may be accessed by UBIKOD S.A.R.L. The
    views expressed in this document are that of the individual and may
    not necessarily constitute or imply its endorsement or
    recommendation by UBIKOD S.A.R.L. The content of this electronic
    mail may be subject to the confidentiality terms of a
    "Non-Disclosure Agreement" (NDA).
  • Vincent Barat at Apr 21, 2011 at 7:23 am
    It works! Thanks a lot.

  • Sumit ghosh at Apr 21, 2011 at 3:54 am
    Hi,

    Did you get a chance to look into the PiggyBank String functions?

    http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/evaluation/string/package-summary.html

    I guess you need to use the substring function.

    REGISTER <path-to-piggybank>/piggybank.jar;
    DEFINE StrSub org.apache.pig.piggybank.evaluation.string.SUBSTRING();

    ... now you can use the SUBSTRING function as StrSub.
    B = FOREACH A GENERATE StrSub(sid, 1, 64);

    Hope it helps.
    Sumit



    ________________________________
    From: Vincent Barat <vincent.barat@gmail.com>
    To: "pig-user@hadoop.apache.org" <pig-user@hadoop.apache.org>
    Sent: Wed, 20 April, 2011 7:37:03 PM
    Subject: How to remove the field key from bags tuples after a GROUP ?

