FAQ
Hi

I am doing an inner join on two relations say A, B.

A has fields - Word1:chararray, Word2:chararray, Word3:chararray,
Metric1:long, Metric2:long
B has fields - UniqueWord1:chararray, UniqueID:long

Facts about the relations:

- Relation B has no duplicates and no NULLs in either field.
- Relation A has 840K records
- Relation B has 340 records


Join statement:
join_A_B = JOIN A BY Word1, B BY UniqueWord1;

I expected the join to have <= 840K records. However the join returns 860K
records. While I debug, I just thought of asking here. Any thoughts on what
could be wrong?

Thanks much.

Arun


  • Xiaomeng Wan at Feb 7, 2011 at 5:17 pm
    Hi Arun,

    When you say "Relation B has no duplicates", do you mean no duplicate
    (UniqueWord1, UniqueID) pair, or no duplicate UniqueWord1? You are
    joining on UniqueWord1 only, so if there are duplicates on that field
    (in other words, several UniqueIDs share the same UniqueWord1), each
    matching record in A is emitted once per duplicate and you can get
    more than 840K records.

    Shawn
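
    A minimal sketch of how the duplicate-key question could be checked,
    assuming B is loaded from a tab-delimited file. The path 'b.tsv' and
    the aliases B_by_key, key_counts and dup_keys are hypothetical; the
    schema is the one given in the question.

    ```pig
    -- Hypothetical path and delimiter; schema taken from the question.
    B = LOAD 'b.tsv' USING PigStorage('\t')
        AS (UniqueWord1:chararray, UniqueID:long);

    -- Count how many rows of B share each join key.
    B_by_key   = GROUP B BY UniqueWord1;
    key_counts = FOREACH B_by_key GENERATE group AS UniqueWord1, COUNT(B) AS n;

    -- Keep only keys that occur more than once in B.
    dup_keys = FILTER key_counts BY n > 1;
    DUMP dup_keys;
    ```

    If dup_keys comes back empty, every record in A matches at most one
    record in B, so the inner join cannot return more records than A has.
    If it is not empty, each listed key multiplies the matching records
    of A by n in the join output.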

categories: pig, hadoop
posted: Feb 5, '11 at 10:12p
active: Feb 7, '11 at 5:17p
posts: 2
users: 2
website: pig.apache.org
