Grokbase Groups Pig user October 2010
Assume that I would like to write this Pig script:

REGISTER myudfs.jar;
A = LOAD 'hist_data' AS (id: chararray, word: chararray, count: float);
B = GROUP A BY id;
C = CROSS B, B;
D = FOREACH C GENERATE $0, $2, myudfs.HIST($1, $3);
F = ORDER D BY $2 DESC;
DUMP F;

I take (id, histogram) pairs and would like to perform an all-to-all comparison.
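To make the (id, histogram) shape concrete, here is a minimal plain-Java sketch of what grouping the (id, word, count) rows by id produces. It is only an illustration, not Pig API code: the class name HistogramGrouping and the sample rows are hypothetical, and summing duplicate words within one id is an assumption (Pig's GROUP would keep them as separate tuples in the bag).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the (id, histogram) shape; not Pig API code.
final class HistogramGrouping {

    // Groups (id, word, count) rows into id -> (word -> count).
    // Assumption: duplicate words for the same id are summed.
    static Map<String, Map<String, Double>> groupById(String[][] rows) {
        Map<String, Map<String, Double>> byId = new LinkedHashMap<>();
        for (String[] row : rows) {
            String id = row[0];
            String word = row[1];
            double count = Double.parseDouble(row[2]);
            byId.computeIfAbsent(id, k -> new LinkedHashMap<>())
                .merge(word, count, Double::sum);
        }
        return byId;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the (id, word, count) layout of 'hist_data'.
        String[][] rows = {
            {"doc1", "pig", "2"},
            {"doc1", "hadoop", "1"},
            {"doc2", "pig", "3"},
        };
        System.out.println(groupById(rows));
    }
}
```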

The CROSS operation is overkill because my measure myudfs.HIST($1,$3) is symmetric (so I could cut the comparisons in half), but it will do.
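The saving from symmetry can be sketched in plain Java: for n ids, a full cross yields n * n ordered pairs, while visiting each unordered pair once (and skipping self-pairs) needs only n * (n - 1) / 2 comparisons. The class name SymmetricPairs and the sample ids are hypothetical. In the Pig script this might correspond to something like a FILTER on C that keeps only rows where $0 < $2, though that is untested here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of enumerating each unordered pair once.
final class SymmetricPairs {

    // Returns every pair {ids[i], ids[j]} with i < j, so (a, b) appears
    // but (b, a) and (a, a) do not.
    static List<String[]> unorderedPairs(String[] ids) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            for (int j = i + 1; j < ids.length; j++) {
                pairs.add(new String[] {ids[i], ids[j]});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c", "d"};
        List<String[]> pairs = unorderedPairs(ids);
        // Ordered pairs from a full cross vs. unordered pairs: prints "16 vs 6".
        System.out.println(ids.length * ids.length + " vs " + pairs.size());
    }
}
```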

My real question is: where can I find a template for the definition of this myudfs.HIST($1,$3) function?

For example:


package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
// Anomaly is assumed to live in package myudfs (or to be imported from a named
// package); a class in the default package cannot be imported.

// HIST compares two bags instead of aggregating a single one, so it is written
// as a plain EvalFunc. The Algebraic Initial/Intermed/Final classes do not seem
// to fit a pairwise comparison and are left out.
public class HIST extends EvalFunc<Float> {

    public Float exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2)
            return null;
        try {
            DataBag hist1 = (DataBag) input.get(0); // $1: bag of (id, word, count)
            DataBag hist2 = (DataBag) input.get(1); // $3: bag of (id, word, count)
            String[] words1 = new String[(int) hist1.size()];
            double[] counts1 = new double[(int) hist1.size()];
            String[] words2 = new String[(int) hist2.size()];
            double[] counts2 = new double[(int) hist2.size()];
            fill(hist1, words1, counts1);
            fill(hist2, words2, counts2);
            // Assumes Anomaly.dist returns a primitive numeric value.
            return (float) Anomaly.dist(words1, counts1, words2, counts2);
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }

    // Copies the word and count fields of every tuple in the bag into the arrays.
    private static void fill(DataBag bag, String[] words, double[] counts) throws IOException {
        int i = 0;
        for (Tuple t : bag) {
            words[i] = (String) t.get(1);
            counts[i] = ((Number) t.get(2)).doubleValue();
            i++;
        }
    }
}
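
The post does not show Anomaly.dist, only that it is called with two (words, counts) array pairs. For trying the UDF out, a stand-in with the same parameter layout could look like the sketch below. The L1 distance used here is purely an assumption chosen because it is symmetric; it is not the author's measure, and the names HistogramDistance and l1Distance are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Anomaly.dist; the real measure is not shown in the post.
final class HistogramDistance {

    // L1 distance between two histograms given as parallel (words, counts) arrays.
    // A word missing from one histogram is treated as having count 0 there.
    static double l1Distance(String[] words1, double[] counts1,
                             String[] words2, double[] counts2) {
        Map<String, Double> diff = new HashMap<>();
        for (int i = 0; i < words1.length; i++) {
            diff.merge(words1[i], counts1[i], Double::sum);
        }
        for (int i = 0; i < words2.length; i++) {
            diff.merge(words2[i], -counts2[i], Double::sum);
        }
        double total = 0.0;
        for (double d : diff.values()) {
            total += Math.abs(d);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] words1 = {"pig", "hadoop"};
        double[] counts1 = {2, 1};
        String[] words2 = {"pig", "hive"};
        double[] counts2 = {3, 4};
        // |2 - 3| + |1 - 0| + |0 - 4| = 6, in either argument order.
        System.out.println(l1Distance(words1, counts1, words2, counts2));
        System.out.println(l1Distance(words2, counts2, words1, counts1));
    }
}
```

Because the stand-in is symmetric, swapping the two histograms gives the same value, which is the property that makes half of the CROSS output redundant.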

Posted Oct 22, '10 at 9:49p
Paolo D'alberto: 1 post