Wait, Habermaas as in Critical Theory????
From: Habermaas, William
Sent: Monday, June 27, 2011 2:55 PM
Subject: RE: How to select random n records using mapreduce ?
I did something similar. Basically, I had a random sampling algorithm
that I called from the mapper. If it returned true, I would collect the
record; otherwise, I would discard it.
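The mapper-side sampling described above can be sketched in plain Python as a per-record coin flip (Bernoulli sampling). This is a hypothetical sketch that simulates the mapper as a generator rather than using the Hadoop API. The names `should_keep` and `sampling_mapper` and the keep-probability `p` are assumptions, not taken from the original message. To aim for roughly N records out of T total, one would set `p = N / T`.

```python
import random
from typing import Iterable, Iterator, Optional


def should_keep(rng: random.Random, p: float) -> bool:
    """Hypothetical sampling test: return True with probability p."""
    # rng.random() is uniform on [0, 1), so p=0 never keeps and p=1 always keeps.
    return rng.random() < p


def sampling_mapper(records: Iterable[str], p: float,
                    seed: Optional[int] = None) -> Iterator[str]:
    """Simulated mapper: collect a record when the test returns True,
    otherwise discard it."""
    rng = random.Random(seed)
    for record in records:
        if should_keep(rng, p):
            yield record  # "collect" the record


if __name__ == "__main__":
    data = [f"record-{i}" for i in range(10_000)]
    sample = list(sampling_mapper(data, p=0.01, seed=42))
    print(len(sample))  # close to 100, but not exactly 100
```

Because each record is kept independently, the output size follows a binomial distribution. The result is a uniform random sample, but its size is only approximately N.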
From: firstname.lastname@example.org On Behalf Of Niels Basjes
Sent: Monday, June 27, 2011 3:29 PM
Subject: Re: How to select random n records using mapreduce ?
The only solution I can think of is to create a counter in Hadoop
that is incremented each time a mapper lets a record through.
As soon as the value reaches a preselected value, the mappers simply
discard the additional input they receive.
Note that this will not be random at all... yet it's the best I can
come up with right now.
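The counter idea above can be simulated to show why it is not random. This is a hypothetical sketch: `capped_mappers` is an illustrative name, and the shared counter is idealized as a single in-process variable. A real job would need some way for tasks to see each other's counts. Each inner list stands for one mapper's input split, and the splits are processed in arrival order.

```python
from typing import List


def capped_mappers(splits: List[List[str]], n: int) -> List[str]:
    """Simulate mappers that share one counter and let records through
    until the counter reaches n; all later input is discarded."""
    counter = 0  # idealized job-wide counter (assumption, see lead-in)
    emitted: List[str] = []
    for split in splits:          # one iteration per (simulated) mapper
        for record in split:
            if counter >= n:
                break             # limit reached: discard the rest of this split
            emitted.append(record)
            counter += 1
    return emitted


if __name__ == "__main__":
    splits = [["a1", "a2", "a3"], ["b1", "b2"]]
    # Always the first records seen, never a random selection.
    print(capped_mappers(splits, 3))
```

Whatever the data, the output is always the first n records to arrive. That is exactly the non-randomness the message warns about.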
On Mon, Jun 27, 2011 at 09:11, Jeff Zhang wrote:
I'd like to select N random records from a large amount of data using
Hadoop, and I wonder how I can achieve this. Currently my idea is to let
each mapper task select N / mapper_number records. Does anyone have such
Best regards,
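Taking a fixed N / mapper_number records from each mapper gives every split the same quota, so records in small splits end up over-represented. One standard alternative, swapped in here and not proposed anywhere in this thread, is random-key top-N sampling. Each mapper tags every record with a uniform random key and keeps only its N smallest keys. A single reducer then keeps the N smallest keys overall. This works because any record among the global N smallest is necessarily among the N smallest in its own split. The sketch simulates mappers and the reducer in plain Python, and the function names are hypothetical.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple

Keyed = Tuple[float, str]  # (random key, record)


def map_sample(records: Iterable[str], n: int, rng: random.Random) -> List[Keyed]:
    """Mapper side: tag each record with a uniform random key and keep
    only the n records with the smallest keys."""
    return heapq.nsmallest(n, ((rng.random(), r) for r in records))


def reduce_sample(mapper_outputs: Iterable[List[Keyed]], n: int) -> List[str]:
    """Reducer side: merge all mapper outputs and keep the n smallest keys."""
    merged = heapq.nsmallest(n, (kv for out in mapper_outputs for kv in out))
    return [record for _, record in merged]


def sample_n(splits: List[List[str]], n: int, seed: Optional[int] = None) -> List[str]:
    """Simulate the whole job: one map_sample call per split, one reduce."""
    rng = random.Random(seed)
    outputs = [map_sample(split, n, rng) for split in splits]
    return reduce_sample(outputs, n)


if __name__ == "__main__":
    big = [f"a{i}" for i in range(1000)]
    small = [f"b{i}" for i in range(10)]
    # Each record is equally likely to be chosen, regardless of split size.
    print(sample_n([big, small], 5, seed=7))
```

Each mapper emits at most N records, so the reducer handles at most N × mapper_number records. The result is exactly N records (or all records, if there are fewer than N), drawn uniformly at random.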