We should be able to control this (that is, specify the exact number of mappers) once HADOOP-4565 and HIVE-74 are resolved; both are being actively worked on.
From: Zheng Shao
Sent: Sunday, January 11, 2009 9:16 PM
Subject: Re: Number of Mappers
Currently the only way to do it is to use a reducer.
SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count) FROM (SELECT actor_id FROM activities CLUSTER BY actor_id) a;
On Sun, Jan 11, 2009 at 8:45 PM, Josh Ferguson wrote:
If I'm running a query like this:
hive> SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count) FROM activities;
It creates a map task for each file. I need every row in the table to go through a single instance of the script, because parts of it require global information about the whole list. Do I need to rework this query to use a reducer, or can I change a configuration variable so that all of the data in this table is loaded and run through /my/script at once?
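
For illustration, here is a minimal sketch of what a script like /my/script might look like, and why it needs to see every row. The real script is not shown in this thread, so everything here is an assumption: the function name summarize is hypothetical, and "percentile" is taken to mean the share of actors whose row count is at or below this actor's. The sketch relies only on the fact that TRANSFORM streams rows to the script on stdin, one per line with tab-separated columns, and reads tab-separated rows back from stdout. The percentile is only global if one instance receives the whole table, for example when the reducer-side query runs with a single reduce task.

```python
import sys
from collections import Counter


def summarize(lines):
    """Return (actor_id, percentile, count) tuples for an iterable of input lines.

    Hypothetical helper: the percentile definition is an assumption,
    not taken from the original script.
    """
    # Each input line is one row; the first tab-separated column is actor_id.
    counts = Counter(
        line.rstrip("\n").split("\t")[0] for line in lines if line.strip()
    )
    total = len(counts)
    rows = []
    for actor_id, count in sorted(counts.items()):
        # This step needs the counts of ALL actors, which is why the
        # script cannot be split across independent instances.
        at_or_below = sum(1 for c in counts.values() if c <= count)
        rows.append((actor_id, 100.0 * at_or_below / total, count))
    return rows


def main():
    # Emit tab-separated columns in the order named by AS (...) in the query.
    for actor_id, percentile, count in summarize(sys.stdin):
        print(f"{actor_id}\t{percentile:.1f}\t{count}")


if __name__ == "__main__":
    main()
```

If the same script ran once per file, each instance would compute percentiles against only the actors in its own file, so the results would differ from the single-instance run.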