If I'm running a query like this:

hive> SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id,
percentile, count) FROM activities;

It creates a map task for each file. I need every row in the table to be
run through a single instance of the script, since certain parts of it
require global information about the whole data set. Do I need to rework
this query to use a reducer, or can I change some configuration variable
to load in all of my data from this table and run it through /my/script
at once?

Josh F.


  • Zheng Shao at Jan 12, 2009 at 5:16 am
    Currently the only way to do it is to use a reducer.

    set mapred.reduce.tasks=1;
    SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile,
    count) FROM (SELECT actor_id FROM activities CLUSTER BY actor_id) a;
    --
    Yours,
    Zheng
  • Joydeep Sen Sarma at Jan 12, 2009 at 7:00 am
    We should be able to control this (specify an exact mapper count) once hadoop-4565 and hive-74 are resolved (both are being actively worked on).

    ________________________________
    From: Zheng Shao
    Sent: Sunday, January 11, 2009 9:16 PM
    To: hive-user@hadoop.apache.org
    Subject: Re: Number of Mappers
  • Josh Ferguson at Jan 12, 2009 at 7:07 am
    The reducer method is a pretty low-cost (in terms of developer time)
    workaround, so I wouldn't make it too high a priority. It seems like a
    throughput optimization at most, and only for the class of mapper
    script that actually reduces the input set in some way.

    Josh
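
Zheng's single-reducer workaround funnels every row of the table through
one instance of the TRANSFORM script, which is what makes global
statistics possible. As a rough sketch only (the percentile semantics
and the idea of counting rows per actor are assumptions for
illustration, not taken from the thread), /my/script might look like:

```python
#!/usr/bin/env python
# Hypothetical sketch of /my/script. Hive TRANSFORM streams one
# tab-separated row per line on stdin; with mapred.reduce.tasks=1 this
# single process sees every actor_id, so it can compute statistics that
# need the global row set.
import sys
from collections import Counter

def transform(lines, out=sys.stdout):
    # Count rows per actor across the whole input.
    counts = Counter(line.strip() for line in lines if line.strip())
    total = sum(counts.values())
    ordered = sorted(counts.values())
    for actor_id, n in counts.items():
        # Percentile rank: share of all rows belonging to actors whose
        # row count is at or below this actor's count.
        at_or_below = sum(c for c in ordered if c <= n)
        percentile = 100.0 * at_or_below / total
        # Hive expects tab-separated output columns:
        # actor_id, percentile, count.
        out.write("%s\t%.1f\t%d\n" % (actor_id, percentile, n))

if __name__ == "__main__":
    transform(sys.stdin)
```

Because the outer query uses CLUSTER BY, rows for the same actor_id
arrive contiguously, so a script could also stream group by group
instead of buffering; buffering everything, as above, only works when
the reducer's input fits in memory.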

Discussion Overview
group: user@
categories: hive, hadoop
posted: Jan 12, '09 at 4:45a
active: Jan 12, '09 at 7:07a
posts: 4
users: 3
website: hive.apache.org
