An update on this.

I've finished changing the Oozie Hive action to work with Hive 0.7.

As mentioned before, the problem is that not all of the required Hive JARs and
their dependencies are available in public Maven repos.

Early next week the Cloudera Maven repositories should have beta versions of
these JARs (currently I'm building against SNAPSHOTs).

As soon as the beta JARs are available I'll post a patch using those JAR
versions.

Thanks.

Alejandro
On Thu, Feb 10, 2011 at 4:51 PM, Alejandro Abdelnur wrote:

Hi Balaji,

The latest patch of the Hive action does not bundle hive-default.xml (I got
the same feedback from Carl); you'll be responsible for bundling it in the WF
directory until the Hive JARs bundle it.

I'll upload the new patch early next week and then ask Oozie to integrate
it.

Still, the problem I have is that, AFAIK, not all Hadoop and Hive JARs are
available in the public Maven repositories currently used by the Oozie build.
I'll submit, as part of the PR, a separate commit that configures the Oozie
build to pull from Cloudera's Maven repositories, where all the JARs are
available.

Thanks.

Alejandro

On Thu, Feb 10, 2011 at 4:34 PM, Balaji Rajagopalan <balajirg@yahoo-inc.com> wrote:

Alejandro,

I have used your Hive action patch from tucu’s forked branch on Yahoo's
GitHub and it works fine. When will your patch be available in the master
branch of Yahoo's GitHub? Also, I have a small suggestion, if I may:
hive-default.xml is bundled with oozie-core.jar. Could we instead keep
hive-default.xml in the same folder as workflow.xml in HDFS, so that when I
change hive-default.xml I don’t have to bundle the JAR again?



Regards,

Balaji



*From:* Alejandro Abdelnur
*Sent:* Thursday, February 10, 2011 3:12 AM
*To:* user@hive.apache.org
*Subject:* Re: periodic execution



Hi Cam,



A bit of information that may be useful to you: Cloudera's Oozie has a
Hive action that you can use from workflow jobs.



Cheers



Alejandro



On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz wrote:

Hello,

I am looking over Oozie's coordinator. Meanwhile, I managed to
write a simple Java program that connects to Hive using JDBC.

I can import data and execute queries.

I was wondering: to run workflows, one needs to keep metadata,
i.e. which file or partition was processed last, etc.

I could do this usually using a database like db4o, and keeping a static
file.

Is the Derby database that comes with Hive meant for this purpose? How do
people usually store state when using a Hive application?
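
One possible way to keep this kind of state, sketched under assumptions that are not from the thread: Hive's embedded Derby database holds the metastore (table and partition definitions), so this sketch keeps application state in a separate small SQLite file instead. The table name `processed_partitions` and the three helper functions are hypothetical names chosen for illustration.

```python
import sqlite3


def open_state(db_path):
    """Open (or create) the state database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_partitions ("
        "date_hour TEXT PRIMARY KEY, "
        "processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.commit()
    return conn


def is_processed(conn, date_hour):
    """Return True if this partition key has already been recorded."""
    row = conn.execute(
        "SELECT 1 FROM processed_partitions WHERE date_hour = ?", (date_hour,)
    ).fetchone()
    return row is not None


def mark_processed(conn, date_hour):
    """Record a partition key; repeating the call for the same key is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_partitions (date_hour) VALUES (?)",
        (date_hour,),
    )
    conn.commit()
```

A periodic job could call `is_processed` before loading a partition and `mark_processed` only after the Hive statements succeed, so that a failed run is retried on the next pass.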

best regards,
-C.B.


On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <hammer@cloudera.com> wrote:

Hey Cam,
You should use Oozie's Coordinator:
https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases
Regards,
Jeff
On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz wrote:

Hello,

What kind of strategy should I follow in order to run certain things
periodically?

For example, each hour I want to look for log files in a certain directory,
and for new files I need to run:

load data local inpath '/home/cam/logs/log.2011310120' into table
item_view_raw partition (date_hour=2011310120);

FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
(date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
ivr.date_hour='2011310120';

Obviously, I need to deduce which files are new, iterate over them,
and extract the time key, which will be used as the partition name; in
this case it is 2011310120.
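
The steps described above (find new files, extract the time key, build the statements) could be sketched roughly as follows. This is only an illustration: it assumes every log file is named `log.<time key>` as in the example path, and the helper names `time_key`, `statements_for` and `plan` are hypothetical. It only builds the HiveQL strings; how they are submitted to Hive (JDBC, the CLI, an Oozie action) is left open.

```python
import re
from pathlib import PurePosixPath

# Assumption: log files are named log.<time key>, as in the example path.
LOG_NAME = re.compile(r"^log\.(\d+)$")


def time_key(path):
    """Return the numeric suffix of a log file name, or None if it does not match."""
    match = LOG_NAME.match(PurePosixPath(path).name)
    return match.group(1) if match else None


def statements_for(path, key):
    """Build the two HiveQL statements from the question for one log file."""
    load = (
        f"LOAD DATA LOCAL INPATH '{path}' INTO TABLE item_view_raw "
        f"PARTITION (date_hour={key})"
    )
    insert = (
        "FROM item_view_raw ivr "
        f"INSERT OVERWRITE TABLE item_view PARTITION (date_hour={key}) "
        "SELECT ivr.view_time, ivr.ip_number, ivr.session_id, ivr.session_cookie, "
        "ivr.eser_sid, ivr.sale_status, ivr.maker_name, ivr.title "
        f"WHERE ivr.log_tag = 'PROD' AND ivr.date_hour = '{key}'"
    )
    return [load, insert]


def plan(paths, processed_keys):
    """Yield (key, statements) for every log file whose key is not yet processed."""
    for path in sorted(paths):
        key = time_key(path)
        if key is not None and key not in processed_keys:
            yield key, statements_for(path, key)
```

Here `processed_keys` is whatever record of finished partitions the job keeps; files that do not match the naming pattern are skipped rather than guessed at.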

It seems I can write a Java program to handle the
synchronization of all these tasks, but I was wondering what you
guys would suggest.

Any ideas, recommendations, or help would be greatly appreciated.

Best Regards,
C.B.


  • Edward Capriolo at Feb 17, 2011 at 1:14 am

    On Wed, Feb 16, 2011 at 8:11 PM, Alejandro Abdelnur wrote:
    An update on this.
    I've finished changing the Oozie Hive action to work with Hive 0.7.
    As mentioned before, the problem is that not all of the required Hive JARs and
    their dependencies are available in public Maven repos.
    Early next week the Cloudera Maven repositories should have beta versions of
    these JARs (currently I'm building against SNAPSHOTs).
    As soon as the beta JARs are available I'll post a patch using those JAR
    versions.
    Thanks.
    Alejandro

    Did support for Hive variables (Hive 0.7.0) make it into this version of
    the Oozie Hive action?

