Hello,

I was wondering if anyone has managed to unit test Hive scripts and could
share their experience? My first thought was to prepare sample data, run the
Hive scripts to generate output, and then compare the generated output
with the expected output. That sounds fairly simple, but it may get
complicated if the data is read from and stored in S3.

I was also wondering if anyone has managed to run the tests on EMR? I found
this simple framework, which may help with testing on EMR:
http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
However, I am tempted to run the tests on a real EMR cluster rather than
locally.

I am planning to integrate those tests with Jenkins (formerly Hudson).

Many thanks,
Radek

  • Edward Capriolo at Feb 18, 2011 at 7:10 pm

    On Fri, Feb 18, 2011 at 6:58 AM, Radek Maciaszek wrote:
    > My first thought was to prepare sample data, run hive scripts in order
    > to generate output and then compare the generated output with the
    > expected output.

    The process you described, diffing output, is exactly how Hive's
    current unit testing works. The upside is that it is good
    for catching regressions; the downside is that it is not really
    programmatic. Look for the .q files in the Hive source and their
    corresponding .q.out files under results.

    Edward
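
For anyone wanting to try the same diff-against-expected idea on their own scripts, here is a minimal sketch in Python. It is not Hive's own test harness. The function names (`diff_lines`, `run_script`, `check_script`, `main`) are made up for illustration, and it assumes the `hive` CLI is on the PATH and that `hive -f <file>` prints the query results to stdout.

```python
import difflib
import subprocess
from pathlib import Path
from typing import List, Sequence


def diff_lines(expected: str, actual: str) -> List[str]:
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


def run_script(script: Path, runner: Sequence[str] = ("hive", "-f")) -> str:
    """Run one script through `runner` and return its stdout as text."""
    result = subprocess.run(
        [*runner, str(script)],
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return result.stdout


def check_script(
    script: Path, expected_file: Path, runner: Sequence[str] = ("hive", "-f")
) -> List[str]:
    """Compare a script's output with its stored expected output."""
    return diff_lines(expected_file.read_text(), run_script(script, runner))


def main(argv: Sequence[str]) -> int:
    """Hypothetical entry point: main(["query.q", "query.q.out"])."""
    diff = check_script(Path(argv[0]), Path(argv[1]))
    print("\n".join(diff))
    return 1 if diff else 0  # non-zero so a CI job can fail the build
```

Calling `sys.exit(main(sys.argv[1:]))` from a small wrapper would let a Jenkins job fail whenever the diff is non-empty. The `runner` parameter is there so the comparison logic can be exercised without a Hive installation.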
  • Kirk True at Feb 18, 2011 at 8:21 pm
    Hi Radek,

    I'm actually in the process of running the map-join unit tests against
    EMR as we speak. It's possible but dog slow :)

    Thanks,
    Kirk
  • Andrew Wilson at Feb 18, 2011 at 9:24 pm
    Hi Radek,

    I've been using the MiniMRCluster and MiniDFSCluster to run unit tests locally. That has been giving me decent cycle time. I have fixture tables in my test/resources which I can load into the MiniHiveCluster as part of test setup. I loosely based my code on QTestUtil in the Hive trunk (which I could not figure out how to use directly).

    Andrew
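
The fixture-in-setup pattern described above can be illustrated roughly as follows. This is only a sketch of the pattern, not Andrew's code: it swaps in Python's `unittest` and an in-memory SQLite database as a stand-in for the local mini cluster, and it does not touch MiniMRCluster, MiniDFSCluster or Hive at all. The table, fixture rows and query are hypothetical, and Hive's SQL dialect differs from SQLite's.

```python
import csv
import io
import sqlite3
import unittest

# Hypothetical fixture; in a real project this would be a file under
# test/resources rather than an inline string.
PAGE_VIEWS_CSV = """user_id,url
1,/home
1,/about
2,/home
"""

# Hypothetical query under test, kept to basic SQL for illustration.
VIEWS_PER_USER = (
    "SELECT user_id, COUNT(*) FROM page_views "
    "GROUP BY user_id ORDER BY user_id"
)


class ViewsPerUserTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test (stand-in for the mini cluster).
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")
        # Load the fixture rows as part of test setup.
        rows = csv.DictReader(io.StringIO(PAGE_VIEWS_CSV))
        self.conn.executemany(
            "INSERT INTO page_views VALUES (:user_id, :url)", rows
        )

    def tearDown(self):
        self.conn.close()

    def test_counts_views_per_user(self):
        result = self.conn.execute(VIEWS_PER_USER).fetchall()
        self.assertEqual(result, [(1, 2), (2, 1)])
```

Because every test gets its own freshly loaded tables, tests stay independent of each other, which is the main thing the setup step buys regardless of which engine sits underneath.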

Discussion Overview
group: user @
categories: hive, hadoop
posted: Feb 18, '11 at 11:59a
active: Feb 18, '11 at 9:24p
posts: 4
users: 4
website: hive.apache.org
