Has anyone managed to unit test Hive scripts, and could you share your
experience? My first thought was to prepare sample data, run the Hive
scripts to generate output, and then compare the generated output
with the expected output. That sounds fairly simple, but it may get a bit
complicated if the data is read from S3 and written back to S3.
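
To make the idea above concrete, here is a rough sketch of what I have in mind, in Python. It assumes the Hive script takes its input and output locations as variables (I have called them INPUT and OUTPUT here, referenced in the script as ${hivevar:INPUT} and ${hivevar:OUTPUT}), so that a test can point them at a local or scratch location instead of the real S3 buckets. The function names are made up for illustration, and it assumes the `hive` CLI is on the PATH and the output is plain text files.

```python
import subprocess
from pathlib import Path


def build_hive_command(script, variables):
    """Build a `hive -f` command line, passing each variable via --hivevar."""
    cmd = ["hive", "-f", str(script)]
    for name, value in sorted(variables.items()):
        cmd += ["--hivevar", f"{name}={value}"]
    return cmd


def read_rows(directory):
    """Read every row from the data files in a directory, in sorted order.

    Files starting with '.' or '_' (checksums, _SUCCESS markers) are skipped.
    Rows are sorted because Hive does not guarantee output order unless the
    query asks for one.
    """
    rows = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.name.startswith((".", "_")):
            rows.extend(path.read_text().splitlines())
    return sorted(rows)


def run_and_compare(script, input_dir, output_dir, expected_dir):
    """Run the script on the sample data and compare with the expected rows."""
    # INPUT and OUTPUT are hypothetical variable names used by the script.
    cmd = build_hive_command(script, {"INPUT": input_dir, "OUTPUT": output_dir})
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Hive fails
    return read_rows(output_dir) == read_rows(expected_dir)
```

A test case would then call run_and_compare with a directory of sample data and a directory of expected results, and assert that it returns True. If the data has to stay in S3, the same comparison would still work after copying the output down to a local directory first.
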
I was also wondering if anyone has managed to run such tests on EMR? I found this
simple framework which may help with testing EMR:
However, I am tempted to run the tests on a real EMR cluster rather than doing it
I am planning to integrate those tests with Jenkins (formerly Hudson).