On Jun 25, 2011, at 12:15 AM:
We envisioned this as something external to the Hive codebase. It would consist of:
* datasets (synthetic such as TPC-H, plus real-world if possible, perhaps contributed by a company such as Facebook after sufficient anonymization had been applied)
* data loader scripts, plus scripts for other operations such as purging/archiving old data
* query scripts with expected results
* configurable test harness for running the various load/query scripts either individually or as concurrent mixed workloads; validating results; and collecting performance data
* processes for collecting system data such as cluster load, memory usage, etc.
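To make the "query scripts with expected results" piece concrete, here is a minimal sketch of the validation loop such a harness might run. This is an illustration only: `run_query` is a hypothetical stub standing in for an actual Hive CLI invocation (e.g. `hive -f`), and the query, table, and row count are made-up placeholders, not part of the proposal.

```python
# Sketch of the harness's validate step: run each named query script
# through a runner and diff actual rows against stored expected rows.

def run_query(sql: str) -> list[tuple]:
    # Stub standing in for executing the script against a Hive cluster.
    # A real harness would shell out to the Hive CLI and parse its output.
    canned = {"select count(*) from lineitem": [(6001215,)]}
    return canned.get(sql.strip().lower(), [])

def validate(cases: dict[str, tuple[str, list[tuple]]]) -> dict[str, bool]:
    """Return per-query pass/fail by comparing actual vs. expected rows."""
    return {name: run_query(sql) == expected
            for name, (sql, expected) in cases.items()}

# Hypothetical test case: query text paired with its expected result rows.
cases = {"q1": ("SELECT count(*) FROM lineitem", [(6001215,)])}
print(validate(cases))  # -> {'q1': True}
```

A real harness would also record wall-clock time per query and run the cases concurrently to build the mixed workloads described above.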
The idea is that this could be used for testing of changes to either Hive or Hadoop. If we were able to pool resources for a shared cluster, we could run through patches and configurations in order to catch regressions or potential problems early.
A good home for this might be the new BigTop project: http://wiki.apache.org/incubator/BigtopProposal
On Jun 24, 2011, at 1:31 AM, Chinna wrote:
In the Hive Roadmap (http://wiki.apache.org/hadoop/Hive/Roadmap) we saw the following proposal:
3.4. Test, Error Messages and Debugging
[P0] Heavy-duty test infrastructure
Our team is interested in working on this task, but we need some details about what it is expected to cover.
Please add your valuable comments.
Chinna Rao lalam