Bejoy summarized it well. Even though Hive doesn't have referential
integrity constraints, if you are loading data from a MySQL database that
enforces referential integrity, that property of the data itself is
preserved. Another thing to note is that you can't update data once it is
loaded into the Hive warehouse, so after you migrate the data from MySQL,
it will continue to satisfy the original referential integrity constraints.
Furthermore, the fact that Hive doesn't have referential integrity
constraints shouldn't prevent you from running any of the queries that you
run on MySQL. However, some joins that an RDBMS would typically eliminate
because of referential integrity constraints will not be eliminated in Hive.
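To make that last point concrete, here is an illustrative star-schema join in HiveQL (the table and column names fact_sales, dim_date, and date_key are made up for the example, not from the original warehouse). An RDBMS with a declared foreign key could prove the join redundant when no dimension column is actually needed and skip it; Hive has no constraint metadata, so the join always executes:

```
# Hypothetical star-schema join; fact_sales / dim_date are placeholder names.
# With a foreign key from fact_sales.date_key to dim_date.date_key, an RDBMS
# optimizer could eliminate this join (every fact row matches exactly one
# dimension row). Hive will always run it as a MapReduce join.
hive -e "
  SELECT f.store_id, SUM(f.amount)
  FROM fact_sales f
  JOIN dim_date d ON f.date_key = d.date_key
  GROUP BY f.store_id;
"
```

The result is the same either way; you just pay for the extra join work in Hive.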
On Mon, Oct 22, 2012 at 4:29 AM, Bejoy KS wrote:
You can import the existing tables into Hive as such using Sqoop. Hive is a
wrapper over MapReduce that gives you the flexibility to create optimized
MapReduce jobs using SQL-like syntax. There is no relational style maintained
in Hive, so don't treat Hive as a typical database/data warehouse. In short,
no referential integrity.
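For the import itself, something along these lines is the usual starting point (the JDBC URL, database name, and credentials below are placeholders, so treat this as a sketch to adapt rather than a tested command):

```
# Sketch: pull every table from the source MySQL schema into Hive.
# jdbc:mysql://dbhost/warehouse_db and dbuser are placeholders;
# -P prompts for the password, --num-mappers controls parallelism.
sqoop import-all-tables \
  --connect jdbc:mysql://dbhost/warehouse_db \
  --username dbuser -P \
  --hive-import \
  --num-mappers 4
```

You can also import tables one at a time with `sqoop import --table ...` if you only want part of the schema.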
However, if you have all the data in Hive, almost all queries that work on
MySQL would work in Hive as well. Some queries may not, but you'll still
have workarounds.
You can keep the whole data sets/tables in Hive itself.
You don't need much denormalization, I guess. Joins work well in Hive.
Sent from handheld, please excuse typos.
*From: * Austin Chungath <firstname.lastname@example.org>
*Date: *Mon, 22 Oct 2012 16:47:04 +0530
*ReplyTo: * email@example.com
*Subject: *Implementing a star schema (facts & dimension model)
I am new to data warehousing in hadoop. This might be a trivial question
but I was unable to find any answers in the mailing list.
My questions are:
A person has an existing data warehouse that uses a star schema
(implemented in a MySQL database). How does one migrate it to Hadoop?
I can use sqoop to copy my tables to hive, that much I know.
But what happens to referential integrity, since there are no primary key
/ foreign key concepts?
I have seen that I can use Hive & HBase together. Is there a method for
storing fact and dimension tables in Hadoop using Hive & HBase together?
Does putting dimensions in HBase & facts in Hive make any sense, or should
it be the other way around?
Consider that de-normalization is not an option.
What is the best practice for porting an existing data warehouse to Hadoop
with minimum changes to the database model?
Please let me know whatever views you have on this.