On Mon, May 27, 2013 at 1:02 PM, Marcel Kornacker wrote:
On Mon, May 27, 2013 at 4:42 AM, Karthik wrote:
Hi
1) Is there any paper on how Impala works? Are Hive queries converted
into HBase scans internally, or does Impala have some other mechanism to
retrieve the HBase data? Is there any internal paper describing the
architecture in detail?
Karthik, there is no published paper on the internal architecture of
Impala at the moment - we feel that's somewhat premature, considering
that things are still evolving at a fast pace, but we are planning on
providing an in-depth discussion of the internals once things have
settled down a bit.
Regarding your specific questions: a query against an HBase table ends
up executing scans against that table using the standard HBase client
library.
As Marcel says, when Impala is querying against HBase data it will leverage
the HBase region server and APIs to perform scans of the HBase tables. When
possible it will push down predicates such as primary key predicates to
minimize the scan ranges. Note that Impala can query HBase, but HBase is
not necessary for non-HBase data.
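
To illustrate the pushdown point, here is a toy sketch in Python. It is not Impala's code and does not use the HBase client API; `ROWS`, `key_range`, and `scan` are made-up names, and the sorted in-memory list only stands in for a table whose rows are ordered by key. The idea it shows is that predicates on the key can be folded into a start/stop range, so the scan touches only that slice instead of the whole table.

```python
from bisect import bisect_left

# Toy stand-in for a table whose rows are kept sorted by row key.
ROWS = [
    ("user001", {"name": "ann"}),
    ("user002", {"name": "bob"}),
    ("user003", {"name": "cy"}),
    ("user004", {"name": "dee"}),
    ("user005", {"name": "eve"}),
]
KEYS = [key for key, _ in ROWS]


def key_range(predicates):
    """Fold key predicates into a [start, stop) scan range.

    predicates is a list of (op, value) pairs with op in {">=", "<"}.
    None on either side means the range is unbounded there.
    """
    start, stop = None, None
    for op, value in predicates:
        if op == ">=":
            start = value if start is None else max(start, value)
        elif op == "<":
            stop = value if stop is None else min(stop, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return start, stop


def scan(start=None, stop=None):
    """Return rows with start <= key < stop, reading only that slice."""
    lo = 0 if start is None else bisect_left(KEYS, start)
    hi = len(KEYS) if stop is None else bisect_left(KEYS, stop)
    return ROWS[lo:hi]


# WHERE key >= 'user002' AND key < 'user004' becomes a bounded scan.
start, stop = key_range([(">=", "user002"), ("<", "user004")])
narrow = scan(start, stop)

# Without a key predicate the scan has to cover every row.
full = scan()
```

A predicate on a non-key column cannot narrow the range this way, which is why the wording above is "when possible".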
We don't yet have papers on Impala but you can find more material on Impala
in the following links:
- Impala 1.0 GA blog:
http://blog.cloudera.com/blog/2013/05/cloudera-impala-1-0-its-here-its-real-its-already-the-standard-for-sql-on-hadoop/
- Impala E-Learning Course:
http://training.cloudera.com/elearning/impala/
- Rich Report Impala overview:
http://hpcradio.blogspot.com/2013/05/technical-overview-of-cloudera-impala.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SunRadioHpcPodcast+(The+Rich+Report+HPC+Podcast)
- Tech dive into Impala webinar:
http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/cloudera-impala-a-modern-sql-engine-for-hadoop-video-recording.html
- Impala/Cloudera Enterprise RTQ site:
http://www.cloudera.com/impala
- Impala Docs and FAQ:
http://www.cloudera.com/content/support/en/documentation/cloudera-impala/cloudera-impala-documentation-v1-latest.html
Regarding the question on Pentaho and Impala:
2) Has anyone tried making Impala work with Mondrian? From googling,
I understand there were some pending issues with the Impala driver on
the Cloudera side. Can you kindly tell me whether these issues have been
fixed? What is the best way to go about integrating Mondrian with Impala?
http://jira.pentaho.com/browse/MONDRIAN-1424
We've been working closely with Pentaho. We have tested a solution
but need to fix IMPALA-85 before Pentaho on Impala is ready for
non-beta production usage. We're hoping to fix this integration issue this
summer.