Sure. For convenience:
What’s Next for Cloudera Impala?
It’s been an exciting month and a half since the launch of the Cloudera
Impala beta, and we thought it’d be a great time to provide an update about
what’s next for the project – including our product roadmap, release
schedule and open-source plan.
First of all, we’d like to thank you for your enthusiasm and valuable beta
feedback. We’re actively listening and have already fixed many of the bugs
reported, captured feature requests for the roadmap, and updated the Cloudera
Impala FAQ based on user input.
Our primary focus between now and general availability (GA) is making
Impala enterprise-ready for your production Hadoop clusters. This means
continued investments in product stability as well as product
- Additional file formats – specifically the Avro file format and
- Additional OS support – for the same supported 64-bit OS platforms as
RHEL/CentOS 5.7, Ubuntu, Debian, SLES, and Oracle Linux
- Straggler handling – enables Impala to give more work to faster
machines and less to slower machines for the fastest response times. In
large clusters you often see a large variance of performance across nodes
due to things like slow and faulty disks.
- JDBC driver – enables Java apps to interface with Impala. We’ll
leverage the JDBC driver from Apache Hive to provide a common SQL interface
for Java apps for both Impala and Hive.
- Data Definition Language (DDL) – enables users to create tables in the
shared Hive metastore from Impala as well as Hive. As of Impala beta
version 0.3, you can query from Impala but need to create your tables
through Hive first.
- Faster, bigger, and more memory efficient joins – through a
partitioned hash join, Impala will be able to partition the second table in
a join so only one copy of the table is partitioned across all the nodes in
the cluster. Currently Impala stores the second table in a join in each
node’s memory. Impala will use table statistics to determine which strategy
is most performant for each query.
- Faster, bigger, and more memory efficient aggregations – enables
pre-aggregation to run in a distributed fashion, local to the data, which
offloads work, and thus memory consumption, from the coordinator node that returns the final
- Broader SQL performance optimizations – enables more of Impala’s SQL
features and built-ins to return with lowest latency by expanding our usage
of LLVM code generation.
- Automatic metadata refresh – enables new tables and data to seamlessly
be available for Impala queries as they are added without having to issue a
manual refresh command to Impala.
- New Trevni columnar file format – enables even faster performance
through an optional columnar format like Google Dremel’s ColumnIO and those
of other analytical query engines. For a Hadoop user, Trevni will be
another file format so any processing framework can access data stored in
Trevni format like they do today with formats like Avro and SequenceFiles.
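To make the join item above more concrete, here is a minimal Python sketch of the difference between keeping a full copy of the second table on every node and hash-partitioning both tables on the join key. It is an illustration only, not Impala's implementation: the nodes are simulated with plain lists, and the names `broadcast_join`, `partitioned_join`, and `rows_held` are hypothetical.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Index rows by join key, as the build side of a hash join."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(left_rows, table, key):
    """Match each left row against the hash table and merge matching rows."""
    return [{**l, **r} for l in left_rows for r in table.get(l[key], [])]


def broadcast_join(left_rows, right_rows, key, num_nodes):
    """Every simulated node holds a full copy of the second (right) table."""
    results, rows_held = [], 0
    for node in range(num_nodes):
        local_left = left_rows[node::num_nodes]    # this node's share of the left table
        table = build_hash_table(right_rows, key)  # full copy on each node
        rows_held += len(right_rows)
        results.extend(probe(local_left, table, key))
    return results, rows_held


def partitioned_join(left_rows, right_rows, key, num_nodes):
    """Both tables are hash-partitioned on the key, so one copy is spread across nodes."""
    results, rows_held = [], 0
    for node in range(num_nodes):
        local_left = [r for r in left_rows if hash(r[key]) % num_nodes == node]
        local_right = [r for r in right_rows if hash(r[key]) % num_nodes == node]
        table = build_hash_table(local_right, key)  # only this node's slice
        rows_held += len(local_right)
        results.extend(probe(local_left, table, key))
    return results, rows_held
```

Both functions return the same joined rows; `rows_held` counts how many right-table rows sit in memory across the cluster, which grows with the number of nodes in the broadcast case and stays at one copy in the partitioned case. The partitioned variant pays for this by moving rows of both tables to the node that owns their key, which is presumably why table statistics are needed to pick a strategy per query.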
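The aggregation item above can be sketched in the same spirit. The following Python example assumes a simple COUNT-per-group query; each simulated node pre-aggregates its local rows, and the coordinator only merges the partial results. The function names `local_preaggregate` and `coordinator_merge` are hypothetical and do not correspond to anything in the Impala code base.

```python
from collections import Counter


def local_preaggregate(rows, key):
    """Run on each node: count local rows per group before sending anything."""
    partial = Counter()
    for row in rows:
        partial[row[key]] += 1
    return partial


def coordinator_merge(partials):
    """Run on the coordinator: add up the per-node partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

With this split, the coordinator receives at most one entry per group per node instead of every raw row, which is the memory and work reduction the item describes.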
Post-GA Top Asks
We have a good list of additional enhancements that are important to us and
our users that are on our post-GA roadmap. The most notable and frequently
asked for items include:
- UDFs and extensibility – enables users to add their own custom
functionality. This is a frequent request and will take more time than GA allows
to build the right model considering performance and isolation requirements.
- Cost-based join order optimization – avoids users having to correctly
order the joins based on size and selectivity of the tables.
- External joins using disk – enables joins between tables to spill to
disk for arbitrarily large joins.
- Nested data – enables queries on complex nested structures including
maps, structs, and arrays.
We are tentatively planning for the Impala 1.0 GA at the end of the first
quarter of 2013. During the beta period we will continue to ship Impala
beta updates every 2-4 weeks. These updates will include stability fixes as
well as features from our roadmap listed above as soon as they are ready.
For example, two of our top asks, additional OS platforms and a JDBC
driver, will be coming soon after the New Year.
For those of you involved in the Apache Hadoop community, we appreciate
your patience as we provide more transparency into our open-source
development. Our internal test code and issue tracking have some
confidential information from our early private beta customers. We need to
separate this out before we can push more of our infrastructure to public
Earlier this week we provided the second update to the Impala code base.
Going forward, the plan is to provide:
- Up-to-date source repositories – we’ll keep the repo more up-to-date
- Transparent issue tracking – we’ll be moving bug and feature request
tracking over to the public Jira we have set up for Impala.
We are eagerly listening to feedback and continuously adjusting our roadmap
to best meet the needs of our user base. As such, please note that, because
this is a beta product, the roadmap and timelines above may change.
*Justin Erickson is the product manager for Cloudera Impala.*
On Fri, Dec 14, 2012 at 11:48 AM, Marcel Kornacker wrote:
You think it might be a good idea to post the blog post directly, instead
of a link?
On Fri, Dec 14, 2012 at 10:24 AM, Justin Erickson wrote:
Hi Impala Users,
Please take a look at our latest blog for the current roadmap, release
schedule, and community plan for the Cloudera Impala project: http://blog.cloudera.com/blog/2012/12/whats-next-for-cloudera-impala/
We also updated the Impala FAQ based on the questions from mailing lists,
beta customers, and presentations: https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions
In case you missed it, here are a few more resources on Impala:
- Intro to Impala webinar: http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/impala-real-time-queries-in-hadoop-webinar-slides.html
- Tech dive into Impala: http://www.meetup.com/Chicago-Big-Data/messages/boards/thread/29356612/post/89419992/
- Impala E-Learning Course: http://training.cloudera.com/elearning/impala/
- External blogs on Impala: http://blog.cloudera.com/blog/2012/11/external-observations-about-cloudera-impala/
We're excited by your continued feedback and enthusiasm for the project.