FAQ
Hi Impala Users,

Please take a look at our latest blog post for the current roadmap, release
schedule, and community plan for the Cloudera Impala project:
http://blog.cloudera.com/blog/2012/12/whats-next-for-cloudera-impala/

We also updated the Impala FAQ based on the questions from mailing lists,
beta customers, and presentations:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions

In case you missed it, here are a few more resources on Impala:

- Intro to Impala webinar:
http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/impala-real-time-queries-in-hadoop-webinar-slides.html
- Tech dive into Impala:
http://www.meetup.com/Chicago-Big-Data/messages/boards/thread/29356612/post/89419992/
- Impala E-Learning Course:
http://training.cloudera.com/elearning/impala/
- External blogs on Impala:
http://blog.cloudera.com/blog/2012/11/external-observations-about-cloudera-impala/

We're excited by your continued feedback and enthusiasm for the project.

Thanks,
Justin

--


  • Marcel Kornacker at Dec 14, 2012 at 7:48 pm
    You think it might be a good idea to post the blog post directly, instead
    of a link?
  • Justin Erickson at Dec 15, 2012 at 1:39 am
    Sure. For convenience:

    What’s Next for Cloudera Impala?

    It’s been an exciting month and a half since the launch of the Cloudera
    Impala beta, and we thought it’d be a great time to provide an update about
    what’s next for the project – including our product roadmap, release
    schedule and open-source plan.

    First of all, we’d like to thank you for your enthusiasm and valuable beta
    feedback. We’re actively listening and have already fixed many of the bugs
    reported, captured feature requests for the roadmap, and updated the Cloudera
    Impala FAQ <https://ccp.cloudera.com/display/IMPALA10BETADOC/Impala+Frequently+Asked+Questions>
    based on user input.

    GA Roadmap

    Our primary focus between now and general availability (GA) is making
    Impala enterprise-ready for your production Hadoop clusters. This means
    continued investments in product stability as well as product
    functionality, including:

    - Additional file formats – specifically the Avro file format and
    LZO-compressed TextFiles
    - Additional OS support – for the same supported 64-bit OS platforms as
    CDH4 <https://ccp.cloudera.com/display/CDH4DOC/CDH4+Requirements+and+Supported+Versions>, including
    RHEL/CentOS 5.7, Ubuntu, Debian, SLES, and Oracle Linux
    - Straggler handling – enables Impala to give more work to faster
    machines and less to slower machines for the fastest response times. In
    large clusters you often see a large variance of performance across nodes
    due to things like slow and faulty disks.
    - JDBC driver – enables Java apps to interface with Impala. We’ll
    leverage the JDBC driver from Apache Hive to provide a common SQL interface
    for Java apps for both Impala and Hive.
    - Data Definition Language (DDL) – enables users to create tables in the
    shared Hive metastore from Impala as well as Hive. As of Impala beta
    version 0.3, you can query from Impala but need to create your tables
    through Hive first.
    - Faster, bigger, and more memory-efficient joins – through a
    partitioned hash join, Impala will be able to partition the second table in
    a join so that only one copy of the table is spread across all the nodes in
    the cluster. Currently Impala stores a full copy of the second table in a
    join in each node’s memory. Impala will use table statistics to determine
    which strategy is most performant for each query.
    - Faster, bigger, and more memory-efficient aggregations – enables
    distributed pre-aggregation local to the data, offloading work, and
    thus memory consumption, from the coordinator node that returns the final
    results.
    - Broader SQL performance optimizations – enables more of Impala’s SQL
    features and built-ins to return with lowest latency by expanding our usage
    of LLVM code generation.
    - Automatic metadata refresh – enables new tables and data to seamlessly
    be available for Impala queries as they are added without having to issue a
    manual refresh command to Impala.
    - New Trevni columnar file format – enables even faster performance
    through an optional columnar format like Google Dremel’s ColumnIO and those
    of other analytical query engines. For a Hadoop user, Trevni will be
    another file format so any processing framework can access data stored in
    Trevni format like they do today with formats like Avro and SequenceFiles.
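
    To make the join item above concrete, here is a minimal single-process
    sketch of the two strategies it contrasts. This is not Impala code: the
    function names, the list-of-dicts row format, and the use of Python's
    hash() to assign rows to nodes are all assumptions made for illustration.
    It only shows why partitioning the second table means holding one copy
    in total, rather than one copy per node.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Build a hash table on right_rows, then probe it with left_rows."""
    table = defaultdict(list)
    for r in right_rows:
        table[r[key]].append(r)
    return [{**l, **r} for l in left_rows for r in table.get(l[key], [])]


def broadcast_join(left_parts, right_rows, key):
    """Each simulated node joins its slice of the left table against a
    full copy of the right table (the behaviour the post calls 'currently')."""
    out, rows_held = [], 0
    for part in left_parts:
        node_copy = list(right_rows)   # full copy held by this node
        rows_held += len(node_copy)
        out.extend(hash_join(part, node_copy, key))
    return out, rows_held


def partitioned_join(left_rows, right_rows, key, num_nodes):
    """Both tables are split by hash of the join key, so each node only
    holds the right-table rows that can match its left-table rows."""
    left_parts = [[] for _ in range(num_nodes)]
    right_parts = [[] for _ in range(num_nodes)]
    for row in left_rows:
        left_parts[hash(row[key]) % num_nodes].append(row)
    for row in right_rows:
        right_parts[hash(row[key]) % num_nodes].append(row)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(hash_join(lp, rp, key))
    rows_held = sum(len(p) for p in right_parts)  # one copy in total
    return out, rows_held
```

    Both functions return the joined rows plus the number of right-table rows
    held in memory across all simulated nodes. With N nodes the broadcast
    variant holds N copies of the second table and the partitioned variant
    holds one, at the price of redistributing both tables by join key.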

    Post-GA Top Asks

    We have a good list of additional enhancements that are important to us and
    our users that are on our post-GA roadmap. The most notable and frequently
    asked for items include:

    - UDFs and extensibility – enables users to add their own custom
    functionality. This is a frequent request and will take more time than we
    have before GA to build the right model, considering performance and
    isolation requirements.
    - Cost-based join order optimization – avoids users having to correctly
    order the joins based on size and selectivity of the tables.
    - External joins using disk – enables joins between tables to spill to
    disk for arbitrarily large joins.
    - Nested data – enables queries on complex nested structures including
    maps, structs, and arrays.

    Release Plan

    We are tentatively planning for the Impala 1.0 GA at the end of the first
    quarter of 2013. During the beta period we will continue to ship Impala
    beta updates every 2-4 weeks. These updates will include stability fixes as
    well as features from our roadmap listed above as soon as they are ready.
    For example, two of our top asks, additional OS platforms and a JDBC
    driver, will be coming soon after the New Year.

    Open-Source Process

    For those of you involved in the Apache Hadoop community, we appreciate
    your patience as we provide more transparency into our open-source
    development. Our internal test code and issue tracking have some
    confidential information from our early private beta customers. We need to
    separate this out before we can push more of our infrastructure to public
    systems.

    Earlier this week we provided the second update to the Impala code base.
    Going forward, the plan is to provide:

    - Up-to-date source repositories – we’ll keep the repo more up-to-date
    going forward.
    - Transparent issue tracking – we’ll be moving bug and feature request
    tracking over to the public Jira we have set up for Impala.

    We are eagerly listening to feedback and continuously adjusting our roadmap
    to best meet the needs of our user base. Please note that because this
    is a beta product, the roadmap and timelines above may change.

    *Justin Erickson is the product manager for Cloudera Impala.*


Discussion Overview
group: impala-user @
categories: hadoop
posted: Dec 14, '12 at 6:24p
active: Dec 15, '12 at 1:39a
posts: 3
users: 2
website: cloudera.com
irc: #hadoop