On Sun, Mar 10, 2013 at 10:57 PM, Lake Chang wrote:
Thanks for the reply.
the impalad process needs to load the metadata
It's unexpected that loading the metadata takes so much time (several
minutes), and that the time varies with the number of partitions. Does it
mean that the first time the impalad loads the metadata, it scans all of the
partitions? And why?
It doesn't scan the partitions, but it gets all of the relevant
partition data, which also includes locations of block replicas and
volume ids. This data is cached in order to avoid having to do this
for every single query.
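
To illustrate why only the first request is slow, here is a minimal Python
sketch of a load-on-first-use metadata cache. It is not Impala's actual code:
MetadataCache, fake_fetch, and the partition names are hypothetical and exist
only for this illustration.

```python
from typing import Callable, Dict, List


class MetadataCache:
    """Toy load-on-first-use cache (hypothetical; not Impala's implementation)."""

    def __init__(self, fetch_partition_metadata: Callable[[str, str], dict]):
        # fetch_partition_metadata stands in for the expensive lookup of
        # block replica locations and volume ids for one partition.
        self._fetch = fetch_partition_metadata
        self._tables: Dict[str, Dict[str, dict]] = {}
        self.fetch_calls = 0  # counts expensive lookups, for illustration

    def get_table(self, table: str, partitions: List[str]) -> Dict[str, dict]:
        if table not in self._tables:
            # First access: one expensive lookup per partition, so the cost
            # grows with the number of partitions.
            loaded = {}
            for part in partitions:
                loaded[part] = self._fetch(table, part)
                self.fetch_calls += 1
            self._tables[table] = loaded
        # Later accesses are served from memory.
        return self._tables[table]


def fake_fetch(table: str, partition: str) -> dict:
    # Placeholder data; a real system would ask the storage layer.
    return {"blocks": [], "volume_ids": []}


cache = MetadataCache(fake_fetch)
parts = [f"day={d}" for d in range(1000)]
cache.get_table("tbl_some_table", parts)  # slow path: one lookup per partition
cache.get_table("tbl_some_table", parts)  # fast path: no new lookups
print(cache.fetch_calls)
```

The second get_table call adds nothing to fetch_calls, which mirrors the
statement that later queries should run much faster.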

Right now, this is done per partition, but we're going to coalesce it
into a single call per table.
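
As a rough sketch of the difference between one call per partition and one
coalesced call per table, the Python below counts round trips against a
made-up service. FakeMetadataService and both loader functions are
assumptions for illustration only and do not reflect Impala's real interfaces.

```python
from typing import Dict, List


class FakeMetadataService:
    """Hypothetical stand-in for whatever returns partition metadata."""

    def __init__(self, tables: Dict[str, List[str]]):
        self._tables = tables
        self.calls = 0  # number of round trips made so far

    def get_partition(self, table: str, partition: str) -> dict:
        # One round trip returns metadata for a single partition.
        self.calls += 1
        return {"partition": partition}

    def get_all_partitions(self, table: str) -> List[dict]:
        # One round trip returns metadata for every partition of the table.
        self.calls += 1
        return [{"partition": p} for p in self._tables[table]]


def load_per_partition(svc: FakeMetadataService, table: str,
                       partitions: List[str]) -> List[dict]:
    # Per-partition approach: the call count grows with the partition count.
    return [svc.get_partition(table, p) for p in partitions]


def load_coalesced(svc: FakeMetadataService, table: str) -> List[dict]:
    # Coalesced approach: a single call per table.
    return svc.get_all_partitions(table)


partitions = [f"day={d}" for d in range(500)]
tables = {"tbl_some_table": partitions}

svc_a = FakeMetadataService(tables)
per_partition_result = load_per_partition(svc_a, "tbl_some_table", partitions)

svc_b = FakeMetadataService(tables)
coalesced_result = load_coalesced(svc_b, "tbl_some_table")

print(svc_a.calls, svc_b.calls)
```

Both loaders return the same metadata; only the number of round trips differs.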
Yours,
- Aaron
On Monday, March 11, 2013 11:58:58 AM UTC+8, Marcel Kornacker wrote:
On Sun, Mar 10, 2013 at 7:47 PM, Lake Chang wrote:
Hi Impala Users,

I'm very glad to join this group and to talk with all of you.
Impala is new to me, and I ran into a problem when I tried to use Impala
on an existing Hive table that has many partitions. Let's call the table
"tbl_some_table". The problem is that when I ran "describe
tbl_some_table", it took a very long time to respond. From the log, it
seemed to scan all the partitions of the table.
Does anyone know why it did this? How can I avoid the problem and make
"describe" in Impala as fast as it is in Hive?
The first time you run "describe" (or any query, for that matter) after
startup, the impalad process needs to load the metadata.
Subsequent "describe" commands should run much faster.

Thanks,
- Aaron

Discussion Overview
group: impala-user @
categories: hadoop
posted: Mar 11, '13 at 2:47a
active: Mar 13, '13 at 3:29p
posts: 7
users: 4
website: cloudera.com
irc: #hadoop
