That's OK. I am just used to having a look at the schema before composing
SQL.
On 2013-3-13 at 11:25 PM, "Darren Lo" <[email protected]> wrote:
You can also use the Hive CLI or Hue for a faster describe if this is a big
issue.

On Tue, Mar 12, 2013 at 9:09 PM, Marcel Kornacker wrote:
On Tue, Mar 12, 2013 at 7:26 PM, Lake Chang wrote:
Thanks for Marcel's and Lenni's replies!
I still have some doubts.
1.
but it gets all of the relevant partition data, which also includes
locations of block replicas and volume ids.
I don't think the "locations of block replicas" information should be
collected before answering a describe request. Can we separate fetching
the metadata from fetching the block replica locations?
That would be very inconvenient, given how the metadata is organized
internally.
2.
will be improved once we move to using a single call per-table (rather
than per-partition) to gather this information.
I don't know how the "locations of block replicas" information is stored;
I just wonder whether we can use "a single call per-table" to get the
block replica locations for all of the partitions.

Thanks,
- Aaron


On Mon, Mar 11, 2013 at 11:34 PM, Lenni Kuff <[email protected]> wrote:
To add to what Marcel said:

Hive does not currently make use of the block replica location metadata,
so it does not need to load/cache this information. This is why the
initial DESCRIBE takes longer in Impala than in Hive. As Marcel mentioned,
the performance will be improved once we move to using a single call per
table (rather than per partition) to gather this information.
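To make the per-partition versus per-table difference concrete, here is a toy Python sketch. It is not Impala code. `FakeBlockLocationService`, `load_per_partition` and `load_per_table` are hypothetical names, the replica data is made up, and each method call simply increments a counter that stands in for one remote round trip. The sketch shows only why the number of round trips grows with the partition count when locations are fetched per partition, and stays at one when the requests are coalesced per table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeBlockLocationService:
    """Hypothetical stand-in for a remote service returning block replica locations.

    Every method call counts as one simulated round trip.
    """
    # partition name -> replica host names (made-up data)
    locations: Dict[str, List[str]]
    round_trips: int = 0

    def get_partition_locations(self, partition: str) -> List[str]:
        # One round trip for a single partition.
        self.round_trips += 1
        return self.locations[partition]

    def get_table_locations(self, partitions: List[str]) -> Dict[str, List[str]]:
        # One round trip covering every partition of the table.
        self.round_trips += 1
        return {p: self.locations[p] for p in partitions}


def load_per_partition(svc: FakeBlockLocationService,
                       partitions: List[str]) -> Dict[str, List[str]]:
    """One call per partition: round trips grow with the partition count."""
    return {p: svc.get_partition_locations(p) for p in partitions}


def load_per_table(svc: FakeBlockLocationService,
                   partitions: List[str]) -> Dict[str, List[str]]:
    """A single coalesced call for the whole table."""
    return svc.get_table_locations(partitions)


if __name__ == "__main__":
    parts = [f"dt=2013-03-{d:02d}" for d in range(1, 11)]
    data = {p: ["host1", "host2", "host3"] for p in parts}

    a = FakeBlockLocationService(data)
    load_per_partition(a, parts)
    print("per-partition round trips:", a.round_trips)  # 10

    b = FakeBlockLocationService(data)
    load_per_table(b, parts)
    print("per-table round trips:", b.round_trips)  # 1
```

Both loaders return the same mapping. Only the number of simulated round trips differs, and that count is what makes the first metadata load slow on tables with many partitions.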

Thanks,
Lenni
Software Engineer - Cloudera


On Mon, Mar 11, 2013 at 7:01 AM, Marcel Kornacker <[email protected]> wrote:
On Sun, Mar 10, 2013 at 10:57 PM, Lake Chang <[email protected]> wrote:
Thanks for the reply.
the impalad process needs to load the metadata
I didn't expect loading the metadata to take so much time (several
minutes), and the time varies with the number of partitions. Does that
mean that the first time impalad loads the metadata, it scans all of the
partitions? And why?
It doesn't scan the partitions, but it gets all of the relevant
partition data, which also includes locations of block replicas and
volume ids. This data is cached in order to avoid having to do this
for every single query.
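The load-once-then-cache behavior described here can be sketched in a few lines of Python. This is a toy illustration, not Impala's implementation. `MetadataCache`, `describe` and `slow_loader` are hypothetical names, and the column list the loader returns is invented. On a cache miss the sketch pays the loader cost once per table, and every later request for the same table is answered from memory.

```python
from typing import Callable, Dict, List


class MetadataCache:
    """Hypothetical, lazily populated per-table metadata cache.

    The first request for a table runs the (expensive) loader; later
    requests for the same table are served from memory.
    """

    def __init__(self, loader: Callable[[str], List[str]]) -> None:
        self._loader = loader
        self._tables: Dict[str, List[str]] = {}
        self.loads = 0  # how many times the expensive loader ran

    def describe(self, table: str) -> List[str]:
        if table not in self._tables:
            # Cache miss: pay the full load cost once for this table.
            self.loads += 1
            self._tables[table] = self._loader(table)
        return self._tables[table]


def slow_loader(table: str) -> List[str]:
    # Stand-in for fetching schema plus per-partition metadata (made-up columns).
    return ["id INT", "name STRING", "dt STRING"]


if __name__ == "__main__":
    cache = MetadataCache(slow_loader)
    cache.describe("tbl_some_table")  # first call: runs the loader
    cache.describe("tbl_some_table")  # second call: served from the cache
    print(cache.loads)  # 1
```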

Right now, this is done per partition, but we're going to change that
and coalesce it into a single call per table.
Yours,
- Aaron
On Monday, March 11, 2013 11:58:58 AM UTC+8, Marcel Kornacker wrote:

On Sun, Mar 10, 2013 at 7:47 PM, Lake Chang <[email protected]> wrote:
Hi Impala Users,

I'm very glad to join this group and to talk with all of you.
Impala is new to me, and I encountered a problem when I tried to use
Impala on an existing Hive table that has many partitions. Let's call
the table "tbl_some_table". The problem is that when I ran "describe
tbl_some_table", it took a very long time to respond. From the log, it
seemed to scan all the partitions of the table.
Does anyone know why it does this? How can I avoid the problem and make
"describe" in Impala as fast as it is in Hive?
The first time you run "describe" after startup (or any query, for
that matter), the impalad process needs to load the metadata.
Subsequent "describe" commands should run much faster.

Thanks,
- Aaron


--
Thanks,
Darren

Discussion Overview
group: impala-user
category: hadoop
posted: Mar 11, '13 at 2:47a
active: Mar 13, '13 at 3:29p
posts: 7
users: 4
website: cloudera.com
irc: #hadoop
