sql.
On 2013-3-13 at 11:25 PM, "Darren Lo" <[email protected]> wrote:
You can also use Hive CLI or Hue for a faster describe if this is a big
issue.
--
Thanks,
Darren
On Tue, Mar 12, 2013 at 9:09 PM, Marcel Kornacker wrote:
I don't think the information of "locations of block replicas" should be
collected before giving the answer of a describe request. Can we separate
the process of getting the meta info and getting the locations of block
replicas?
That would be very inconvenient, given how the metadata is organized
internally.
On Tue, Mar 12, 2013 at 7:26 PM, Lake Chang wrote:
Thanks to Marcel and Lenni for the replies!
I still have some doubts.
1.
"but it gets all of the relevant partition data, which also includes
locations of block replicas and volume ids."
I don't think the information of "locations of block replicas" should be
collected before giving the answer of a describe request. Can we separate
the process of getting the meta info and getting the locations of block
replicas?
2.
I don't know how the information of "locations of block replicas" is stored.
I just wonder: can we, "using a single call per-table", get all the
locations of block replicas of all the partitions?
Thanks,
- Aaron
On Mon, Mar 11, 2013 at 11:34 PM, Lenni Kuff <[email protected]> wrote:
To add to what Marcel said:
Hive does not currently make use of the block replica location metadata, so
it does not need to load/cache this information. This is why the initial
DESCRIBE takes longer in Impala than Hive. As Marcel mentioned, the
performance will be improved once we move to using a single call per-table
(rather than per-partition) to gather this information.
Thanks,
Lenni
Software Engineer - Cloudera
On Mon, Mar 11, 2013 at 7:01 AM, Marcel Kornacker <[email protected]> wrote:
On Sun, Mar 10, 2013 at 10:57 PM, Lake Chang <[email protected]> wrote:
Thanks for the reply.
It's unexpected that loading the metadata takes so much time (several
minutes), and the time varies according to the number of partitions. Does it
mean that the first time the impalad loads the metadata, it scans all of the
partitions? And why?
It doesn't scan the partitions, but it gets all of the relevant
partition data, which also includes locations of block replicas and
volume ids. This data is cached in order to avoid having to do this
for every single query.
Right now, this is done per-partition, but we're going to change that
to coalesce that into a single call per table.
Yours,
- Aaron
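To make the behavior described above concrete (a slow first load, cached results afterwards, and one call per table instead of one per partition), here is a toy Python sketch. It is not Impala's code and says nothing about how impalad is actually implemented: `FakeStorage`, `MetadataCache`, `fetch_partition_blocks`, `fetch_table_blocks`, and the partition and host names are all hypothetical, and the "round trips" are simulated with a counter.

```python
from typing import Dict, List

# Hypothetical stand-in for a table: partition name -> block replica hosts.
TABLE_PARTITIONS: Dict[str, List[str]] = {
    "dt=2013-03-10": ["host1", "host2"],
    "dt=2013-03-11": ["host2", "host3"],
    "dt=2013-03-12": ["host1", "host3"],
}


class FakeStorage:
    """Simulated metadata source that counts round trips (not a real API)."""

    def __init__(self, partitions: Dict[str, List[str]]) -> None:
        self.partitions = partitions
        self.calls = 0

    def fetch_partition_blocks(self, name: str) -> List[str]:
        # One simulated round trip per partition.
        self.calls += 1
        return list(self.partitions[name])

    def fetch_table_blocks(self) -> Dict[str, List[str]]:
        # One simulated round trip for the whole table.
        self.calls += 1
        return {name: list(hosts) for name, hosts in self.partitions.items()}


class MetadataCache:
    """Loads table metadata on first use and serves later requests from memory."""

    def __init__(self, storage: FakeStorage, per_table: bool = False) -> None:
        self.storage = storage
        self.per_table = per_table
        self._cached = None

    def describe(self) -> Dict[str, List[str]]:
        if self._cached is None:  # only the first request pays the loading cost
            if self.per_table:
                self._cached = self.storage.fetch_table_blocks()
            else:
                self._cached = {
                    name: self.storage.fetch_partition_blocks(name)
                    for name in self.storage.partitions
                }
        return self._cached
```

With the three partitions above, the per-partition variant makes three simulated calls on the first `describe()` and none on later ones, while the per-table variant makes a single call. In this toy model the first-load cost grows with the number of partitions only in the per-partition case.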
On Monday, March 11, 2013 11:58:58 AM UTC+8, Marcel Kornacker wrote:
On Sun, Mar 10, 2013 at 7:47 PM, Lake Chang <[email protected]> wrote:
Hi Impala Users,
I'm very glad to join this group and to talk with all of you.
Impala is new to me, and I encountered a problem when I tried to use Impala
on an existing Hive table which had many partitions. Let's name the table
"tbl_some_table". The problem is that, when I queried "describe
tbl_some_table", it took a very long time to respond. From the log I saw
that it seemed to scan all the partitions of the table.
Does anyone know why it did this? How can I avoid the problem and make
"impala describe" as fast as Hive's?
The first time after startup you run "describe" (or any query, for
that matter), the impalad process needs to load the metadata.
Subsequent "describe" commands should run much faster.
Thanks,
- Aaron