That sounds pretty long. Can you try adding
"--load_catalog_at_startup=false" to the catalogd startup parameters and
see if it helps?

Thanks,
Alan

On Fri, Sep 5, 2014 at 6:19 AM, Erik Vandeputte wrote:

Hi all,

I'm part of the data team at a mid-sized web-based company. We collect
log data and convert it once per hour to .lzo format before putting it
in a YYYY/MM/DD/HH directory layout on HDFS. We also run an hourly
partition script to update the partitions in Hive.
We have been running Impala v1.1.1 on top of HDFS on a 20-node cluster
in production for several months now.
Since we are planning to run more advanced queries, we recently upgraded
5 of the machines to Impala 1.4.0, with catalogd and statestore installed
on one node and 4 impalad daemons on 4 other nodes.
The problem is that it takes a very long time to load our metadata.
Since we partition on an hourly basis, we have tables that contain
about 30K partitions. The initial DESCRIBE query for a table (which
should fetch the metadata after INVALIDATE METADATA) can easily take 4-5
hours.
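
For readers following along, the hourly layout described above can be sketched roughly as follows. This is only an illustration: the base path /logs/data_A, the partition column names (year, month, day, hour), and the helper function names are assumptions and do not come from the original post. Only the YYYY/MM/DD/HH layout and the table name default.data_A appear in the thread.

```python
from datetime import datetime, timedelta

# Hypothetical base directory; the post only states the YYYY/MM/DD/HH layout.
BASE_PATH = "/logs/data_A"


def partition_path(ts: datetime) -> str:
    """Return the HDFS directory for the hour containing ts."""
    return f"{BASE_PATH}/{ts:%Y/%m/%d/%H}"


def add_partition_ddl(table: str, ts: datetime) -> str:
    """Build a Hive ALTER TABLE statement for one hourly partition.

    The partition column names are assumptions made for illustration.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(year={ts.year}, month={ts.month}, day={ts.day}, hour={ts.hour}) "
        f"LOCATION '{partition_path(ts)}'"
    )


def hourly_partitions(start: datetime, end: datetime) -> int:
    """Count the hourly partitions in the half-open range [start, end)."""
    return int((end - start) / timedelta(hours=1))


if __name__ == "__main__":
    ts = datetime(2014, 9, 5, 13)
    print(partition_path(ts))
    print(add_partition_ddl("default.data_A", ts))
    # One non-leap year of hourly partitions for a single table.
    print(hourly_partitions(datetime(2013, 1, 1), datetime(2014, 1, 1)))
```

At 24 partitions per day, a table gains 8,760 partitions per non-leap year, so a table with about 30K partitions would correspond to roughly three and a half years of data if the hour is the only partitioning granularity.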

While digging into the Catalog logs we came across the following:
I0905 13:01:21.322662 33913 Frontend.java:542] Requesting prioritized load of table(s): default.data_A
I0905 13:03:21.335345 33913 Frontend.java:607] Missing tables were not received in 120000ms. Load request will be retried.
I0905 13:03:21.335656 33913 Frontend.java:542] Requesting prioritized load of table(s): default.data_A
I0905 13:05:21.347980 33913 Frontend.java:607] Missing tables were not received in 120000ms. Load request will be retried.
I0905 13:05:21.348281 33913 Frontend.java:542] Requesting prioritized load of table(s): default.data_A
I0905 13:07:21.360571 33913 Frontend.java:607] Missing tables were not received in 120000ms. Load request will be retried.
I0905 13:07:21.360884 33913 Frontend.java:542] Requesting prioritized load of table(s): default.data_A
I0905 13:09:21.372967 33913 Frontend.java:607] Missing tables were not received in 120000ms. Load request will be retried.
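
The retry pattern in this excerpt can be summarized with a small script. The following is a minimal sketch, assuming log lines of the form shown above, possibly wrapped by the mail client. The function names rejoin and summarize_retries and the regular expressions are illustrative and are not part of Impala.

```python
import re
from collections import Counter

# Log prefix: severity letter + MMDD, time, thread id, source file:line.
PREFIX = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ \S+:\d+\] ")
REQUEST = re.compile(r"Requesting prioritized load of table\(s\): (\S+)")
TIMEOUT = re.compile(r"Missing tables were not received in (\d+)ms")


def rejoin(lines):
    """Merge mail-wrapped continuation lines back onto their log record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if PREFIX.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize_retries(lines):
    """Return (load requests per table, total milliseconds spent in timeouts)."""
    requests = Counter()
    waited_ms = 0
    for record in rejoin(lines):
        m = REQUEST.search(record)
        if m:
            requests[m.group(1)] += 1
        m = TIMEOUT.search(record)
        if m:
            waited_ms += int(m.group(1))
    return requests, waited_ms


if __name__ == "__main__":
    sample = """\
I0905 13:01:21.322662 33913 Frontend.java:542] Requesting prioritized load
of table(s): default.data_A
I0905 13:03:21.335345 33913 Frontend.java:607] Missing tables were not
received in 120000ms. Load request will be retried.
""".splitlines()
    requests, waited_ms = summarize_retries(sample)
    print(dict(requests), waited_ms)
```

Applied to the four request/timeout pairs in the excerpt, this counts four load requests for default.data_A and 480000 ms (8 minutes) of timed-out waiting.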

Is there a way of speeding this up, or are we overlooking something?

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.

Discussion Overview
group: impala-user
categories: hadoop
posted: Sep 5, '14 at 1:19p
active: Oct 27, '14 at 8:07p
posts: 6
users: 4
website: cloudera.com
irc: #hadoop