I ended up having to:

1. Create a text table in Impala.
2. Switch to Hive and alter the table's field and row delimiters.
3. Run the Sqoop import.
4. Switch to Impala and refresh the Sqoop'd text table.
5. Load the Parquet table from the Sqoop'd text table.
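The five steps above can be sketched as an ordered list of statements. This is a minimal illustration, not Andrew's actual script: the table names, columns, JDBC URL, HDFS directory and tab delimiter are hypothetical placeholders, only the field delimiter is changed (the thread does not say which delimiters were used), and the Parquet table is assumed to already exist because it was also created through Impala. The Sqoop step uses `--append` because the import writes into the text table's existing directory.

```python
def build_workflow(text_table, parquet_table, source_table, jdbc_url, hdfs_dir):
    """Return the workaround as ordered (engine, statement) pairs.

    All names and values are hypothetical placeholders.
    """
    return [
        # 1. Create the text table in Impala so Impala's catalog already
        #    knows about it; no INVALIDATE METADATA is needed later.
        ("impala",
         f"CREATE TABLE {text_table} (id INT, payload STRING) "
         f"STORED AS TEXTFILE LOCATION '{hdfs_dir}'"),
        # 2. In Hive, change the field delimiter through SerDe properties
        #    (tab is an assumed value; the thread does not say).
        ("hive",
         f"ALTER TABLE {text_table} SET SERDEPROPERTIES "
         "('field.delim'='\\t', 'serialization.format'='\\t')"),
        # 3. Sqoop into the text table's existing directory; --append lets
        #    the import write into a directory that already exists.
        ("sqoop",
         f"sqoop import --connect {jdbc_url} --table {source_table} "
         f"--target-dir {hdfs_dir} --append --fields-terminated-by '\\t'"),
        # 4. Back in Impala, reload the metadata for this one table only.
        ("impala", f"REFRESH {text_table}"),
        # 5. Convert to Parquet (the Parquet table is assumed to exist).
        ("impala",
         f"INSERT OVERWRITE TABLE {parquet_table} SELECT * FROM {text_table}"),
    ]


if __name__ == "__main__":
    for engine, statement in build_workflow(
            "sales_text", "sales_parquet", "SALES",
            "jdbc:oracle:thin:@dbhost:1521/ORCL", "/user/etl/sales_text"):
        print(f"[{engine}] {statement}")
```

The point of the ordering is that every statement Impala has to know about is either issued by Impala itself or followed by a table-scoped `REFRESH`, so the global `INVALIDATE METADATA` never appears.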

All of this to avoid running invalidate metadata.

Regards Andrew
________________________________
From: Lenni Kuff
Sent: 20/01/2014 22:21
To: impala-user@cloudera.org
Subject: Re: Invalidate metadata takes 1 hour to complete

Hi Sammy,
We do not currently have any plans to support a database-scoped "invalidate
metadata" command. We do plan on enhancing Impala so that it automatically
detects changes in the Hive Metastore, so "invalidate metadata" should
rarely, if ever, be needed. Hopefully this will address your needs.

I would also like to remind you that if your DDL operations are executed
via Impala there is no need to run "invalidate metadata" to view the new
tables/databases.
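Lenni's and Alan's advice amounts to a rule of thumb, sketched below as a hypothetical helper. It encodes only what this thread says: DDL issued through Impala needs no follow-up, new files in a table Impala already knows about need a table-level `REFRESH`, and a table created outside Impala needs `INVALIDATE METADATA`, which can only be scoped to that one table once IMPALA-736 (planned for v1.2.4) is available. The enum and function names are made up for illustration.

```python
from enum import Enum


class Change(Enum):
    DDL_VIA_IMPALA = 1          # CREATE/ALTER/DROP issued through Impala
    NEW_FILES_KNOWN_TABLE = 2   # data written outside Impala to a table Impala knows
    TABLE_CREATED_OUTSIDE = 3   # table created via Hive, Sqoop, etc.


def statement_after(change, table, supports_table_invalidate=True):
    """Return the Impala statement to run after `change`, or None if none is needed."""
    if change is Change.DDL_VIA_IMPALA:
        return None  # Impala's catalog already reflects its own DDL
    if change is Change.NEW_FILES_KNOWN_TABLE:
        return f"REFRESH {table}"  # reload this table's metadata only
    if supports_table_invalidate:
        # Per-table form that also picks up an unknown table (IMPALA-736, v1.2.4).
        return f"INVALIDATE METADATA {table}"
    return "INVALIDATE METADATA"  # global form: the expensive case in this thread


for change in Change:
    print(change.name, "->", statement_after(change, "sales_text"))
```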

Thanks,
Lenni

On Mon, Jan 20, 2014 at 11:33 AM, Sammy Yu wrote:

Hi guys,
Sorry to hijack this thread. I know a solution is coming, but I'm
experiencing slow invalidate metadata with tables that have multiple
partitions. I tried to address this by creating a separate database for
each of these tables, hoping that invalidate metadata would be confined to
a particular database, but it appears to scan all the databases. I know
that IMPALA-737 and IMPALA-736 will help quite a bit, but is it possible
to add an option for invalidate metadata to work only within the scope of
the current database?

Best,
Sammy

On Sat, Jan 18, 2014 at 12:07 AM, Andrew Stevenson wrote:
We'll either do that or run an empty Sqoop at the end of the working day and
take the invalidate metadata hit before the real extract process starts. We
can't start it until 1 am anyway. That way, when the Parquet conversion
happens, we can use the REFRESH table command instead.

Regards Andrew
________________________________
From: Alan Choi
Sent: 18/01/2014 07:45
To: impala-user@cloudera.org
Subject: Re: Invalidate metadata takes 1 hour to complete

Hi Andrew,

A more effective approach is to create the table through Impala. Then you
don't need to call invalidate metadata. Is there anything that would prevent
you from creating the table in Impala?

Thanks,
Alan


On Fri, Jan 17, 2014 at 12:28 PM, Andrew Stevenson <astevenson@outlook.com> wrote:

Lenni,

To clarify, our issue was not only the time the invalidate metadata command
took, but also that it blocked other DDL statements on other nodes while it
ran. I believe it triggers a reload in catalogd. I wonder if the SYNC_DDL
query option would help here.

My expected pattern would have been:

1. The requesting node asks catalogd for a new cache.
2. Catalogd checks the Hive Metastore for deltas and returns the deltas plus
   the cache to the requesting node. The requesting node could optionally
   specify what it wants, for example:
   - Give me what you have.
   - Give me what you have plus deltas.
   - Reload and give me everything from scratch, but don't affect other
     nodes' caches.
3. The requesting node swaps out its cache once it has received the new one.

None of this should block other nodes, of course!
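Andrew's proposed request pattern can be modelled in a few lines to show how the three request modes and the cache swap would interact. This is a toy sketch of a suggestion made in this thread, not how catalogd actually works: plain dicts stand in for the Hive Metastore and the caches, and the decision to fold deltas into the catalog's own shared snapshot is an assumption made here.

```python
from enum import Enum


class Mode(Enum):
    CURRENT = 1      # "Give me what you have"
    WITH_DELTAS = 2  # "Give me what you have plus deltas"
    RELOAD = 3       # "Reload from scratch, but don't affect other nodes' caches"


class Catalog:
    """Hypothetical catalog service; `metastore` is a {table: metadata} stand-in."""

    def __init__(self, metastore):
        self.metastore = metastore    # live view of the (stand-in) Hive Metastore
        self.cache = dict(metastore)  # catalog's shared snapshot

    def request(self, mode):
        if mode is Mode.CURRENT:
            return dict(self.cache)
        if mode is Mode.WITH_DELTAS:
            # Compute only what changed since the snapshot was taken.
            changed = {t: m for t, m in self.metastore.items()
                       if self.cache.get(t) != m}
            dropped = [t for t in self.cache if t not in self.metastore]
            for t in dropped:
                del self.cache[t]
            self.cache.update(changed)
            return dict(self.cache)
        # RELOAD: a private, from-scratch copy; the shared snapshot is left alone.
        return dict(self.metastore)


class Node:
    """Hypothetical impalad holding its own copy of the catalog."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cache = catalog.request(Mode.CURRENT)

    def resync(self, mode):
        new_cache = self.catalog.request(mode)  # build the replacement first
        self.cache = new_cache                  # then swap it in with one assignment


if __name__ == "__main__":
    metastore = {"sales": "v1"}
    catalog = Catalog(metastore)
    a, b = Node(catalog), Node(catalog)
    metastore["orders"] = "v1"  # table created outside Impala
    a.resync(Mode.WITH_DELTAS)
    print(a.cache, b.cache)
```

Because each node builds its replacement before assigning it, anything still holding the old dict keeps a consistent view, and a RELOAD for one node leaves the catalog's shared snapshot, and therefore every other node, untouched.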

Parquet support in Sqoop would also be nice; then I wouldn't need to convert
the data via Impala! And Flume while you're at it 😊!!

Regards Andrew

From: Andrew Stevenson
Sent: Friday, 17 January 2014 19:10
To: impala-user@cloudera.org

We are looking at collapsing the common tables, but we'll still have a large
number as more internal systems switch to HBase and Impala.

Is the number of tables the bottleneck? If so, what's the tipping point?

Regards Andrew
________________________________
From: Lenni Kuff
Sent: 17/01/2014 17:48
To: impala-user@cloudera.org
Subject: Re: Invalidate metadata takes 1 hour to complete

Hi Andrew,
We have identified some performance issues around loading table metadata as
the number of tables scales up. These will be resolved in our upcoming
v1.2.4 release.

One of the changes we are making is that we will now lazily load the table
metadata. This will also significantly improve the performance of
"invalidate metadata". The JIRA tracking this is:
https://issues.cloudera.org/browse/IMPALA-737

We will also be updating "invalidate metadata <table name>" to support
adding the table to the metadata cache if it doesn't exist in Impala. This
is tracked in this JIRA:
https://issues.cloudera.org/browse/IMPALA-736
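To see why lazy loading should make a global invalidate metadata cheap, and what the IMPALA-736 change means, here is a toy cache. It illustrates the general technique under simplifying assumptions (a dict stands in for the metastore and a counter stands in for expensive metadata loads); it is not Impala's implementation. Invalidating only forgets loaded metadata, so the loading cost is paid per table on first use, and `invalidate(table)` also adds a table the cache has never seen, matching the behaviour described for IMPALA-736.

```python
_NOT_LOADED = object()  # sentinel: table is known but its metadata is not loaded


class LazyCatalog:
    """Toy metadata cache: table names are listed eagerly, metadata loads lazily."""

    def __init__(self, metastore):
        self.metastore = metastore  # {table: metadata}, stand-in for the Hive Metastore
        self.entries = {t: _NOT_LOADED for t in metastore}
        self.loads = 0              # number of (expensive) metadata loads

    def get(self, table):
        if self.entries[table] is _NOT_LOADED:  # first use since start-up or invalidation
            self.loads += 1
            self.entries[table] = self.metastore[table]
        return self.entries[table]

    def invalidate_all(self):
        # Cheap: re-list table names and drop loaded metadata; nothing is loaded here.
        self.entries = {t: _NOT_LOADED for t in self.metastore}

    def invalidate(self, table):
        # Mark one table stale; a table this cache has never seen is added.
        if table not in self.metastore:
            raise KeyError(f"{table} is not in the metastore")
        self.entries[table] = _NOT_LOADED
```

With 75,000 tables but only around 1,000 touched per day, a lazily loading cache would only pay the loading cost for the tables that are actually queried.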

v1.2.4 should be out in less than two weeks if everything goes as
planned.

Let me know if you have any questions.

Thanks,
Lenni

On Fri, Jan 17, 2014 at 8:40 AM, Andrew Stevenson <astevenson@outlook.com> wrote:

Hi Guys,

We've upgraded to Impala 1.2.3 and are facing issues with catalogd. We have
75,000 tables in Hive; we Sqoop around 1,000 a day in LZOP format and convert
them to Parquet in Impala. We used to issue invalidate metadata after the
Sqoop, but if we do this now we are blocked for at least an hour.

Invalidate metadata also becomes a problem if users issue the command.

Any suggestions? I can try reducing the number of tables active in the
metastore, but that's only a temporary measure.


Thanks

Andrew

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user+unsubscribe@cloudera.org.


Discussion Overview
Group: impala-user (website: cloudera.com, IRC: #hadoop)
Category: hadoop
Posted: Jan 20, '14 at 9:49p (last active Jan 20, '14 at 9:49p)
Posts: 1, by 1 user (Andrew Stevenson)
