Hi Keith,

Have you considered the following alternative?

1. insert overwrite tmp select * from bar;
2. alter table foo set location <tmp's location>;
3. delete the original data from foo manually.

With this approach, it's a bit more involved, but none of your in-flight
"select" queries will fail.
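
A minimal sketch of the first two steps above, written as a small Python helper that only builds the statement text. The table names come from the thread; the `*` column list, the helper name `staged_overwrite_plan`, and the HDFS path are placeholders assumed for illustration. Step 3 (deleting the old data) is left out on purpose, since it is a manual cleanup and should only happen once no running query still reads the old directory.

```python
def staged_overwrite_plan(target, source, staging, staging_location):
    """Return the SQL for steps 1 and 2 of the staging-table swap.

    All names and the location are caller-supplied placeholders;
    nothing here talks to a cluster.
    """
    return [
        # Step 1: fill the staging table. Queries against `target` keep
        # reading the old files while this runs.
        f"INSERT OVERWRITE TABLE {staging} SELECT * FROM {source};",
        # Step 2: repoint the target table at the staging table's directory.
        f"ALTER TABLE {target} SET LOCATION '{staging_location}';",
    ]


# Hypothetical path, for illustration only.
for statement in staged_overwrite_plan(
    "foo", "bar", "tmp", "hdfs:///example/warehouse/tmp"
):
    print(statement)
```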

Thanks,
Alan

On Sat, Apr 26, 2014 at 12:59 PM, Keith Simmons wrote:

It was the queries. I believe the error thrown was a missing file. It's
possible that your write process is actually atomic, and that I just had a
query that broke because it was already in flight when the atomic
switchover occurred. I'm totally fine with that scenario.

I'd love to be able to do the following:

insert overwrite foo select * from bar;

And have foo be queryable the whole time. What I'd like to happen is
something along the following lines:

1) Impala selects data from bar and writes to an intermediate directory.
2) Impala atomically points foo at the new data.
3) Impala deletes the old data. (I can live with in-flight queries breaking
at this point, though it would be even cooler if they didn't.)

What I'm hoping is not happening is:

1) Impala deletes data from foo
2) Impala reads data from bar and writes into foo
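
To make the difference between the two orderings concrete, here is a toy in-memory model in Python. It is only an illustration of the reasoning, not a description of how Impala actually implements insert overwrite (the thread does not settle that). The names `Table`, `Reader`, `overwrite_swap`, and `overwrite_delete_first` are invented for this sketch, and a "directory" is just a dictionary key.

```python
class Table:
    """Toy table: a pointer into a store of directory name -> rows."""

    def __init__(self, store, location):
        self.store = store
        self.location = location


class Reader:
    """A query that resolves the table's location once, when it starts."""

    def __init__(self, table):
        self.store = table.store
        self.location = table.location

    def fetch(self):
        if self.location not in self.store:
            raise FileNotFoundError(self.location)
        return list(self.store[self.location])


def try_query(table):
    """Start a fresh query right now and report what it sees."""
    try:
        return Reader(table).fetch()
    except FileNotFoundError:
        return "error: missing file"


def overwrite_swap(table, new_rows, new_location, probe):
    table.store[new_location] = new_rows  # 1) write to an intermediate dir
    probe()                               # query issued during the write
    old_location = table.location
    table.location = new_location         # 2) switch the pointer
    probe()                               # query issued after the switch
    del table.store[old_location]         # 3) delete the old data


def overwrite_delete_first(table, new_rows, probe):
    del table.store[table.location]       # 1) delete the old data
    probe()                               # query issued during the write
    table.store[table.location] = new_rows  # 2) write the new data


foo = Table({"dir_a": ["old"]}, "dir_a")
seen_swap = []
overwrite_swap(foo, ["new"], "dir_b", lambda: seen_swap.append(try_query(foo)))

foo2 = Table({"dir_a": ["old"]}, "dir_a")
seen_delete = []
overwrite_delete_first(foo2, ["new"], lambda: seen_delete.append(try_query(foo2)))

print(seen_swap)    # queries started at each probe point of the swap ordering
print(seen_delete)  # query started between delete and write
```

In this model every query started during the swap ordering returns either the old rows or the new rows, while a query started between the delete and the write in the second ordering hits a missing directory. A reader created before step 3 of the swap that fetches afterwards would still fail, which matches the in-flight breakage described in 3).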

Does this make sense?

Keith

On Saturday, April 26, 2014 11:06:02 AM UTC-7, Marcel Kornacker wrote:

On Fri, Apr 25, 2014 at 4:12 PM, Keith Simmons <keith....@gmail.com> wrote:
> I've noticed that insert overwrites are not atomic. This is based on a
> simple empirical test where I repeatedly query a table via the
> impala-shell during a longish insert (a few minutes). At a certain point
> in the overwrite, Impala starts returning errors.

What part specifically returns errors? The insert or the concurrent
queries?

> I'm wondering two things:
> A) Is this expected?
> B) If so, are there any plans to change this?

Could you clarify what you would like to see changed?

> Right now, I can work around this by writing to an intermediate table, then
> using alter table set partition DDLs to move the data to the final table
> atomically. However, issuing a separate DDL for each partition is pretty
> slow, and adds quite a bit of time to my daily migration. Crossing my
> fingers that this is on your radar.

Could you clarify with specific statements how you would like this
workflow to look?

> Thanks for the help.
>
> Keith
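
For reference, the per-partition workaround mentioned in the original message could look roughly like the sketch below. The exact statements used are not given in the thread, so this assumes the `ALTER TABLE ... PARTITION (...) SET LOCATION` form; the partition column `day`, the paths, and the helper name `partition_swap_statements` are hypothetical. It shows why the cost grows with the partition count: one DDL statement is issued per partition.

```python
def partition_swap_statements(table, partition_locations):
    """Build one ALTER TABLE ... SET LOCATION statement per partition.

    `partition_locations` maps a partition spec such as "day='2014-04-25'"
    to the directory holding that partition's freshly written data.
    """
    return [
        f"ALTER TABLE {table} PARTITION ({spec}) SET LOCATION '{location}';"
        for spec, location in sorted(partition_locations.items())
    ]


# Hypothetical partitions and paths, for illustration only.
statements = partition_swap_statements(
    "foo",
    {
        "day='2014-04-25'": "hdfs:///example/staging/day=2014-04-25",
        "day='2014-04-24'": "hdfs:///example/staging/day=2014-04-24",
    },
)
for statement in statements:
    print(statement)
```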

To unsubscribe from this group and stop receiving emails from it, send an
email to impala-user...@cloudera.org.

Discussion Overview
group: impala-user
categories: hadoop
posted: Apr 26, '14 at 6:06p
active: Apr 28, '14 at 10:07p
posts: 3
users: 3
website: cloudera.com
irc: #hadoop
