Until SHOW PARTITIONS is implemented in Impala, I guess you will have to
use Hive's describe formatted and then parse the output:

describe formatted <table> partition <partition_spec>
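
As a rough illustration of the parsing step, here is a minimal Python sketch that pulls the partition location out of the text printed by describe formatted. It assumes the output contains a line starting with "Location:" followed by the path, which you should check against your Hive version. The function name and the sample text are made up for the example, and capturing the output (for instance from the hive command line) is left out.

```python
from typing import Optional


def extract_location(describe_output: str) -> Optional[str]:
    """Return the path from the first 'Location:' line, or None if absent.

    Assumption: the DESCRIBE FORMATTED output has a line such as
    'Location:    hdfs://...'. Verify this against your Hive version.
    """
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Location:"):
            value = stripped[len("Location:"):].strip()
            return value or None
    return None


# Made-up sample text, only to exercise the parser.
SAMPLE = """\
# Detailed Partition Information
Partition Value:    \t[1]
Location:           \thdfs://namenode/warehouse/my_table/partition_1/20140204
"""

if __name__ == "__main__":
    print(extract_location(SAMPLE))
```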


On Tue, Mar 11, 2014 at 12:54 AM, Tivona Hu wrote:

Thanks Alan :)

This approach makes me think about using a "double-buffer" partition for a
growing Parquet file. However, it seems we can't get the current partition
location if the data is in Parquet format?
http://stackoverflow.com/questions/18003038/is-there-a-way-to-show-partitions-on-cloudera-impala

Does anyone know of a way to get the location from the shell or an API?
Otherwise I may have to use another database to keep track of it.

Thanks!



2014-03-11 5:38 GMT+08:00 Alan Choi <alan@cloudera.com>:

Hi Tivona,
You can issue the following statement in Impala:

ALTER TABLE <table name> PARTITION (part_col=val, part_col=val...) SET LOCATION <path>

to update the partition location. By doing it in Impala, you don't even
need to call "invalidate metadata".
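
For anyone scripting this, a minimal Python sketch of assembling that statement might look like the following. The helper names are hypothetical, the quoting is deliberately simplistic (values containing a single quote are rejected rather than escaped), and actually sending the statement to Impala is not shown.

```python
from typing import Mapping


def _sql_literal(value: object) -> str:
    """Render ints bare and everything else as a single-quoted string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    text = str(value)
    if "'" in text:
        # Simplification for this sketch: refuse rather than escape.
        raise ValueError(f"unsupported character in value: {text!r}")
    return f"'{text}'"


def set_partition_location_stmt(
    table: str, partition: Mapping[str, object], path: str
) -> str:
    """Build ALTER TABLE ... PARTITION (...) SET LOCATION '...' as text."""
    if not partition:
        raise ValueError("at least one partition column is required")
    spec = ", ".join(f"{col}={_sql_literal(val)}" for col, val in partition.items())
    return f"ALTER TABLE {table} PARTITION ({spec}) SET LOCATION {_sql_literal(path)}"


if __name__ == "__main__":
    # Hypothetical table, partition column, and path.
    print(set_partition_location_stmt(
        "my_table", {"part": 1}, "/data/my_table/partition_1/20140204"))
```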

Thanks,
Alan

On Mon, Mar 10, 2014 at 3:15 AM, Tivona Hu wrote:

Hi Keith,

Could you kindly provide some hints on how to write to the metastore? I
couldn't find an appropriate API to do this.
I'd really appreciate a reply :)

On Wednesday, March 5, 2014 at 5:28:32 AM UTC+8, Keith Simmons wrote:
We do something similar to this. We manually tell Impala where the
Parquet files are by writing to the Hive metastore. You can only tell the
metastore about directories, not specific files, so when we want to drop in
a new file, we create a new timestamped directory. We then drop the new
file in there, update the metastore, then tell Impala to update its
metadata. Once the invalidate metadata call has completed, we delete the
old timestamped directory. So for example:

Before update:

my_table/partition_1/20140203/old-parquet-file.parquet

After update:

my_table/partition_1/20140203/old-parquet-file.parquet
my_table/partition_1/20140204/new-merged-parquet-file.parquet

After invalidate metadata has completed:

my_table/partition_1/20140204/new-merged-parquet-file.parquet

We don't have long in-flight queries, so we normally delete the old
data directory as soon as Impala has refreshed its metadata, but you could
easily give it some extra time to make sure all queries have completed.
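
The sequence above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the local filesystem stands in for HDFS, the function and parameter names are hypothetical, and the repoint callback is a placeholder for whatever you use to update the metastore and refresh Impala's metadata.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def swap_partition_dir(
    partition_root: Path,
    old_stamp: str,
    new_stamp: str,
    new_file: Path,
    repoint: Callable[[Path], None],
    grace_seconds: float = 0.0,
) -> Path:
    """Publish new_file in a new timestamped directory, then retire the old one."""
    new_dir = partition_root / new_stamp
    new_dir.mkdir(parents=True)
    # 1. Drop the new file into the new timestamped directory.
    shutil.copy2(new_file, new_dir / new_file.name)
    # 2. Caller-supplied step: update the metastore and refresh metadata.
    repoint(new_dir)
    # 3. Optionally wait for in-flight queries, then delete the old directory.
    if grace_seconds > 0:
        time.sleep(grace_seconds)
    shutil.rmtree(partition_root / old_stamp)
    return new_dir


def demo() -> List[str]:
    """Replay the before/after example in a temporary local directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_table" / "partition_1"
        (root / "20140203").mkdir(parents=True)
        (root / "20140203" / "old-parquet-file.parquet").write_bytes(b"old")
        merged = Path(tmp) / "new-merged-parquet-file.parquet"
        merged.write_bytes(b"new")
        repointed: List[Path] = []  # records where the "metastore" now points
        swap_partition_dir(root, "20140203", "20140204", merged, repointed.append)
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.parquet"))


if __name__ == "__main__":
    print(demo())
```

Passing a non-zero grace_seconds corresponds to giving running queries some extra time before the old directory is removed.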

Keith

On Mon, Mar 3, 2014 at 12:14 AM, Tivona Hu wrote:

Thanks Nong for the reply :)

Then I'm wondering if there's any way to do a "virtual drop" on a
file.

For example, I convert staging Avro files to Parquet every hour:
data_1pm.parquet
data_2pm.parquet (= data_1pm + new data generated between 1pm and 2pm)

At 2pm, I want to do a refresh that virtually removes data_1pm.parquet
and adds data_2pm.parquet to the metastore.
That way, any query still running when data_1pm is replaced by data_2pm
will not fail, since the file is still there physically.
And any query that starts after the refresh will only read data_2pm,
since the metastore has been refreshed.

Finally, after a few hours, I can delete the file data_1pm
physically, since no running queries should still be associated with it.

Does anyone know whether this kind of thing is possible?

Thanks!

On Friday, February 28, 2014 at 2:31:59 AM UTC+8, Nong wrote:
Impala doesn't do anything special for concurrent reads and writes, as
this really needs to be handled at the storage layer. When a file is
deleted in HDFS, the file will be removed even if there are active
readers (HDFS doesn't track active readers).

Simultaneous read queries and refreshes are fine. Simultaneous read
queries and deletes will cause the read queries to fail with "file
doesn't exist" errors.

On Wed, Feb 26, 2014 at 11:36 PM, Tivona Hu wrote:

I'm using Impala with a Parquet table and the staging phase described
here:
https://github.com/cloudera/cdk-examples/tree/master/dataset-staging

All looks good, but I'm wondering how Impala actually handles
concurrent reads and writes.
I mean, what will happen if I overwrite a Parquet file in the data
warehouse and refresh the corresponding table while another person is
querying that table?

Thanks!

To unsubscribe from this group and stop receiving emails from it,
send an email to impala-user...@cloudera.org.

Discussion Overview
group: impala-user @
categories: hadoop
posted: Feb 27, '14 at 7:36a
active: Mar 11, '14 at 5:58p
posts: 9
users: 5
website: cloudera.com
irc: #hadoop