Hi,

I'm having trouble refreshing metadata on a partitioned Parquet table
when I move the files.

I start off with my table:

select count(*) from t;

+-----------+
| count(*)  |
+-----------+
| 137521890 |
+-----------+

hadoop fs -mv /user/hive/warehouse/test.db/t/month=201402 foo

refresh t;

select count(*) from t;

+-----------+
| count(*)  |
+-----------+
| 0         |
+-----------+

So far everything is as expected, but then:

hadoop fs -mv foo /user/hive/warehouse/test.db/t/month=201402

refresh t;

select count(*) from t;

+-----------+
| count(*)  |
+-----------+
| 0         |
+-----------+

When I move the same data back to the same location and refresh the table,
it can't find the data anymore.

I've also tried invalidate metadata before refresh t, to no avail.

To unsubscribe from this group and stop receiving emails from it, send an email to impala-user+unsubscribe@cloudera.org.


  • Gerrard Mcnulty at Mar 10, 2014 at 1:57 pm
    Yeah, permissions are the same.
    On Monday, March 10, 2014 12:31:24 PM UTC, Prateek Rungta wrote:

    Are the privs on the files preserved during the move?


  • Alan Choi at Mar 10, 2014 at 9:48 pm
    Hi Gerrard,

    Do you only see a dir under /user/hive/warehouse/test.db/t/month=201402,
    or do you see all the files? Make sure that the files are back in the
    original location (not in a sub dir).

    Thanks,
    Alan

  • Alan Choi at Mar 10, 2014 at 9:48 pm
    Since you're still using 1.1.1, I would strongly encourage you to upgrade
    to 1.2.4. We've added a good number of features and bug fixes since then!

    Thanks,
    Alan

  • Gerrard Mcnulty at Mar 11, 2014 at 11:04 am
    Hi Alan,

    Yeah, the files are in there. From my example you can see all I do is mv
    the root directory of the partition out, refresh, and then mv it back in
    again, so the files should be preserved? We have plans to upgrade to 1.2.4,
    but that won't be for a while :( Is this a known issue?
  • Alan Choi at Mar 11, 2014 at 6:02 pm
    Hi Gerrard,

    I don't think it's a known issue. In fact, I tried moving the files around
    and it works.

    I think the "hadoop fs -mv" cmd that you posted will move the data back
    into a subdir of the original dir. That's why I suggested that you check
    the files' location.

    Thanks,
    Alan
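
The sub-directory effect Alan describes can be reproduced on a local filesystem. The sketch below is only an analogy: plain `mv` stands in for `hadoop fs -mv`, which likewise places the source inside the destination when the destination is an existing directory. It assumes the partition directory exists again by the time the data is moved back, which the thread does not confirm. The `warehouse/t` layout and the `data.parq` file name are hypothetical.

```shell
#!/bin/sh
# Local-filesystem analogy only: plain `mv` stands in for `hadoop fs -mv`.
set -e
work=$(mktemp -d)
cd "$work"
part="warehouse/t/month=201402"   # hypothetical stand-in for the partition dir

mkdir -p "$part"
touch "$part/data.parq"           # hypothetical data file name

# Step 1: move the partition directory out, as in the original post.
mv "$part" foo

# Assumption: something recreates the (now empty) partition directory
# before the data is moved back.
mkdir -p "$part"

# Step 2: move it back. The destination exists, so foo lands inside it.
mv foo "$part"

# List where the data file ended up.
find warehouse -type f
# prints: warehouse/t/month=201402/foo/data.parq
```

On the cluster, running `hadoop fs -ls` on the partition path would show whether it holds the data files directly or a nested `foo` directory. If the latter, moving the files up one level and refreshing the table again would be the thing to try.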


Discussion Overview
group: impala-user
categories: hadoop
posted: Mar 10, '14 at 10:54a
active: Mar 11, '14 at 6:02p
posts: 6
users: 2
website: cloudera.com
irc: #hadoop
