I am wondering how Hadoop assigns groups when directories/files are created
by a user. Below are some tests I have done. In my cluster, group hadoop
is configured as the supergroup.
hadoop fs -ls /tmp
drwxrwxrwx - abc hadoop 0 2012-08-10 23:02 /tmp/abc
drwxrwxrwx - def other_group 0 2012-08-10 23:02 /tmp/def
groups apache
apache: apache wheel
sudo -u apache hadoop fs -put somefile /tmp/abc
hadoop fs -ls /tmp/abc
-rw-rw-r-- 3 apache hadoop 120962 2012-08-13 16:03 /tmp/abc/somefile
sudo -u apache hadoop fs -put somefile /tmp/def
hadoop fs -lsr /tmp/def
-rw-rw-r-- 3 apache other_group 120962 2012-08-13 16:03 /tmp/def/somefile

*Based on the experiments above, it looks like a file pushed to HDFS
always inherits its group from its parent (containing) folder. Is that
always the case?*
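The observation above matches what the HDFS Permissions Guide calls the BSD rule: a newly created file or directory is owned by the creating user, and its group is the group of the parent directory, not one of the user's own groups. The sketch below is a toy in-memory model of that rule, replaying the two `-put` experiments. The `Inode` class and `create` helper are hypothetical illustrations, not part of any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Inode:
    """Toy stand-in for an HDFS inode: only owner, group and children."""
    owner: str
    group: str
    children: Dict[str, "Inode"] = field(default_factory=dict)


def create(parent: Inode, name: str, user: str) -> Inode:
    """Create a child following the BSD rule: owner = creating user,
    group = group of the parent directory. The user's own groups
    (apache, wheel in the test above) play no part."""
    node = Inode(owner=user, group=parent.group)
    parent.children[name] = node
    return node


# Directory layout from the listing above (`def` is a Python keyword).
abc = Inode(owner="abc", group="hadoop")           # /tmp/abc
def_dir = Inode(owner="def", group="other_group")  # /tmp/def

f1 = create(abc, "somefile", user="apache")
f2 = create(def_dir, "somefile", user="apache")
print(f1.owner, f1.group)  # apache hadoop
print(f2.owner, f2.group)  # apache other_group
```

The model only tracks ownership; real HDFS also applies the umask to the mode bits and checks write permission on the parent, both omitted here.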

A follow-up question concerns one finding in Hive: when executing a query
that overwrites a table (or a partition within a table), the newly written
directory always ends up belonging to HDFS's supergroup, regardless of
1. the user who is executing the Hive query,
2. the group(s) that user belongs to, or
3. the group the parent table directory belongs to.
*Is this expected behavior in Hive?*

For example, table A is stored at /path/A and is partitioned on column
dh. /path/A belongs to group *other_group*.
After running *insert overwrite A partition (dh = "12") select column list
from ... where ...*

/path/A/12 always ends up with group *hadoop*. This contradicts the
inheritance assumption I drew above. Any thoughts would be appreciated.
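One possible explanation, offered as an assumption rather than confirmed Hive behavior: if the query output is first written into a staging directory somewhere else (for instance under a scratch area whose parent belongs to *hadoop*) and then renamed into /path/A, the partition directory would keep the group it was created with, because an HDFS rename moves the existing inode instead of creating a new one. The sketch below extends a toy inode model with a hypothetical `rename` helper; the `hive_user` owner, the scratch directory and its *hadoop* group are illustrative assumptions, not values taken from this cluster.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Inode:
    """Toy stand-in for an HDFS inode: only owner, group and children."""
    owner: str
    group: str
    children: Dict[str, "Inode"] = field(default_factory=dict)


def create(parent: Inode, name: str, user: str) -> Inode:
    # BSD rule: the group is taken from the parent at creation time.
    node = Inode(owner=user, group=parent.group)
    parent.children[name] = node
    return node


def rename(src_parent: Inode, src_name: str,
           dst_parent: Inode, dst_name: str) -> Inode:
    # A rename relinks the existing inode; owner and group are NOT recomputed.
    node = src_parent.children.pop(src_name)
    dst_parent.children[dst_name] = node
    return node


# Hypothetical layout: table directory in other_group, staging area in hadoop.
table_a = Inode(owner="hive_user", group="other_group")  # /path/A
scratch = Inode(owner="hive_user", group="hadoop")       # assumed staging parent

create(scratch, "tmp_output", user="hive_user")              # job writes output
partition = rename(scratch, "tmp_output", table_a, "12")     # moved to /path/A/12

print(partition.group)  # hadoop, not other_group
```

Under this assumption the Hive result would not contradict the inheritance rule: the rule applies when the directory is created, and the later move simply carries the staging group along.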


Chen Song (group: user @), posted Aug 13, '12 at 4:21p