Thanks Philip.

The current install was done by CM, but as I show below, I let the Linux
installation run amok and do what it wanted to do. I could correct the Dell
machine, but since the IBM did a RAID thing, I'm thinking I have to reformat
the IBM and start over, since I can't pull disks out of the RAID 0. I will
probably do both, since I can do them in parallel, and then re-run CM on the
"new machines", this time with the right file structure. That will give me a
clean setup and a bit more experience.

Mike

On Wednesday, April 3, 2013 6:08:43 PM UTC-7, Philip Zeyliger wrote:

Hi Mike,

The configuration "dfs.data.dir" controls which directories the datanodes
use to store blocks. The best practice here is to set up the disks as "JBOD",
with one mount per disk, and then point dfs.data.dir at those mounts (it takes
a comma-separated list of directories). Cloudera Manager does its best, in the
initial configuration, to guess at what that is, but if you're re-configuring,
you'll have to do it manually.
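
As a rough illustration of the "one mount per disk" advice, the sketch below
builds a comma-separated value that could be pasted into dfs.data.dir. The
mount points /data/1 through /data/6 and the dfs/dn subdirectory are
hypothetical names chosen for this example, not taken from the thread;
substitute whatever the disks are actually mounted as.

```python
# Hypothetical layout: one filesystem per physical disk, mounted at
# /data/1 .. /data/6. These paths are assumptions for illustration only.
def build_data_dirs(mount_points, subdir="dfs/dn"):
    """Return a comma-separated directory list, one entry per mount."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


mounts = [f"/data/{i}" for i in range(1, 7)]
data_dir_value = build_data_dirs(mounts)
print(data_dir_value)
```

Each entry should sit on a different physical disk, so that the datanode
spreads its blocks across all six drives.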

Hope this helps.

-- Philip


On Wed, Apr 3, 2013 at 4:34 PM, Mike Mc <[email protected]> wrote:

The plot thickens:

[[email protected] ~]# pvscan
PV /dev/sda2 VG vg_dell lvm2 [278.91 GiB / 0 free]
PV /dev/sdb1 VG vg_dell lvm2 [279.39 GiB / 0 free]
PV /dev/sdc1 VG vg_dell lvm2 [279.39 GiB / 0 free]
PV /dev/sdd1 VG vg_dell lvm2 [279.39 GiB / 0 free]
PV /dev/sde1 VG vg_dell lvm2 [279.39 GiB / 0 free]
PV /dev/sdf1 VG vg_dell lvm2 [279.39 GiB / 0 free]
Total: 6 [1.64 TiB] / in use: 6 [1.64 TiB] / in no VG: 0 [0 ]

[[email protected] ~]# pvscan
PV /dev/sda2 VG vg_ibm lvm2 [1.36 TiB / 0 free]
Total: 1 [1.36 TiB] / in use: 1 [1.36 TiB] / in no VG: 0 [0 ]
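
A quick arithmetic check on the Dell output (a sketch using only the sizes
printed by pvscan above): all six physical volumes sit in the single volume
group vg_dell, and their sizes add up to the reported total.

```python
# PV sizes in GiB, copied from the Dell pvscan output.
dell_pv_gib = [278.91, 279.39, 279.39, 279.39, 279.39, 279.39]

total_gib = sum(dell_pv_gib)
total_tib = total_gib / 1024  # 1 TiB = 1024 GiB

print(f"{len(dell_pv_gib)} PVs, {total_gib:.2f} GiB = {total_tib:.2f} TiB")
```

The IBM shows only one PV because its disks are presented to the OS as a
single RAID device.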

These machines are test vehicles so please suggest any option appropriate.

Thanks,

Mike





On Wednesday, April 3, 2013 2:50:40 PM UTC-7, Mike Mc wrote:

I successfully installed two data nodes and a manager node. I worked out a
few minor kinks, and all is working fine with small jobs; however, when I
tried to run TeraGen it crashed. After some investigation, I believe I found
the root cause.

I did a fresh minimal install of CentOS 6.4 on all machines. The two
data nodes each have six 300 GB disk drives. CentOS installs lvm2 by default.
Now I only get a Cluster Configured Capacity of ~89 GB, when I was expecting
~1.3 TB on each machine.

[[email protected] ~]# sudo -u hdfs hdfs dfsadmin -report
Configured Capacity: 95120437248 (88.59 GB)
Present Capacity: 94225088847 (87.75 GB)
DFS Remaining: 94098546688 (87.64 GB)
DFS Used: 126542159 (120.68 MB)
DFS Used%: 0.13%
Under replicated blocks: 98
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Live datanodes:
Name: 192.168.21.232:50010 (Dell.test.net)
Hostname: Dell.test.net
Rack: /default
Decommission Status : Normal
Configured Capacity: 47560218624 (44.29 GB)
DFS Used: 63279439 (60.35 MB)
Non DFS Used: 328152753 (312.95 MB)
DFS Remaining: 47168786432 (43.93 GB)
DFS Used%: 0.13%
DFS Remaining%: 99.18%
Last contact: Wed Apr 03 14:33:48 PDT 2013


Name: 192.168.21.231:50010 (IBM.test.net)
Hostname: IBM.test.net
Rack: /default
Decommission Status : Normal
Configured Capacity: 47560218624 (44.29 GB)
DFS Used: 63262720 (60.33 MB)
Non DFS Used: 567195648 (540.92 MB)
DFS Remaining: 46929760256 (43.71 GB)
DFS Used%: 0.13%
DFS Remaining%: 98.67%
Last contact: Wed Apr 03 14:33:49 PDT 2013
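
To make the gap concrete, here is a small sketch that redoes the arithmetic
from the report above. The byte counts are copied from the output; the
1.3 TB expectation is the figure stated earlier in this post, treated here
as 1.3 TiB for a rough comparison (an assumption).

```python
GIB = 1024 ** 3  # the report's "GB" figures match division by 2**30

# Per-node "Configured Capacity" in bytes, from the dfsadmin report.
per_node_bytes = {
    "Dell.test.net": 47560218624,
    "IBM.test.net": 47560218624,
}

cluster_bytes = sum(per_node_bytes.values())
cluster_gib = cluster_bytes / GIB

# Assumption: "~1.3T" per machine is read as 1.3 TiB.
expected_per_node_gib = 1.3 * 1024
node_gib = per_node_bytes["Dell.test.net"] / GIB
fraction_visible = node_gib / expected_per_node_gib

print(f"cluster: {cluster_gib:.2f} GiB, per node: {node_gib:.2f} GiB")
print(f"share of expected capacity visible to HDFS: {fraction_visible:.1%}")
```

The two per-node figures sum exactly to the cluster total, so HDFS is only
seeing a small slice of each machine's disk space.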

So I read a lot of threads about issues with creating multiple HDFS
directories on partitions, and looked for something about lvm2. I found
one thread from some time back indicating that lvm2 was a bad idea for
MapR. Obviously I have a problem, and I now understand that MapR will
underperform with this structure.

I can break 5 disks out of the LVM volume group on each machine if needed,
but is this the best (only) solution? If so, do I have to do any other
cleanup of HDFS to get it to play nice, since some of it may be left on the
LVM drive? Any hints on getting the 5 disks hooked back into HDFS would be
helpful, but I have seen some threads on adding new drives, so I have a
(very) rough idea of what's required.

Thanks

Mike




