So, before getting any suggestions, you will have to explain a few core things:

1. do you know whether patterns exist in this data?
2. will the data be read back, and how?
3. is there a hot subset of the data, for both reads and writes?
4. what makes you think HDFS is a good option?
5. how much do you intend to pay per TB?
6. say you do build the system, how do you plan to keep the lights on?
7. forgot to ask - is the data textual or binary?

Those are just the basic questions. Are you going to be building and running the system all by yourself?

On Nov 28, 2012, at 14:09, Mohammad Tariq wrote:

Hello list,

Although a lot of similar discussions have taken place here, I still seek some of your able guidance. Till now I have worked only on small or mid-sized clusters, but this time the situation is a bit different. I have to collect a lot of legacy data, stored over the last few decades. This data is on tape drives, and I have to collect it from there and store it in my cluster. The size could go somewhere near 24 petabytes (inclusive of replication).
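For scale, a back-of-the-envelope sizing sketch may help frame the later questions. All the hardware figures here are assumptions for illustration (replication factor 3, 12 x 3 TB disks per datanode, 25% of raw disk reserved for the OS and MapReduce spill space), not a recommendation:

    import math

    total_with_replication_tb = 24 * 1024   # the 24 PB figure above, in TB
    replication_factor = 3                  # assumed HDFS default
    raw_data_tb = total_with_replication_tb / replication_factor

    disks_per_node = 12                     # assumption: typical 2012-era datanode
    tb_per_disk = 3                         # assumption
    reserve = 0.25                          # assumption: OS + MapReduce spill space

    usable_tb_per_node = disks_per_node * tb_per_disk * (1 - reserve)
    nodes = math.ceil(total_with_replication_tb / usable_tb_per_node)

    print(f"raw (unreplicated) data: {raw_data_tb:.0f} TB")
    print(f"usable HDFS capacity per node: {usable_tb_per_node:.1f} TB")
    print(f"datanodes needed: {nodes}")

On those assumptions you land somewhere around 900 datanodes before any growth headroom, which is well past the point where rack awareness and deployment automation stop being optional.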

Now, I need some help to kick this off: what would be the optimal config for my NN+JT, DN+TT+RS, HMaster, and ZK machines?
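On the NN side, the number that usually bites first is heap: every block, file, and directory is held in NameNode memory, and a common rule of thumb is roughly 1 GB of heap per million namespace objects. A minimal sketch, assuming 128 MB blocks (an assumption; the Hadoop 1.x default of 64 MB would double the count):

    raw_data_bytes = 8 * 1024**5        # ~8 PB unreplicated (24 PB / 3)
    block_size_bytes = 128 * 1024**2    # assumption: 128 MB blocks
    blocks = raw_data_bytes / block_size_bytes

    gb_heap_per_million = 1.0           # common rule-of-thumb figure
    heap_gb = blocks / 1e6 * gb_heap_per_million

    print(f"blocks: {blocks / 1e6:.0f} million")
    print(f"ballpark NameNode heap: {heap_gb:.0f} GB, before counting files and dirs")

That is on the order of 67 million blocks and a heap in the tens of gigabytes, so the NameNode box needs far more RAM than any datanode at this scale.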

What should the number of slave nodes and ZK peers be, keeping this config in mind?
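For ZK, ensemble size is driven by fault tolerance rather than cluster size: writes need a strict majority of peers, so odd-sized ensembles of 3 or 5 are standard even for very large clusters. A quick illustration of the quorum math:

    def tolerated_failures(peers: int) -> int:
        # ZooKeeper stays available while a strict majority (quorum) is alive
        quorum = peers // 2 + 1
        return peers - quorum

    for n in (3, 5, 7):
        print(f"{n} peers -> quorum of {n // 2 + 1}, survives {tolerated_failures(n)} failure(s)")

Five dedicated peers is the usual choice for a production cluster of this size; going beyond that mostly slows writes without buying much extra safety.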

What is the optimal network config for a cluster of this size?
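One way to sanity-check any network design here is to work out how long just loading the data will take. A minimal sketch, assuming a sustained aggregate ingest of about 1.25 GB/s (one saturated 10 GbE link's worth; an assumption, since the tape infrastructure may well be the real ceiling):

    raw_data_bytes = 8 * 1024**5          # ~8 PB unreplicated, from the figure above
    ingest_bytes_per_s = 1.25 * 1024**3   # assumption: ~10 Gb/s sustained aggregate

    days = raw_data_bytes / ingest_bytes_per_s / 86400
    print(f"ingest time: {days:.0f} days")

That is well over two months of continuous transfer, and replication then roughly triples the bytes crossing the cluster fabric, which is why low oversubscription between racks tends to matter more than raw per-node link speed at this size.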

Which kind of disks would be most efficient?

Please do provide me some guidance, as I would like some expert comments before moving ahead. Many thanks.

Regards,
Mohammad Tariq
