The 7th question should've been asked first, since it would have obviated the
need for some of the other 6. So, if the data is binary, MR is of little use
anyway. I didn't understand, and find it hard to believe, this part of your reply:
"No, the entire data set is equally important and will be read together."

Other than that, an 8th question:
8. how much read latency can the system tolerate?

and a 9th:
9. what is the usable size of a unit of data being read? it being binary,
does the entire stream have to be read to make sense of it for the
application, or are parts of the binary usable on their own?


If you can get away with some read latency, take a look at one of the
commercial erasure coding solutions out there (like Cleversafe) or just
code one yourself; a toy sketch of the idea follows below. Also, see:
https://issues.apache.org/jira/browse/HDFS-503
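To make the "code one yourself" option concrete, here is a minimal, hedged
sketch of single-parity striping: XOR k equally sized data blocks into one
parity block, so any single lost block in the stripe can be rebuilt. This is
only a toy illustration in plain Java; production erasure coding (Cleversafe,
HDFS-503 style) uses Reed-Solomon codes and tolerates multiple failures.

// Toy single-parity erasure coding sketch (XOR across equally sized blocks).
// Illustration only; real deployments use Reed-Solomon codes.
import java.util.Arrays;

public class XorParity {

    // Compute one parity block over k equally sized data blocks.
    static byte[] encode(byte[][] dataBlocks) {
        byte[] parity = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    // Rebuild a single missing data block from the survivors plus the parity block.
    static byte[] reconstruct(byte[][] survivingBlocks, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] block : survivingBlocks) {
            for (int i = 0; i < missing.length; i++) {
                missing[i] ^= block[i];
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        byte[][] stripe = { "blockA--".getBytes(), "blockB--".getBytes(), "blockC--".getBytes() };
        byte[] parity = encode(stripe);
        // Pretend the second block was lost; rebuild it from the rest plus parity.
        byte[] rebuilt = reconstruct(new byte[][] { stripe[0], stripe[2] }, parity);
        System.out.println(Arrays.equals(rebuilt, stripe[1]));  // prints: true
    }
}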

hth

On Thu, Nov 29, 2012 at 2:19 AM, Mohammad Tariq wrote:

Hello Gaurav,

Thank you so much for your reply. Please find my comments embedded
below:

1. do you know if there exist patterns in this data?
Yes, the entire file is divided into data blocks of fixed length (but there
is no separator between two consecutive blocks).

2. will the data be read and how?
Yes, the data has to be read. To be honest, we are still not sure how we
will do that (a rough reading sketch follows after these answers).

3. does there exist a hot subset of the data - both read/write?
No, the entire data set is equally important and will be read together.

4. what makes you think hdfs is a good option?
Distributed architecture, flexibility to read any kind of data,
parallelism, native MR integration, cost, fault tolerance, high
throughput, etc.

5. how much do you intend to pay per TB?
I have to discuss it with my superiors (will let you know soon).
6. say you do build the system, how do you plan to keep lights on?
I am sorry, I did not quite get this. I mean I'll do whatever it takes to
keep everything running. I have some experience with small clusters, and I
have a small team with me which is available 24x7.

7. forgot to ask - is the data textual or binary?
Data is binary.

As for building and running the system all by myself: no, I would require
some help. I have a team with me, as I said, but being new to Hadoop I would
need help from whatever source is available.
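Regarding answers 1 and 2 above (fixed-length blocks, no separator, not yet
sure how to read them): one minimal way is to pull records of a fixed byte
length straight off the stream. The sketch below is plain Java and assumes a
hypothetical 512-byte record; the real length would come from the format.
Inside a MapReduce job, the same logic would live in a custom RecordReader
keyed by byte offset.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class FixedLengthReader {
    // Assumption: 512-byte records; replace with the actual block length of the format.
    static final int RECORD_LEN = 512;

    public static void main(String[] args) throws IOException {
        // Expects the path of one binary file as the only argument.
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] record = new byte[RECORD_LEN];
            long offset = 0;
            while (true) {
                try {
                    in.readFully(record);   // read exactly one record; no delimiter needed
                } catch (EOFException eof) {
                    break;                  // clean end of file (or a truncated tail)
                }
                process(offset, record);
                offset += RECORD_LEN;
            }
        }
    }

    static void process(long offset, byte[] record) {
        // Application-specific decoding of one binary record goes here.
        System.out.println("record at byte offset " + offset);
    }
}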

Many thanks.

Regards,
Mohammad Tariq



On Thu, Nov 29, 2012 at 5:40 AM, Gaurav Sharma <gaurav.gs.sharma@gmail.com> wrote:

So, before getting any suggestions, you will have to explain a few core
things:

1. do you know if there exist patterns in this data?
2. will the data be read and how?
3. does there exist a hot subset of the data - both read/write?
4. what makes you think hdfs is a good option?
5. how much do you intend to pay per TB?
6. say you do build the system, how do you plan to keep lights on?
7. forgot to ask - is the data textual or binary?

Those are just the basic questions. Are you going to be building and
running the system all by yourself?

On Nov 28, 2012, at 14:09, Mohammad Tariq wrote:

Hello list,

Although a lot of similar discussions have taken place here, I still
seek some of your able guidance. Till now I have worked only on small or
mid-sized clusters, but this time the situation is a bit different. I have to
collect a lot of legacy data, stored over the last few decades. This data is
on tape drives and I have to collect it from there and store it in my cluster.
The size could go somewhere near 24 petabytes (inclusive of replication).
Now, I need some help to kick this off, like what could be the optimal
config for my NN+JT, DN+TT+RS, HMaster and ZK machines?
What should be the number of slave and ZK peer nodes, keeping this config in mind?
What is the optimal network config for a cluster of this size?

Which kind of disks would be more efficient?

Please do provide me some guidance as I want to have some expert
comments before moving ahead. Many thanks.
Regards,
Mohammad Tariq
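For the node-count question above, a back-of-envelope sketch like the
following can frame the answer. Every number in it (replication factor,
drive size, drives per node, usable fraction) is an assumption to be
replaced with real hardware and policy figures, not a recommendation.

// Hedged back-of-envelope sizing for the 24 PB figure in the original mail.
// All per-node numbers below are assumptions; plug in your own.
public class ClusterSizing {
    public static void main(String[] args) {
        double totalPB = 24.0;            // stated: 24 PB including replication
        double replication = 3.0;         // assumption: default HDFS replication factor
        double logicalPB = totalPB / replication;   // unique data before replication

        double diskTB = 3.0;              // assumption: 3 TB drives
        int disksPerNode = 12;            // assumption: 12 drives per DataNode
        double usableFraction = 0.75;     // assumption: ~25% reserved for temp/OS/headroom

        double usableTBPerNode = diskTB * disksPerNode * usableFraction;
        double nodes = totalPB * 1024.0 / usableTBPerNode;   // raw PB -> TB, then per-node

        System.out.printf("Unique data: ~%.1f PB%n", logicalPB);
        System.out.printf("Usable capacity per node: %.1f TB%n", usableTBPerNode);
        System.out.printf("DataNodes needed: ~%.0f%n", Math.ceil(nodes));
    }
}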
