Hi,

I am trying to understand the effects of increasing the block size or the
minimum split size. If I increase them, each mapper will process more data,
which reduces the number of mappers that are spawned. Since there is an
overhead in starting each mapper, this seems good.

However, if I increase these values too much, what negative effects will
come up? In other words, how do I compute the best number of mappers to
start for processing a given amount of data on a cluster?

For calculations, let us assume 100 GB of data and 4 machines (dual core each).
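
To make the question concrete, here is a rough sketch of the arithmetic I have in mind. It assumes one map task per input split and one map slot per core; both are my own simplifications, not something the framework guarantees. The function name and the list of split sizes are only for illustration.

```python
def map_task_estimate(data_bytes, split_bytes, nodes, cores_per_node):
    """Return (map tasks, concurrent map slots, waves) for a given split size.

    Assumptions (illustrative only): one map task per split, and one map
    slot per core on every node.
    """
    tasks = -(-data_bytes // split_bytes)   # ceiling division
    slots = nodes * cores_per_node
    waves = -(-tasks // slots)              # rounds of tasks needed to finish
    return tasks, slots, waves


GB = 1024 ** 3
MB = 1024 ** 2

# 100 GB of input on 4 dual-core machines, for a few candidate split sizes.
for split_mb in (64, 128, 256, 512, 1024):
    tasks, slots, waves = map_task_estimate(100 * GB, split_mb * MB, 4, 2)
    print(f"split={split_mb:>4} MB  tasks={tasks:>4}  slots={slots}  waves={waves}")
```

Under these assumptions a larger split means fewer tasks and fewer waves, but the sketch says nothing about per-task startup cost, data locality, or how much work is lost when a task fails, which is the part I am unsure about.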

Also, if I set the JVM reuse flag (mapred.job.reuse.jvm.num.tasks) to -1, will it make a difference?

Thanks,
Tarandeep