I'm new to configuring Hadoop's schedulers and I'm trying to set up the
FairScheduler on a 3-node cluster. The initial setup works, but
throughput is abysmal.
Each node is configured with a capacity of 16 map tasks and 8 reduce
tasks. Most of the jobs being run read data from Cassandra, which is
installed on the same nodes, using ColumnFamilyInputFormat.
With the default scheduler these jobs take 5 to 15 minutes.
When I plug in the FairScheduler they take anywhere from one to many hours.
What I see is that the map task capacity is not being used. Jobs now
run only 3 map tasks at a time across the cluster, whereas before they
would always run all 48 map tasks (3 nodes x 16 slots).
This is without any custom fair-scheduler.xml configuration. I've also
tried configuring userMaxJobsDefault, maxRunningJobs, and weight,
without any luck.
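
For reference, an allocation file that sets those three options has
roughly the shape below. This is only a sketch: the pool name
"cassandra_jobs" and all numeric values are placeholders for
illustration, not the exact settings used on this cluster.

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: illustrative allocation file, placeholder values -->
<allocations>
  <!-- "cassandra_jobs" is a hypothetical pool name -->
  <pool name="cassandra_jobs">
    <!-- maximum number of jobs from this pool running concurrently -->
    <maxRunningJobs>5</maxRunningJobs>
    <!-- share of the cluster relative to other pools (default is 1.0) -->
    <weight>2.0</weight>
  </pool>
  <!-- running-job limit for users with no explicit per-user limit -->
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
```

None of these elements directly controls how many map tasks a single
job may run at once, which may be why changing them made no difference
here.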
I've also tried adding mapred.fairscheduler.locality.delay=0, without any
luck.
Is it possible with the FairScheduler to get the same throughput when only
one job is running as with Hadoop's default scheduler? Am I
missing something obvious?