Schedule bound

Steve Loughran: "Looking at the other areas of work, I think scheduling will get the most interest from different people. Why? Because it's where people like Platform Computing deliver value. It's not the APIs for grid computing, it's in distributing work to chosen machines. The current Job Scheduler works, but it is very simple. Every task worker node has a number of 'slots'; work is assigned to workers with spare slots. The scheduler is location aware, looking for the closest open slot to data, but there is no real examination of how much work a node is really doing, what the expected workload of the new job is (based on past experience), or anything resembling balanced scheduling between users. Over time, that's where there is going to be fun. Watch that space."
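To make the "slots plus locality" policy concrete, here is a minimal sketch in Python. It assumes a heartbeat style of scheduling, where a worker reports its free slots and the scheduler fills them with the pending tasks closest to that worker's data. The names `Node`, `Task`, `locality` and `on_heartbeat`, and the three locality levels, are illustrative assumptions, not Hadoop's actual JobTracker code or API.

```python
from dataclasses import dataclass

# Hypothetical locality levels: lower means closer to the task's input data.
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


@dataclass
class Node:
    name: str
    rack: str
    slots: int        # total task slots on this worker
    running: int = 0  # slots currently occupied

    def free_slots(self) -> int:
        return self.slots - self.running


@dataclass
class Task:
    task_id: str
    data_hosts: set  # hosts holding a replica of this task's input


def locality(task: Task, node: Node, rack_of: dict) -> int:
    """Classify how far `node` is from the task's input data."""
    if node.name in task.data_hosts:
        return NODE_LOCAL
    if any(rack_of.get(h) == node.rack for h in task.data_hosts):
        return RACK_LOCAL
    return OFF_RACK


def on_heartbeat(node: Node, pending: list, rack_of: dict) -> list:
    """Fill the node's free slots with the closest pending tasks.

    Only slot count and locality are considered. The node's real load,
    the expected cost of each task, and fairness between users are ignored,
    which is exactly the simplicity the quote describes.
    """
    assigned = []
    while node.free_slots() > 0 and pending:
        # min() returns the first of equally close tasks, so ties keep queue order.
        best = min(pending, key=lambda t: locality(t, node, rack_of))
        pending.remove(best)
        node.running += 1
        assigned.append(best)
    return assigned


# Usage: a two-slot worker on rack r1 picks the node-local task, then the rack-local one.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
n1 = Node("n1", "r1", slots=2)
pending = [Task("t-remote", {"n3"}), Task("t-rack", {"n2"}), Task("t-local", {"n1"})]
print([t.task_id for t in on_heartbeat(n1, pending, rack_of)])  # ['t-local', 't-rack']
print([t.task_id for t in pending])                             # ['t-remote']
```

Note that every heartbeat scans the whole pending list on a single central scheduler; in a large cluster that per-request work is the kind of thing that could turn the scheduler itself into the hot spot.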

I think scheduling is interesting for another reason: it seems like a natural bottleneck in a master/worker system. I was looking at Hadoop for a project at work a while back (and to see if we could use it for general async/batch work), and while it's easy to get hung up on something like the namenode or the reducers, or even the "takes some getting used to" programming model, I kept coming back to the code that decides when to put work into the jobservers, worried that it would end up limiting the whole system.

1 Comment


    There's some work on having different priority queues, and soon the ability to pause/resume jobs, though if you can't persist half-complete jobs to the filesystem, then when the tracker goes down, there goes everything. Someone really needs to deal with that high-level scheduling problem. Or, at an even higher level: how many machines should be allocated to the tasks?
