Can Hadoop be restricted to spare CPU cycles?

Is it possible to run Hadoop so that it only uses spare CPU cycles? That is, would it make sense to install Hadoop on people's work machines so that number crunching can be done while they are not using their PCs, without them experiencing an obvious performance degradation (fan noise aside!)?

Perhaps it is just a matter of starting the JVM at low priority and not using "too much" network bandwidth (assuming that is possible on a Windows machine)?

If not, does anyone know of any Java equivalents to things like BOINC?

Edit: Found a list of cycle scavenging infrastructure here. My question about Hadoop still stands, though.



2 answers


It very much depends on what you intend to use Hadoop for. Hadoop expects all of its nodes to be fully available and well networked for optimal throughput, which is not what you get with workstations. Moreover, it doesn't really run on Windows (you can use it with Cygwin, but I don't know of anyone who uses that for "production", except on client machines that only submit jobs).



Hadoop does things like storing chunks of data on multiple nodes (three, with the default replication factor) and trying to schedule all computation on that data on those same nodes; in a shared environment, this means that a task needing that data will run on those three workstations, no matter what their users are currently doing. In contrast, cycle scavenging projects keep all data elsewhere and send it, along with the task, to any node that is currently available; this lets them be kinder to the machines, but it comes with obvious data transfer costs.
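
To make the difference concrete, here is a toy sketch in Python. It is not Hadoop's or BOINC's actual scheduler; the `Node` class, the workstation names, and both functions are made up for illustration, and real Hadoop scheduling only prefers data-local nodes rather than strictly requiring them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    busy: bool  # True while a human is actively using the workstation


def locality_schedule(replica_holders, nodes):
    """Hadoop-style (simplified): run the task where the data already lives,
    whether or not someone is using those machines."""
    by_name = {n.name: n for n in nodes}
    return [by_name[name] for name in replica_holders]


def scavenging_schedule(nodes):
    """Cycle-scavenging style (simplified): any idle node is a candidate,
    and the data has to be shipped to it along with the task."""
    return [n for n in nodes if not n.busy]


# Hypothetical office with four workstations, two of them in use.
nodes = [
    Node("ws1", busy=True),
    Node("ws2", busy=False),
    Node("ws3", busy=True),
    Node("ws4", busy=False),
]
replicas = ["ws1", "ws2", "ws3"]  # nodes assumed to hold the data block

local = locality_schedule(replicas, nodes)
idle = scavenging_schedule(nodes)

print("locality picks:", [n.name for n in local])
print("scavenging picks:", [n.name for n in idle])
```

In this made-up setup the locality-based choice lands on two machines that are in use (ws1 and ws3), while the scavenging choice avoids them but includes ws4, which holds no copy of the data and would need it transferred first.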



Perhaps Terracotta is more up your alley?



Terracotta product link
