
The Food Chain: Recycling your Beowulf

One lovely thing about cluster computing, including beowulfery per se, is that there is a natural life cycle for cluster compute nodes. Let us meditate upon this life cycle for a moment.

Your grant is approved, your company agrees: You can Build a Beowulf. You collect quotes, select near-bleeding edge hardware, and put the whole thing together. It works! You do all sorts of fabulous research, or invent a new drug, or solve some critical problem, over the next two or three years.

Suddenly your once-new cluster looks pretty shabby. Ten percent or so of the nodes have given up the ghost altogether and been cannibalized for parts or been repaired at modest expense. Worse, Moore's Law has continued its inexorable march and it is getting hard to find nodes as slow and ill-equipped with memory as yours are any more, even for a paltry $500 each.

What do you do?

Well, an obvious thing to do is to buy shiny new nodes from current technology, and replace your old cluster with a new one some eight times faster at equivalent cost, but that leaves one with the problem of what to do with all the old nodes.

Welcome to the food chain. As systems age out (in any LAN or cluster environment) they gradually ``lose value'' compared to current technology, because

Their reliability is diminished. Hardware failure starts costing administrative time and loss of productivity, especially if the cluster runs tightly coupled code.
Their warranties (even extended warranties) expire, so you have to start paying out of pocket to fix them.
Moore's Law dictates that spending a mere $100 to fix a three year old node is less cost-effective than using that $100 to pay for part of a (six to eight times faster) replacement.
The overhead costs (power, cooling, space, networking, human management) for nodes scale more strongly with the number of nodes than with their speed and power. A 2.4 GHz Intel-based node might consume twice the electricity of a 400 MHz Intel-based node (to take a current snapshot that will of course require retranslation as these numbers advance) but requires 1/6 the space, 1/3 the TOTAL power and cooling, and 1/6 the management effort of the six 400 MHz nodes it might replace.
Amdahl's law (see chapter 4) usually favors faster nodes to get more work done. One node at 2.4 GHz will generally complete work faster than six 400 MHz nodes working in parallel. At best the six nodes would be equally fast, or nearly so.
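
The last point can be illustrated with a few lines of Python. This is only a sketch: it uses the textbook form of Amdahl's law, and it assumes (purely for illustration) that node speed scales with clock rate, so that one 2.4 GHz node counts as six times a 400 MHz node. The function name and the sample parallel fractions are made up for the example.

```python
def amdahl_speedup(parallel_fraction: float, n_nodes: int) -> float:
    """Speedup over a single node when parallel_fraction of the work
    is spread evenly across n_nodes and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_nodes)


# Assumption for illustration: speed is proportional to clock rate,
# so one 2.4 GHz node is 6x a 400 MHz node on serial and parallel work alike.
FAST_NODE_SPEEDUP = 2400 / 400

for p in (1.0, 0.95, 0.80, 0.50):
    six_slow = amdahl_speedup(p, 6)
    print(f"parallel fraction {p:.2f}: "
          f"six 400 MHz nodes {six_slow:.2f}x, "
          f"one 2.4 GHz node {FAST_NODE_SPEEDUP:.2f}x")
```

Only a perfectly parallel job (parallel fraction 1.0) lets the six slow nodes match the single fast one; any serial fraction at all leaves them behind.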

Consideration of the above cruel facts may, in fact, convince you that it is better to upgrade your cluster more often than every three years. A lot of folks (myself included) try to arrange to upgrade their clusters once a year, with an explicit line item in each year's budget for a new set of nodes based on the technology du jour, skimming along near the crest of Moore's law instead of being lifted up to the top of the wave every three years only to wipe out in the troughs in between.

A totally dispassionate review of the Total Cost of Ownership (TCO) of the nodes in an associated Cost-Benefit Analysis (CBA) might well dictate throwing the nodes away every twelve to eighteen months rather than operating them until they die of old age. After this period, new technology is typically roughly 2x faster at equivalent cost, the overhead for operating the older nodes is 2x as great (per unit of work done), and the human cost of waiting for (presumably valuable) work to complete is often far greater than any of the hardware or operational costs. I have seen Real Live CBAs that prove this to be the case in at least some environments.

However, the proof depends to some extent upon the assumptions made (to include the infrastructure costs or not, to include the cost of the human time spent waiting for results or not). Given a set of assumptions and an assignment of costs and benefits, I can do no better than quote Dr. Josip Loncaric, a venerable and respected beowulfer^13.1:

Picking the best hardware replacement interval is an analytically solvable problem. Assuming that performance per $ doubles every N months, the most cost effective policy is to buy replacements whenever you can get 4.92155 times the performance for the same money. The Moore's law says that N=18, so the best replacement interval works out to be 3.44867 years. Using intervals of 3-4 years is almost as good.

This is the general view - most people view three years plus to be the ideal replacement cycle, and as Josip points out this is analytically justifiable. Note that this does not mean that replacing your cluster every three years is ideal - in general it will usually be better to replace 1/4-1/3 of your cluster (all three year old machines) every year, not replace the whole thing every three years. However, an equally good argument for a much shorter replacement cycle has been sent to me by a very competent list person who accounts for things like hardware reliability and so forth ignored by Josip. As always in cluster engineering, your mileage may vary according to your particular needs and cost/benefit landscape.
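
The arithmetic connecting the numbers in Josip's quote is easy to check. The sketch below does not rederive his optimal ratio of 4.92155; it simply takes that ratio as given and converts it into a replacement interval, assuming that performance per dollar doubles every N months. The function names are invented for this example.

```python
import math


def replacement_interval_years(performance_ratio: float,
                               doubling_months: float) -> float:
    """Years until new hardware offers performance_ratio times the
    performance per dollar, given a fixed doubling time in months."""
    return doubling_months * math.log2(performance_ratio) / 12.0


def performance_ratio_after(years: float, doubling_months: float) -> float:
    """Performance-per-dollar gain after the given number of years."""
    return 2.0 ** (12.0 * years / doubling_months)


# Numbers taken from the quote: ratio 4.92155, doubling time N = 18 months.
interval = replacement_interval_years(4.92155, 18)
print(f"replacement interval: {interval:.5f} years")

# The 3-4 year range the quote calls "almost as good".
for years in (3, 4):
    ratio = performance_ratio_after(years, 18)
    print(f"after {years} years: {ratio:.2f}x performance per dollar")
```

With N=18 the interval comes out at about 3.45 years, in agreement with the quoted figure, and a 3-4 year cycle corresponds to waiting for roughly a 4x to 6.3x improvement.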

This still leaves one with the question of what to do with all the nodes one accumulates as they gradually age out, whether they age out in one year or five. The following are some very generic suggestions:

Turn 1-3 year old nodes into desktops within your organization. Since one often buys relatively advanced nodes, they are likely to be strong enough to make good desktops (possibly enhanced with e.g. sound cards and CD drives) even when they are too slow to be terribly productive on your main problem. This is the ``food chain'' - passing systems down in the organization until they become worthless to even the least demanding user.
Create a hierarchy of clusters. Even older nodes can be useful for some problems, e.g. embarrassingly parallel projects with infinite time requirements (like my own research, so I know that projects like this exist). More or less unsupported, older nodes can often run for years doing useful work, even if it isn't your useful work. The downside of this is the power, cooling, space and network infrastructure the nodes consume. It costs roughly $100 in power and cooling per year per (presumed 100W) node at $0.08/kW-hr. A $500 node will cost $500 more in power over a five year lifetime. By the third or fourth year one reaches break-even on spending the power money alone for six nodes on a single node with six times the speed and less than six times the power requirement. However, in some cases you pay for new nodes, while power and AC are ``free'' (paid for by somebody else). Just one example of nonlinear cost profiles and how they distort decision making...
Donate older nodes to organizations that can use their remaining lifetime profitably. For example, schools are often desperate for computers, and a three year old node (with a bit of updating) may be far better than what they have. Ditto for a number of non-profit entities. Schools may even be happy to take an entire beowulf, intact, so that they can use it to teach beowulfery! This is ``like'' passing it down the food chain within your organization, only passing it down to a different and poorer chain altogether, sometimes realizing a tax deduction or other benefit in the meantime.
Eventually, a node becomes junk. As in, it isn't worth plugging into a wall by anybody, even somebody poor and compute-power deprived. Or it is broken, and not worth fixing. Nodes in this state really need to be recycled, and not just thrown in a landfill.
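
The power figures quoted above for keeping old nodes running can be reproduced roughly as follows. This is a sketch, not an accounting model: the 100W draw and the $0.08 rate are the numbers given in the text, while attributing the gap between the electricity bill and the $100 figure to cooling is only my reading of ``power and cooling''. The names are invented for the example.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_electricity_cost(watts: float, dollars_per_kwh: float) -> float:
    """Yearly electricity cost of a node running flat out all year."""
    return watts / 1000.0 * HOURS_PER_YEAR * dollars_per_kwh


# Figures from the text: a presumed 100W node at $0.08 per kW-hr.
power_only = annual_electricity_cost(100, 0.08)
print(f"electricity alone: ${power_only:.2f} per node per year")

# The text rounds power plus cooling to about $100 per node per year.
# Assumption: the remainder over power_only is the cooling share.
POWER_AND_COOLING_PER_YEAR = 100
five_year_cost = 5 * POWER_AND_COOLING_PER_YEAR
six_node_yearly_cost = 6 * POWER_AND_COOLING_PER_YEAR

print(f"five year power and cooling for one node: ${five_year_cost}")
print(f"yearly power and cooling for six nodes: ${six_node_yearly_cost}")
```

Electricity alone comes to about $70 per node per year, so the $100 figure leaves roughly $30 for cooling, and five years of operation costs about as much as the $500 node itself.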

Note well that computers contain a variety of toxic materials. There is typically mercury in the little battery that backs up the BIOS. There is arsenic in the doped silicon in the IC wafers. There may be lead, cadmium and a number of other heavy metals used in various sub-assemblies. Computers also contain some valuable metals. There is gold on the contacts, for example, and plenty of copper everywhere.

There are good sides and bad sides to all of this. Node ``recycling'' often involves third world child labor and toxic materials (such as mercury) to extract the gold, and frequently ignores the rest of the toxic metals that build up wherever they ultimately dispose of the parts once the gold is mined out of them. We don't have the technology to disassemble nodes into reusable micro components, and even the reusable macro components (such as the case and power supply, the drives, and so forth) tend not to be reusable for more than three to five years before they no longer work with current technology at all.

It doesn't do any good to recycle nodes ``properly'' where properly means sending them off to India to provide short term jobs and a toxic future for small Indian children. However, dumping them in landfills here isn't terribly wise either. Perhaps the best approach is to recycle the mercury-laden components (the battery) by hand, and landfill the rest, accepting that the arsenic and so forth will eventually show up in the water table. I'd be happy to hear better suggestions as this document reaches more people, and will cheerfully update this chapter as better ideas emerge. [email protected], people.


Robert G. Brown 2004-05-24