Managing Computers with Automation

Virtual machine snapshots considered (nearly) worthless...
With apologies to Edsger Dijkstra... Usually when people talk about virtual machine snapshotting, they include with it snapshotting both the server and any filesystems it's directly connected to. Although this is more complex than just snapshotting the
A Complete Cluster Stack for Linux
Recently, I've had some folks ask me offline what exactly a “complete” Linux cluster stack would look like. That's a good question, and this posting is intended to address it. So let's start with – what kind of cluster?...
How Managed Virtualization (including HA) conflicts with System Management
Managed Virtualization Versus System Management In an earlier post[1], I talked about a couple of kinds of virtualization, comparing two of them and highlighting their strengths. This posting discusses how virtualization can confuse and confound conv
A brief overview of load balancing techniques
One common form of automation is load balancing. Load balancing is the idea that incoming network requests are distributed across a set of servers, each of which provides the same service. If you spread...
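As a rough illustration of the idea in this excerpt, here is a minimal sketch of round-robin distribution, one simple way to spread requests across servers that all provide the same service. The excerpt does not say which techniques the post covers, so round-robin is only an assumed example, and the class name and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order (hypothetical helper)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly, in order.
        self._rotation = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)
```

Each incoming request calls `pick()`, so load is spread evenly as long as requests cost roughly the same and every server stays up; a real balancer would also need health checks.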
Quorum Server Illustrated - updated
In two earlier posts [1] [2], I gave brief descriptions of the quorum server which seem to have left as much confusion as they provided clarity. This post is only about the Linux-HA quorum server, and includes illustrations for clarity....
Alan eats his own cl_respawn dog food. Yum!!
In this posting, I show how to use cl_respawn[1] to monitor my system logging and help keep it running, and along the way, I improved cl_respawn a little as well. In addition, I explain why I couldn't just use the...
Availability, MTBF, MTTR and other bedtime tales
If we let A represent availability, then the simplest formula for availability is: A = Uptime/(Uptime + Downtime) Of course, it's more interesting when you start looking at the things that influence uptime and downtime. The most common measures that.
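The formula in this excerpt, A = Uptime/(Uptime + Downtime), can be worked through with a small sketch. The function names and the sample figures below are assumptions chosen for illustration; they do not come from the post.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(uptime_hours, downtime_hours):
    """A = Uptime / (Uptime + Downtime), as given in the excerpt."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def yearly_downtime_hours(a):
    """Hours of downtime per year implied by availability a (0 <= a <= 1)."""
    return (1.0 - a) * HOURS_PER_YEAR


# Illustrative numbers only: 999 hours up, 1 hour down.
a = availability(999.0, 1.0)
print(f"availability = {a:.4f}")
print(f"implied downtime per year = {yearly_downtime_hours(a):.2f} hours")
```

Turning the ratio into hours per year makes it easier to compare availability targets against how long outages actually last.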
More about quorum - updated
In a previous article[1], I talked about quorum, and alluded to some further details which I'll discuss here. Let's examine a couple of common quorum tie-breaker methods, and see what's useful, and what's...
Bad application design => Bad availability (more Rockies ticket debacle)
One final quick note on the Rockies ticket sales debacle - following up on my previous posting[1] on the subject. This note discusses how including the humans in your system design can improve both your perceived availability and your...
The cost of un-availability - and the value of a bad example
Today, the good people of Major League Baseball suffered what looked like a denial-of-service attack which kept them from selling tickets to (at least) the World Series games in Denver (at Coors Field). This "attack" started at the same...