Finely-Sliced Operations

11 Mar 2010

Since the previous post, I've been reading up more on the devops movement and found a few nice pieces, particularly Graham Bleach on internal borders.

Thinking about borders within operations departments, I'm reminded of a recent client who operated with a mind-blowing number of siloed operations teams.  Coming at it from the development side, working through a problem with the operations staff was like exploring a never-ending labyrinth of teams.  Every time I probed a little deeper, my contact would defer responsibility to another mysterious team, usually located somewhere distant, that owned a slightly lower level of the stack.  Once I tracked down a real person in the mysterious team, the same thing would happen again, and I would be left searching for another distant team.  It seemed like I would never actually get to the real hardware.  An unforeseen consequence of virtualization is that it allows someone to declare themselves in charge of a virtual machine, but in charge of neither the host or guest operating systems nor the hardware itself - staggeringly unhelpful.

The genius of these extremely-finely-sliced organisations is that they provide innumerable cracks down which responsibility, ownership and useful work can fall.  If a team has a sufficiently narrow focus, it is almost certain that no problem will ever occur that falls cleanly within its boundaries, so there is no responsibility at all - a manager's dream.

One thing that encourages me a little bit is a recent trend for counting the number of "engineers" in an organisation.  I think all technical staff are counted in this metric, though I'm not sure whether people are including managers and other IT staff who might not be hands-on.  In his QCon talk, Aditya Agarwal was very proud of Facebook's metric of 1.1 million users per engineer, while Andres Kütt was similarly proud of Skype's 600,000 users per employee.  Note that Facebook is very open about its number of engineers but secretive about its number of employees - I have no idea why.  I think counting the total number of technical staff is useful because it helps people look at the broader cost of running software - not just development or just operations.  Also, I hope that in the long term these kinds of metrics will highlight the painful inefficiency of silos and barriers.