There's always a SPOF

Steve Loughran: "If you read the thread, a lot of people are upset. What's going on? they demand; fix it! they say. I sympathise with their point of view, but I don't agree with it. AWS can't say what's going on, not until they know. They can't fix it until after that. I've been on the receiving end of these 'fix it now' crises, and having lots of people on the phone doesn't help you find the problem any faster. So well done to the AWS team to (a) fixing it fairly quickly and (b) having so many users that the outage got such publicity!"

Lister's law: "people under time pressure don't think faster". I agree with Steve. Within his point is a general anti-pattern: pushing to go faster and piling on more stress when you're already in the weeds. But that's SOP. The most important thing seems to be to design a system that can be fixed in place when bad things happen. That means a lot of things that aren't normally considered part of software "design": build management, release management, issue tracking, testing, configuration, deployment, rollbacks, logistics, cross-departmental processes, SLAs, findable documents and stacktraces. I really hope Steve writes a continuous deployment book.

1 Comment


    The 'people under time pressure don't think faster' link is broken. It's listed as 'http://www.dehora.net/journal/people%20under%20time%20pressure%20don'.

