Ready—Charge! How We Handle Outages at Jive
Every once in a while, an outage disrupts Jive’s service. It happens; we acknowledge that. No company is error-free, and we don’t pretend to be either. However, because we also want to provide quality service 24/7, we have procedures in place to make sure outages are short-lived, infrequent, and, all in all, unlikely to ruin your day.
When faced with an outage, Jive has two battlefronts: the dev side and the customer experience side. We’ll start with the dev side.
Dev begins on the defensive, keeping an eye on the processes that monitor our systems for rising error rates, malfunctions, etc. If suspicious activity comes in, the monitoring system sends an alert to all developers. Dev then notifies the heads of the other departments so they can be aware and ready to respond to customers. The goal is to catch problems before customers complain about them and to fix them ASAP.
After notifying the other departments, dev goes on the offensive. Outages take precedence over all other tasks, and dev attacks the problem until they’ve solved it and Jive’s service is back to normal. If an outage occurs on the weekend or after hours, there’s a team of developers on call to deal with it. And even though the on-call team rotates and specific people are assigned to it each week, everyone is encouraged to jump in and help in a crisis, and they do. Surprised that people would sacrifice their free time this way? You shouldn’t be. All the developers at Jive want to give customers quality service, and it shows in the effort they put forth.
On the other front, the customer experience team handles communication with customers during an outage. When an issue occurs, they go into battle mode. First of all, they determine the scope of the outage. Is it affecting just one user? A PBX? A phone? Once this comes to light, they begin the customer-facing messages. Using a server that’s separate from our network (so it hasn’t gone down along with our service), they send updates on the status of the outage to everyone subscribed to our status page. (The status page, status.getjive.com, functions kind of like Twitter, but just for Jive customers.)
Then customer experience convenes a war council with development in the “dev cave.” Customer Experience Director David Rowley meets with Phil Holmes, the developer who heads up fixing outages, and keeps tabs on the status and details of the problem to make sure the right information gets sent to customers. Meanwhile, the team leads for technical support and customer service work with their team members to gather all the tickets related to the outage in one place so they can respond to them with the latest updates. Paige Guthrie, Jive’s Multimedia Specialist, posts updates on Jive’s Facebook, Twitter, and Google+ accounts. Finally, we change our inbound phone message to tell customers that there’s an outage and that they can find updates about it on the status page of Jive’s website.
We at Jive do all we can to make inconveniences as convenient as possible. On keeping customers informed, Rowley says that Jive’s commitment is to put a message out to customers within five minutes of a problem occurring and to post status updates every ten minutes until the issue is resolved. Furthermore, within one business day, Jive posts the root cause of the problem, whether that was human error, a technical issue, or something else, along with a description of what we’re doing to fix the problem if we haven’t already fixed it. We do this because we want to be transparent with customers about both our strengths and our failings, and that includes not covering it up when we make mistakes. We also learn from our mistakes, using the data gathered from problems and from our monitoring systems to catch issues earlier and earlier, with the aim of eventually eradicating them altogether.
’Cause that’s what great companies do, yo.