You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicat



Post by answerhappygod

You are on call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests and that all of its dependent systems, which serve hundreds of thousands of users, are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next?

A. Look for ways to mitigate user impact and deploy the mitigations to production.
B. Contact the affected service owners and update them on the status of the incident.
C. Establish a communication channel where incident responders and leads can communicate with each other.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.