Identifying, Triaging & Resolving Critical Issues
Our process follows the four phases below to mitigate any type of issue with the application.
1. Alerting and Engagement
The alerting and engagement phase focuses on raising awareness of incidents within the application. Amazon CloudWatch monitoring and alerts are used to communicate any incident in the application, and this is where first responders get involved and begin triaging the incident. Customers can also escalate any issue by emailing our help desk team at email@example.com.
First responders attempt to determine the impact on customers. Customers are notified of the impact of the incident. The responders prioritize incidents by using the following impact rating:
1- Critical impact: typically a full application failure that impacts many to all customers.
2- High impact: partial application failure with impact to many customers.
3- Medium impact: the application is providing reduced service to customers.
4- Low impact: customers might not be impacted by the problem yet.
5- No impact: customers aren’t currently impacted, but urgent action is needed to avoid impact.
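As a minimal sketch, the impact rating above could be encoded so that incidents sort by urgency. The `Incident` record, its field names, and the sample incidents are hypothetical and only illustrate the idea; they are not part of the JTS tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Impact ratings from the list above; a lower number is more urgent."""
    CRITICAL = 1  # full application failure, many to all customers
    HIGH = 2      # partial application failure, many customers
    MEDIUM = 3    # reduced service to customers
    LOW = 4       # customers might not be impacted yet
    NONE = 5      # no current impact, urgent action needed to avoid it


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative only.
    title: str
    impact: Impact


def triage_order(incidents):
    """Return incidents sorted from most to least urgent (stable sort)."""
    return sorted(incidents, key=lambda incident: incident.impact)


# Illustrative incidents, not real data.
queue = [
    Incident("Slow report export", Impact.MEDIUM),
    Incident("Login page down", Impact.CRITICAL),
    Incident("Disk usage trending up", Impact.NONE),
]

for incident in triage_order(queue):
    print(f"{incident.impact.value} {incident.impact.name:<8} {incident.title}")
```

Using an integer enum keeps the numbering identical to the written scale, so a rating of 1 always sorts ahead of a rating of 5.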
3. Investigation and Mitigation
JTS uses the Failure Mode and Effects Analysis methodology to investigate and mitigate potential software issues. This process:
Prioritizes failures according to severity, frequency, and detectability. Severity describes the seriousness of failure consequences. Frequency describes how often failures can occur. Detectability refers to the degree of difficulty in detecting failures.
Documents current knowledge about failure risks.
Mitigates risk at all levels with resulting prioritized actions that prevent failures or at least reduce their severity and/or probability of occurrence.
Is employed from the earliest design and conceptual stages, through development and testing, and into process control during ongoing operations throughout the life of the product.
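One common way to combine severity, frequency, and detectability in Failure Mode and Effects Analysis is a Risk Priority Number (RPN), the product of the three ratings. The sketch below assumes that convention and 1 to 10 scales; the source does not state which scoring scheme JTS uses, and the `FailureMode` record and sample failure modes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    # Hypothetical record; the 1-10 scales are an assumption.
    name: str
    severity: int       # 1 = negligible consequences, 10 = most serious
    frequency: int      # 1 = rare, 10 = occurs almost constantly
    detectability: int  # 1 = easy to detect, 10 = very hard to detect

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for field_name in ("severity", "frequency", "detectability"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three ratings."""
        return self.severity * self.frequency * self.detectability


def prioritize(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Illustrative ratings, not real assessments.
modes = [
    FailureMode("Database connection pool exhausted", 8, 4, 3),
    FailureMode("Scheduled job silently skipped", 6, 3, 9),
    FailureMode("Typo in help text", 2, 5, 1),
]

for mode in prioritize(modes):
    print(f"{mode.rpn:>4}  {mode.name}")
```

Note how the hard-to-detect failure ranks first even though its severity is lower: detectability carries the same weight as the other two ratings under this scheme.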
4. Post-Incident Analysis
A Post-Incident Analysis (PIA) will be conducted to assess the chain of events that took place, the methods used to control the incident, and how the actions of JTS and outside agencies (e.g. vendors) contributed to the eventual outcome. A PIA meeting will be held and a report produced that, at a minimum, cover the following areas:
DATE AND TIME OF INCIDENT: The date and time the event occurred.
TYPE OF INCIDENT: Include a brief description of the type of event that required response activation.
SITUATION UPON ACTIVATION OF THE RESPONSE PROCESS: Include a brief description of the situation encountered by the first personnel assessing the failure. The type of personnel responding should be listed.
FINAL OUTCOME OF THE INCIDENT: List the extent of damage and impact on business operations.
STRATEGY: List the strategies chosen to respond and recover from the event. For each strategy, list what the strategy entailed as well as the results of implementing each strategy.
COMMON OBSTACLES: List those problems encountered by more than one user or Incident Commander that may indicate a need for a review of procedures, training or recovery plans.
RECOMMENDATIONS: List any recommendations for correction or reduction of these obstacles.
WHAT OPERATIONS WORKED WELL? WHY? Look at strategies and results to help reinforce procedures and tactics that were successful so they may be applied to similar situations in the future.
STAKEHOLDER COMMUNICATIONS: Review the internal and external communications process during the incident. Were communications adequate, proactive and regularly updated? Were the appropriate stakeholders communicated with during and after the event?
Note: Safety concerns and financial risk do not apply to AFAS.
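The minimum PIA areas listed above could be captured as a simple report template that flags sections still left blank. This is only a sketch: the class, its field names, and the sample values are hypothetical and do not describe an actual JTS report format.

```python
from dataclasses import dataclass, fields


@dataclass
class PostIncidentReport:
    # Hypothetical field names that mirror the PIA areas listed above.
    incident_datetime: str = ""
    incident_type: str = ""
    situation_upon_activation: str = ""
    final_outcome: str = ""
    strategy: str = ""
    common_obstacles: str = ""
    recommendations: str = ""
    what_worked_well: str = ""
    stakeholder_communications: str = ""

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative values only.
report = PostIncidentReport(
    incident_datetime="2024-01-01 09:30 UTC",
    incident_type="Partial application failure",
)
print(report.missing_sections())
```

A check like `missing_sections()` could be run before the PIA meeting so that every required area has at least a draft entry.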