
Root Cause Analysis Process
Updated July 2016

1. Root Cause Analysis (RCA) - Process Definition
The Root Cause Analysis (RCA) process establishes the actual cause, not symptoms, of a problem. It defines and tracks the actions required to eliminate or mitigate the actual cause of a problem. The process also maintains the documentation related to the analysis of the root cause.

2. Objective
The goal of an RCA process is to identify the actual cause of major problems and provide preventive measures to eliminate recurrence of major problems. It also identifies secondary problems that prolonged the duration of a problem.

3. Applicability
This process applies whenever a Root Cause Analysis is required. It may be required as the outcome of failed production implementations and service disruptions to the production environment.

4. Summary Description

4.1 Triggering Event(s)
• At the request of a customer or DTI Management
• Severity 1

4.1.1 Exceptions to Triggering Event(s)
• Documented known errors where there is no readily available solution.
• Reoccurring events where the solution is known but will not be implemented in the short term.

4.2 Primary Result(s)
• Identifying the root cause of a Service Outage
• Identifying the root cause of a trend

4.3 Process Prerequisites and Requirements
The process prerequisites can be any of the following:
• Change Records with a status of unsuccessful
• Incidents
• Emergency changes to the production environment
• Detailed past history of the issue

The process requirements are:
• The RCA Lead tracks any DTI action items as part of the problem ticket.
• RCAs are started immediately after the triggering event and are completed within 1 week.

4.4 Inputs
The process inputs can be any of the following:
• CAB Meeting minutes and Change Requests
• Incident tickets
• Any pertinent information related to the problem

4.5 Outputs
• RCA Report
• Customer RCA Summary Report
• Corrective Action Plan

4.6 Key Performance Measures
• Cost: Measure of the cost to complete each Root Cause Analysis process instance.
• Quality: Measure of the number of times the same problem recurs after a Root Cause Analysis.
• Timeliness: Measure of success in completing the Root Cause Analysis documentation within five days of assignment.
• Efficiency: Measure of hours and number of resources expended.
• Cycle time: Measure of time to complete the process, from the end of problem resolution to completion of the RCA report.

4.7 Process Flow
The Root Cause Analysis process may be triggered at the conclusion of any service disruption in production or failed implementation. Additionally, the RCA Process can be used on any issue where a detailed analysis is required.

[Figure: process flow. Triggering events: Completion of a Service Disruption; Completion of an unsuccessful change; Customer Request; Severity 1. Activities: 1.0 Identify Root Cause Analysis Requirements; 2.0 Create Corrective Action Plan; 3.0 Create Root Cause Analysis Report; 4.0 Conduct Audit of Root Cause Analysis Process. Result: Root Cause of Problem Identified.]

4.8 Process Hierarchy
The process hierarchy below breaks down the significant activities of the Root Cause Analysis (RCA) Process.

Root Cause Analysis Process

1.0 Identify Root Cause Analysis Requirements
    1.1 Determine Resource Requirements
    1.2 Schedule Meeting
    1.3 Gather and Analyze Data
    1.4 Complete Root Cause Analysis Checklist (see Appendix A)
    1.5 Develop Solution

2.0 Create Corrective Action Plan
    2.1 Identify Corrective Action Items
    2.2 Create Corrective Action Plan
    2.3 Assign Corrective Action Items
    2.4 Implement Corrective Action Items
    2.5 Verify Corrective Action Items
    2.6 Update Corrective Action Plan

3.0 Create Root Cause Analysis Report
    3.1 Create Root Cause Analysis Report
    3.2 Finalize Root Cause Analysis Report
    3.3 Review RCA Report with Customer

4.0 Conduct Audit of Root Cause Analysis Process
    4.1 Identify Process Metrics
    4.2 Apply Process Improvement

4.9 Roles and Responsibilities
The roles and responsibilities of the participants are outlined in the following matrix.

ROLES (matrix columns)
• Customer Representative
• Subject Matter Experts (SMEs)
• Service Owner
• Root Cause Analysis (RCA) Lead
• Root Cause Analysis Process Owner
• Root Cause Analysis (RCA) Team

Legend
A - Approves
C - Contributes
L - Leads
P - Performs
R - Reviews

RESPONSIBILITIES (matrix rows)
[Matrix: the role column for each code is not recoverable; codes are shown per activity where they are legible.]
1.0 Identify Root Cause Analysis Requirements
    1.1 Determine Resource Requirements
    1.2 Schedule Meeting
    1.3 Gather and Analyze Data: P, L
    1.4 Complete Root Cause Analysis Checklist: P, L
    1.5 Develop Solution: P, L
2.0 Create Corrective Action Plan: P, P, C, P
    2.1 Identify Corrective Action Items: P, C, C
    2.2 Create Corrective Action Plan: P, C, C
    2.3 Assign Corrective Action Items: P, C, C
    2.4 Implement Corrective Action Items: P
    2.5 Verify Corrective Action: P
    2.6 Update Corrective Action Plan
3.0 Create Root Cause Analysis Report
    3.1 Create Root Cause Analysis Report
    3.2 Finalize Root Cause Analysis Report
    3.3 Review Root Cause Analysis Report with Customer: C, A
4.0 Conduct Audit of Root Cause Analysis Process
    4.1 Identify Process Metrics: P
    4.2 Apply Process Improvement: P
Codes not attributable to a row: C, P, C, C, C, P, C, C, P, P

5. Root Cause Analysis - Detailed Descriptions
This section contains the detailed description for each process thread of the Root Cause Analysis Process. Each detailed description consists of a process map for the process thread and a brief description of each activity box in that map.

5.1 Process Map – Identify Root Cause Analysis Requirements
[Figure: process map. Swimlanes: Root Cause Analysis (RCA) Lead; Root Cause Analysis (RCA) Team; Root Cause Analysis (RCA) Process Owner; Subject Matter Experts (SMEs). Start: Initiation of an RCA. Activities: 1.2 Schedule Meeting; 1.3 Gather and Analyze Data; 1.4 Complete Root Cause Analysis Checklist; 1.5 Develop Solution.]

5.2 Process Activity Description – Identify Root Cause Analysis Requirements (Activity 1.0)
A brief description of each activity in the process thread ‘Identify Root Cause Analysis Requirements’ is provided below. The activity box numbers are in the parentheses of each heading. The RCA Lead is responsible for this section.

5.2.1 Determine Resource Requirements (Activity 1.1)
The RCA Lead is responsible for determining whether an RCA is to be performed by a specific individual or a team. Each specific individual or team uses their reports and any diagnostic tools required to complete a successful RCA.

Basic RCA guidelines for determining if a team is required are:
• Problem has a high level of complexity
• Problem crosses multiple disciplines or platforms
• Problem is part of a previous RCA

• Problem is part of an identified problem trend

The problem is initially assigned to an individual. If gathering and analyzing the data shows that the individual requires assistance, a team is assigned to complete the problem analysis and obtain proper resolution.

5.2.2 Schedule Meeting (Activity 1.2)
If it is determined that the RCA requires a team, the RCA Lead schedules an RCA meeting. The RCA Lead is responsible for all communications regarding the RCA.

The RCA Lead is to:
• Establish a time and place for conducting the RCA session
• Identify all participants for the session, which should include:
    • All personnel associated with the recovery and crisis management
    • Scribe
    • Subject Matter Experts (SMEs)
    • Representation from the Management that owns the area in which the problem occurred
• Prepare an agenda with objectives for the meeting
• Notify all participants of the day, time and location; also include the purpose and objectives of the meeting along with an agenda

5.2.3 Gather and Analyze Data (Activity 1.3)
The individual or team assigned to the RCA is responsible for gathering and analyzing the problem data. The data from the individual or team meeting (if one occurred) provides answers to the following questions:
• How the problem was detected
• What symptoms were associated with the problem
• Impact to Customer
• How the problem was triggered
• What the time-line/chronology of events was during the problem, including any of the steps taken during recovery and crisis management
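
The time-line/chronology asked for above can be kept as a small list of records, which also makes it easy to spot periods with no recorded activity. The following is only a minimal sketch: the `Event` structure, the `find_gaps` helper, the 15-minute threshold and the sample entries are all hypothetical and are not defined by this process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One chronology entry (hypothetical structure, not part of the process)."""
    when: datetime
    contact: str
    description: str


def find_gaps(events, threshold=timedelta(minutes=15)):
    """Return (earlier, later, gap) for consecutive events more than `threshold` apart.

    The 15-minute default is an assumption; the process does not set a value.
    """
    ordered = sorted(events, key=lambda e: e.when)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later.when - earlier.when
        if gap > threshold:
            gaps.append((earlier, later, gap))
    return gaps


# Made-up sample chronology, for illustration only.
chronology = [
    Event(datetime(2016, 7, 1, 9, 0), "Service Desk", "Problem detected"),
    Event(datetime(2016, 7, 1, 9, 10), "SME", "Recovery started"),
    Event(datetime(2016, 7, 1, 10, 5), "SME", "Service restored"),
]

for earlier, later, gap in find_gaps(chronology):
    print(f"Unaccounted time of {gap} between "
          f"'{earlier.description}' and '{later.description}'")
```

With the sample data, only the 55-minute interval between "Recovery started" and "Service restored" exceeds the assumed threshold, so that is the interval the RCA participants would be asked to account for.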

5.2.4 Complete Root Cause Analysis Checklist (Activity 1.4)
To ensure that the RCA Team gathers consistent data and separates the cause from the symptom, the team needs to review the Root Cause Analysis Checklist (see Appendix A).

5.2.5 Develop Solution (Activity 1.5)
RCA Team members brainstorm solutions following the analysis of the problem. They focus on idea generation and on determining corrective actions, including any action or alternative solution that would reduce the impact or shorten the duration of the problem.

Solutions focus on the following key elements:
• Eliminating the root cause of the problem
• Mitigating actions, events, and architecture or infrastructure related issues responsible for extending the outage

5.3 Process Map – Create Corrective Action Plan
[Figure: process map. Swimlanes: Subject Matter Experts (SMEs); Service Owner; Root Cause Analysis (RCA) Lead; Root Cause Analysis (RCA) Team; Root Cause Analysis (RCA) Process Owner. States: Corrective Action Plan Identified; Action Item Complete. Activities: 2.1 Identify Corrective Action Items; 2.2 Create Corrective Action Plan; 2.3 Assign Corrective Action Items; 2.4 Implement Corrective Action Items; 2.5 Verify Corrective Action Items; 2.6 Update Corrective Action Plan.]

5.4 Process Activity Descriptions – Create Corrective Action Plan (Activity 2.0)
A brief description of each activity in the process thread ‘Create Corrective Action Plan’ is provided below. The activity box numbers are in the parentheses of each heading.

5.4.1 Identify Corrective Action Items (Activity 2.1)
The RCA team members, with support from service owners, identify corrective actions with the intent of eliminating the Root Cause of the problem, as well as proactively preventing similar problems.

5.4.2 Create Corrective Action Plan (Activity 2.2)
This activity includes the completion of a Corrective Action Plan. The Corrective Action Plan molds the identified corrective action items into a cohesive plan, which includes the identification of any dependencies between action items.

5.4.3 Assign Corrective Action Items (Activity 2.3)
This activity includes the assignment of an individual or team, deliverable requirements, and recommended completion due dates.

5.4.4 Implement Corrective Action Items (Activity 2.4)
Action item owners are accountable for their assigned corrective actions. Any problems or issues that occur during the performance of a corrective action are noted and communicated, as appropriate, through standard escalation channels to the RCA Lead, to the RCA team, or through the RCA Report.

5.4.5 Verify Corrective Action Items (Activity 2.5)
Once the corrective action item is complete, the Service Owner of the action item verifies the resolution of the problem or problem subset.

5.4.6 Update Corrective Action Plan (Activity 2.6)
Once the success of the action item is verified, the service owner updates the action item database or contacts the RCA Lead to update the appropriate database. Also included in this update are details of the success or failure of the action in relation to the problem's original resolution. If the action items do not resolve the problem, the RCA process is reinitiated.
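
Activity 2.2 asks the plan to identify dependencies between action items. As a rough sketch of how that could be recorded and checked, the example below maps each action item to the items that must be completed first and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) to derive a workable implementation order. The action item names and the `implementation_order` helper are hypothetical; the process itself does not prescribe any tooling.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical action items: each key maps to the items that must be done first.
dependencies = {
    "Patch database server": set(),
    "Update failover runbook": {"Patch database server"},
    "Retrain on-call staff": {"Update failover runbook"},
    "Add disk-space alert": set(),
}


def implementation_order(deps):
    """Return the action items in an order that respects their dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of items that form the cycle.
        raise ValueError(
            f"Circular dependency between action items: {exc.args[1]}"
        ) from exc


order = implementation_order(dependencies)
print(order)
```

Independent items may appear in any relative position, so the printed order is one valid sequence rather than the only one. A circular dependency raises an error, which is a cue to revisit the plan before items are assigned in Activity 2.3.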

5.5 Process Map – Create Root Cause Analysis Report
[Figure: process map. Swimlanes: Root Cause Analysis (RCA) Lead; Root Cause Analysis (RCA) Team; Root Cause Analysis Process Owner; Customer Representative; Subject Matter Experts (SMEs); Service Owner. States: Data Gathered for Root Cause Analysis Report; Report is Distributed. Activities: 3.1 Create Root Cause Analysis Report; 3.2 Finalize Root Cause Analysis Report; 3.3 Customer Review.]

5.6 Process Activity Descriptions – Create Root Cause Analysis Report (Activity 3.0)
A brief description of each activity in the process thread ‘Create Root Cause Analysis Report’ is provided below. The activity box numbers are in the parentheses of each heading.

5.6.1 Create Root Cause Analysis Report (Activity 3.1)
The details of the RCA are compiled into a standard RCA format and distributed.

The following data is part of each RCA report:
• Problem Description: A short synopsis of the problem
• Outage Duration: Hours and minutes by outage date
• Impact: An estimate of the number of customers impacted
• Chronology of Events: Date, time and name of contact with a description of each event
• Root Cause Analysis: In this section, a full description is recorded once the underlying cause of the problem is determined. Action items to resolve the problem should follow Root Cause identification. The action items will appear in the RCA Action Items section of the RCA Report.
• Secondary Problems: In this section, any additional problems that arise during the recovery of the original failure are documented. These types of problems are identified in the RCA session and include escalation problems, process/procedure problems, component failures, documentation deficiencies, or training issues
• Recommendations: Describe recommendations that helped resolve the problem or could prevent its reoccurrence. Recommendations generate the action items required to achieve the stated recommendation
• RCA Action Items: In this section, list all identified action items of the RCA session. The items include the assigned manager's name, the assignee's name, and the target date for completion (or a to-be-determined date, if being handled by workflow management)

5.6.2 Finalize Root Cause Analysis Report (Activity 3.2)
The RCA is complete when the following items are complete:
• All action items associated with the problem have been completed and verified as successful
• The action item database has been updated
• The RCA report has been generated and distributed to the DTI manager(s) and Team Lead(s) of all personnel involved for review and sign off.
• The CES, Team Lead(s) or DTI manager(s) of the RCA Lead will determine if DTI Senior Management needs to be involved in communicating the results of the RCA to the customer.

5.6.3 Review Customer RCA Summary with Customer (Activity 3.3)
Upon completion of the RCA and after all the appropriate sign offs have occurred:
• The CES will draft the customer RCA summary
• The CES will send the customer RCA summary and the RCA report to DTI Senior Management and the Change Control Team for review.
• The CES will schedule and conduct a meeting with the customer to review the customer RCA summary unless it has been determined that DTI Senior Management needs to communicate the message.

In some instances there may be action items that the customer will need to take part in.

5.7 Process Map – Conduct Audit of Root Cause Analysis Process
[Figure: process map. Swimlanes: Root Cause Analysis (RCA) Process Owner; Service Owner; Root Cause Analysis (RCA) Lead; Root Cause Analysis (RCA) Team. States: Process Metrics Identified; Process is Continuously Improved. Activities: 4.1 Identify Process Metrics; 4.2 Apply Process Improvement; 4.3 Update SN Knowledge Base.]

5.8 Process Activity Descriptions – Conduct Audit of Root Cause Analysis Process (Activity 4.0)
A brief description of each activity in the process thread ‘Conduct Audit of Root Cause Analysis Process’ is provided below. The activity box numbers are in the parentheses of each heading.

5.8.1 Identify Process Metrics (Activity 4.1)
Metrics are used to determine the efficiency of the process.

Key metrics include:
• Number of RCA sessions held by month and by the category of problems
• Percentage of problems that qualify for an RCA session. To validate compliance, this number is compared to the actual number of RCAs that took place
• Number of related problems that have recurred after RCA actions were implemented

5.8.2 Apply Process Improvement (Activity 4.2)
The effectiveness of the process is measured by using metrics. Based on the results of the analysis, the feedback received from stakeholders of the process, and the needs of the organization, the process is modified and continuously improved.

5.8.3 Update the SN Knowledge Base (Activity 4.3)

Based on the root cause of the problem and the frequency of occurrence, the RCA team decides whether an SN Knowledge Article should be opened for this event. For events that have occurred several times, a knowledge article is opened by the RCA Lead and referenced in the problem ticket, detailing the steps that need to be taken to resolve the problem expeditiously.

6. Key Terms, Acronyms

6.1 Key Terms and Acronyms

TERM/ACRONYM                        DEFINITION
Scribe                              Records all assumptions and documents decisions made during the process.
Subject Matter Experts (SMEs)       Individuals called upon to share their specialized knowledge.
Corrective Action Plan              Identification of actions to be taken along with assignees and completion dates.
CAB                                 Change review board (meets twice weekly)
RCA                                 Root Cause Analysis
Senior Management                   Refers to the COO, CSO, CTO and CPC
Root Cause Analysis Process Owner   This role is performed by the SN Problem Management Process Owner
Root Cause Analysis Team            The group of individuals in the RCA meeting and contributing to the RCA report
Root Cause Analysis Lead            The individual conducting the RCA meeting
Service Owner                       The individual/group responsible for the technology
Subject Matter Expert               The technical staff member who supports the technology
Customer Representative             Agency personnel

7. Appendices

Appendix A
Root Cause Analysis Checklist

Develop an event chronology associated with the outage.

Add the activities and actions recorded by the Recovery/Availability manager.
Add the related activities that preceded the outage.
Add the related activities that were executed after service was restored.
Obtain consensus of the event chronology by RCA participants.
Analyze each of these activities to ensure:
• Identification of unaccounted time between events
• Identification of associated technical and process related problems
Evaluate any and all actions that were performed.
Identify actions that extended the outage duration.
Identify what actions / design could have possibly avoided the outage.
Identify what actions / design could have shortened the duration of the outage.
Identify what actions / design could have minimized the scope of the outage.
Determine if any of the identified problems occurred previously.
Ensure a service owner is assigned to an identified problem and the RCA information is added to the problem record.

Problem Recognition Questions
How was the problem recognized?
Could the problem have been recognized earlier?
How could the problem have been recognized earlier?
Could the problem have been avoided or minimized if detected earlier?
What action was taken after the problem was recognized?
What actions should have been taken?

Problem Notification Questions
Were there pre-defined problem notification procedures?
Were the procedures followed?
Was the correct person, department, or group notified?
Did the procedure work well?
Should the notification have been earlier?
Is there a need for improvement?

Problem Determination Questions
Was there a problem determination procedure?
Was the process followed?
Did the process work?

Could problem determination have begun earlier?
Were the needed problem determination tools available?
Were the tools used?
Did the tools work?
Do better tools exist?
Is there a need for improvement?
Was there a change involved in the outage?
Did the change cause the outage?
Did the change increase the duration or scope of the outage?
Were there procedural changes made?
Was there confusion because of the change?
Was the change documented?
Was the change scheduled?
Was everyone notified?
Was the change thoroughly tested?
Were associated procedural changes needed after installation?

Recovery Questions
Were there predetermined recovery procedures?
Were the procedures followed?
Were the procedures successful?
Did they take as long as expected?
Can any automation techniques be applied?

Situation Management Questions
Was a situation manager identified?
Was there a problem with the identification and role of the situation manager?
Were the proper escalation procedures followed?
Were the correct support staff notified?
Should the support staff's response have been faster?
Are the escalation criteria and response received in keeping with the committed service level objectives?
Is there a need for improved criteria?
Was a log kept of all actions and decisions?
How effective was the overall management of the situation?
How could this process be improved?

Problem Resolution Questions
Is the technical resolution valid?

Has the root cause of the underlying problem been resolved?
Has the problem been sufficiently documented in a Problem Record?
