Run-Time Root Cause Analysis in Adaptive Distributed Systems

Amit Raj, Stephen Barrett, and Siobhan Clarke

School of Computer Science and Statistics, Trinity College Dublin, Ireland
araj@scss.tcd.ie, Stephen.Barrett@scss.tcd.ie, Siobhan.Clarke@scss.tcd.ie

Abstract. In a distributed environment, several components collaborate to provide complex functionality. Adaptation is an emerging trend in distributed systems: an adaptive system reconfigures itself by adding, removing, or updating components in order to cope with faults. Components are generally interdependent, so a fault can propagate from one component to another. Existing root cause analysis techniques generally build a static fault dependency graph to identify the root fault. However, these dependencies change with each adaptation, which makes design-time fault dependencies invalid at run-time. This paper describes the problem of deriving causal relationships between faults in adaptive distributed systems, and then presents a statechart-based solution that statically identifies the sequence of method executions in order to derive the causal relationships of faults at run-time. The evaluation finds the approach highly scalable and time-efficient, so it can be used to reduce the Mean Time To Recover (MTTR) of a distributed system.

Keywords: Distributed systems, Root cause analysis, Fault causal relationship, Adaptive system, Component-based system

LNCS 8186, p. 292 ff.
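The claim that design-time fault dependencies become invalid after adaptation can be illustrated with a small sketch. This is not the statechart-based technique the paper proposes; it is a hypothetical graph walk, assuming a simplified model in which each component lists the components it calls and a fault in a callee can propagate to its caller. The names `root_faults` and `Deps`, and the components `A` to `D`, are invented for illustration. After an adaptation replaces component C with D, walking the stale design-time graph blames the wrong component, while the same walk over the run-time dependencies reaches the actual root.

```python
from typing import Dict, List, Set

# Hypothetical model (not from the paper): each component maps to the
# components it calls. A fault in a callee may propagate to its caller.
Deps = Dict[str, List[str]]


def root_faults(deps: Deps, faulty: Set[str], observed: str) -> Set[str]:
    """Follow faulty dependencies from the component where a fault was
    observed (assumed to be faulty itself). A faulty component none of whose
    dependencies is faulty is reported as a candidate root fault."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [observed]
    while stack:
        comp = stack.pop()
        if comp in seen:  # guard against cyclic dependencies
            continue
        seen.add(comp)
        faulty_deps = [d for d in deps.get(comp, []) if d in faulty]
        if not faulty_deps:
            roots.add(comp)  # nothing faulty below it: candidate root
        stack.extend(faulty_deps)
    return roots


# Design-time view: A calls B, B calls C.
design_time: Deps = {"A": ["B"], "B": ["C"], "C": []}
# After an adaptation replaces component C with D, B now calls D.
run_time: Deps = {"A": ["B"], "B": ["D"], "D": []}
faulty = {"A", "B", "D"}

print(root_faults(design_time, faulty, "A"))  # {'B'}: stale graph blames B
print(root_faults(run_time, faulty, "A"))     # {'D'}: updated graph finds D
```

In this simplified model a cycle made up entirely of faulty components yields no root, since every member has a faulty dependency.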