Fault Detection and Diagnosis in Distributed Systems: An Approach by Partially Stochastic Petri Nets.

Aghasaryan, Armen; Fabre, Eric; Benveniste, Albert; Boubour, Renée; Jard, Claude

In: Discrete Event Dynamic Systems, Volume 8, Issue 2, pages 203-231. Kluwer Academic Publishers, June 1998.

Abstract: We address the problem of alarm correlation in large distributed systems. The key idea is to make use of the concurrence of events in order to separate and simplify the state estimation in a faulty system. Petri nets and their causality semantics are used to model concurrency. Special partially stochastic Petri nets are developed that establish a form of equivalence between concurrence and independence. The diagnosis problem is defined as the computation of the most likely history of the net given a sequence of observed alarms. Solutions are provided in four contexts, with gradually increasing complexity in the structure of observations.

Keywords: Viterbi algorithm; capacity-one Petri net; causality semantics; distributed DEDS; error correlation; fault management; stochastic Petri net; telecommunication network.

