In: Proc. 27th IEEE FTCS - International Symposium on Fault-Tolerant Computing, Seattle, USA, pages 354-362. 1997.
Abstract: This paper addresses the consolidated identification of faults, distinguished as transient or permanent/intermittent. Discrimination of transient faults has long been performed in commercial systems, where threshold-based techniques have been used for this purpose for several years. The present work aims to contribute to the usefulness of the count-and-threshold scheme through the analysis of its behaviour and the exploration of its effects on the system. To this end, the scheme is mechanized as a device named a-count, endowed with a few controllable parameters. a-count tries to balance two conflicting requirements: to keep in the system those components that have experienced just transient faults, and to quickly remove those affected by permanent or intermittent faults. Analytical models are derived, allowing detailed study of a-count's behaviour; the actual evaluation, over a range of configurations, is performed with standard tools, in terms of the delay in spotting faulty components and the probability of improperly blaming correct ones.
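As an illustration only, a generic count-and-threshold mechanism of the kind the abstract refers to might be sketched as follows. The abstract does not give the update rule or the parameters of a-count, so everything here is an assumption: the class name, the `decay` and `threshold` parameters, and the rule itself (add 1 to a score on each error signal, multiply the score by `decay` otherwise, flag the component once the score exceeds `threshold`).

```python
from dataclasses import dataclass


@dataclass
class CountAndThreshold:
    """Hypothetical count-and-threshold filter; not taken from the paper."""

    decay: float      # assumed parameter: multiplier in [0, 1) applied after an error-free step
    threshold: float  # assumed parameter: score above which the component is flagged
    score: float = 0.0

    def update(self, error_signalled: bool) -> bool:
        """Record one judgment and return True if the component is now flagged."""
        if error_signalled:
            self.score += 1.0          # each error signal raises the score
        else:
            self.score *= self.decay   # error-free steps let old errors fade
        return self.score > self.threshold


def steps_until_flagged(signals, decay, threshold):
    """Return the 1-based step at which the component is flagged, or None."""
    counter = CountAndThreshold(decay=decay, threshold=threshold)
    for step, signal in enumerate(signals, start=1):
        if counter.update(signal):
            return step
    return None


# A permanent fault signals an error at every step and is flagged quickly.
permanent = steps_until_flagged([True] * 10, decay=0.5, threshold=2.0)

# Isolated transient faults decay away between occurrences and are never flagged.
transient = steps_until_flagged([True, False, False, False] * 5, decay=0.5, threshold=2.0)
```

Under these assumed parameters the two requirements named in the abstract pull in opposite directions: a lower threshold or a decay closer to 1 shortens the delay in spotting a faulty component but makes it more likely that a correct component hit by a few transient faults is blamed.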