In: Mavronicolas, M.; Tsigas, Ph.: Lecture Notes in Computer Science, Vol. 1320: Distributed Algorithms, Proc. of 11th International Workshop, WDAG'97, Saarbrücken, Germany, pages 126-140. Springer, September 1997.
Abstract: We study the problem of achieving reliable communication with quiescent algorithms (i.e., algorithms that eventually stop sending messages) in asynchronous systems with process crashes and lossy links. We first show that it is impossible to solve this problem without failure detectors. We then show how to solve it using a new failure detector, called heartbeat. In contrast to previous failure detectors that have been used to circumvent impossibility results, the heartbeat failure detector is implementable, and its implementation does not use timeouts. These results have wide applicability: they can be used to transform many existing algorithms that tolerate only process crashes into quiescent algorithms that tolerate both process crashes and message losses. This can be applied to consensus, atomic broadcast, k-set agreement, atomic commitment, etc. The heartbeat failure detector is novel: besides being implementable without timeouts, it does not output lists of suspects as typical failure detectors do. If we restrict failure detectors to output only lists of suspects, quiescent reliable communication requires ◇P, which is not implementable. Combined with the results of this paper, this shows that traditional failure detectors that output only lists of suspects have fundamental limitations.