
Failure semantics

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Tangurena (talk | contribs) at 18:40, 23 December 2008 (add reference section for references). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.



Failure semantics is a concept used in distributed computing[1][2] to describe and classify the errors that distributed systems can experience.

Example (service):

  • A crash error occurs when the service produces no response at all.
  • An omission error occurs when one or more responses fail to arrive. A crash is the special case of omission in which all responses fail.
  • A timing error occurs when one or more responses arrive outside the specified time interval. Timing errors can be early or late. An omission error is the special case of a timing error in which the response is infinitely late.
  • An arbitrary error is any error at all, for example a wrong value or a timing error.
  • A client that uses a server can be built to cope with different types of server errors.
  • If the client can handle a server crash, it is said to assume that the server has crash failure semantics.
  • If the client can handle a service omission, it is said to assume that the server has omission failure semantics.
  • The failure semantics of a server are thus the types of errors that the client expects it to exhibit.
  • If any other type of error occurs, the client cannot handle it and the result is a service failure.
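
The classification above can be illustrated with a small Python sketch. It is not taken from the cited references: the names Failure, Response, classify and client_copes are hypothetical, a missing response is modelled as None, and the ordering crash < omission < timing < arbitrary is an assumption derived from the "special case" relations in the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Failure(Enum):
    """Error classes from the list above, plus NONE for a correct run."""
    NONE = 0
    CRASH = 1
    OMISSION = 2
    TIMING = 3
    ARBITRARY = 4


@dataclass
class Response:
    value: int      # payload returned by the (hypothetical) server
    elapsed: float  # seconds between request and response


def classify(
    responses: Sequence[Optional[Response]],
    expected: Sequence[int],
    earliest: float,
    latest: float,
) -> Failure:
    """Return the most specific error class that describes the observed responses.

    A None entry stands for a response that never arrived.
    """
    if responses and all(r is None for r in responses):
        return Failure.CRASH          # every response is missing
    if any(r is None for r in responses):
        return Failure.OMISSION       # some, but not all, responses are missing
    if any(r.value != e for r, e in zip(responses, expected)):
        return Failure.ARBITRARY      # a wrong value fits no narrower class
    if any(not (earliest <= r.elapsed <= latest) for r in responses):
        return Failure.TIMING         # a response arrived too early or too late
    return Failure.NONE


def client_copes(assumed: Failure, observed: Failure) -> bool:
    """Assumption: a client built for one class also copes with every narrower class."""
    return observed.value <= assumed.value


if __name__ == "__main__":
    observed = classify([Response(1, 0.2), None], expected=[1, 2], earliest=0.0, latest=1.0)
    print(observed)                                # an omission: one response is missing
    print(client_copes(Failure.CRASH, observed))   # crash semantics do not cover it
```

In this sketch a client that assumes only crash failure semantics cannot cope with an observed omission, which corresponds to the service failure described in the last bullet.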

References

  1. ^ Flaviu Cristian, Understanding Fault-Tolerant Distributed Systems.
  2. ^ Arno Puder (2005). Distributed Systems Architecture. Morgan Kaufmann. ISBN 1558606483, pp. 14-16.