Fault

This section summarizes the operations of the Fault module and their argument types. Refer to the Distribution Tutorial for a full specification of the operations and examples of how to use them. Where the current release is incomplete with respect to the specification, this section marks it as a limitation; where it behaves differently, it marks it as a modification.
We summarize the argument types for the operations in the Fault module.
Entity: a reference to any Oz language entity that has distributed fault modes, namely any object, cell, lock, port, or logic variable.
Level: either site or 'thread'(T), where T is a thread reference or the atom this. [1]
FStates: a set of fault states, i.e., a list that can contain at most one of each of the elements tempFail, permFail, remoteProblem(tempSome), remoteProblem(permSome), remoteProblem(tempAll), and remoteProblem(permAll).
OP: a record that indicates which attempted operation caused the exception or handler invocation. The value of OP is one of:
bind(T), wait, isDet (for logic variables).
cellExchange(Old New), cellAssign(New), cellAccess(Old) (for cells).
'lock' (for locks).
send(Msg) (for ports).
objectExchange(Attr Old New), objectAssign(Attr New), objectAccess(Attr Old), objectFetch (for objects). A limitation of the current release is that an attempted operation on an object cannot be retried.
HandlerProc: a handler, i.e., a three-argument procedure that is called as {HandlerProc Entity FStates OP}, where FStates is the set of currently active fault states. A handler replaces an attempted operation on an entity.
WatcherProc: a watcher, i.e., a two-argument procedure that is called in its own thread as {WatcherProc Entity FStates}, where FStates is the set of currently active fault states. A watcher is invoked as soon as the site detects a fault.
When there is a distribution problem, three items of information are made available:
Entity: the faulty entity.
ActualFStates: the fault states that are currently active. This is always a subset of the states that the entity is set up to detect. For objects, cells, and locks, the fault states tempFail(info:I) and permFail(info:I) are possible, where I is in {state, owner}. This tells whether the fault is due to a lost state pointer (state) or a crashed owner (owner).
OP: the operation that is attempted but does not succeed.
The system can be configured (see below) so that these three items appear in one or more of the following three ways:
In an exception with format system(dp(entity:Entity conditions:FStates op:OP) ...).
As arguments to a handler call, {HandlerProc Entity FStates OP}.
As arguments to a watcher call, {WatcherProc Entity FStates}.
A limitation of the current release is that the Entity argument is undefined for an object operation. For handlers and watchers, this limitation can be bypassed by giving the handler and watcher procedures a reference to the object.
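As a minimal sketch of the exception route, the following procedure catches a distribution fault raised by a port send. GuardedSend is a hypothetical helper, not part of the Fault module, and the sketch assumes it is fed to the interactive environment with System available. The pattern follows the exception format given above and leaves Entity out, since it is undefined for object operations.

```oz
declare
proc {GuardedSend P Msg}
   try
      {Send P Msg}
   catch system(dp(conditions:FStates op:OP ...) ...) then
      % FStates: the active fault states; OP: the operation that failed,
      % here expected to be send(Msg)
      {System.show distributionFault(op:OP states:FStates)}
   end
end
```

The open-record patterns (the trailing ...) keep the match tolerant of additional fields in the exception.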
The Fault module contains the following operations. All operations return a boolean flag B that is true if the operation succeeds and false otherwise. All enable and install operations succeed if nothing was enabled or installed at that level. An entity with a successful enable or install at a given level is said to have fault detection at that level. All disable and deInstall operations succeed if nothing was disabled or deinstalled at that level. The system starts up as if {Fault.defaultEnable [tempFail permFail] _} was executed.
All the following operations that have an Entity argument do nothing if the entity does not have distributed fault modes. If a logic variable with fault detection is bound to a nonvariable entity, then the fault detection is transferred to the entity, provided the latter has no fault detection at that level.
{Fault.defaultEnable FStates ?B}
Sets the default fault detection to FStates on the current site. When an operation is attempted on an entity and there is no fault detection on the site or thread level for the entity, then the default fault detection is used. This always succeeds.
{Fault.defaultDisable ?B}
Sets the default fault detection to nil on the current site. This always succeeds.
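The two default operations can be sketched as follows, assuming the Fault and System modules are accessible in the interactive environment. The choice of [permFail] is only an illustration.

```oz
declare B1 B2 in
% Make permFail the only fault state detected by default on this site
{Fault.defaultEnable [permFail] B1}
% Set the default fault detection to nil again
{Fault.defaultDisable B2}
{System.show defaults(B1 B2)}
```

Since both operations are described as always succeeding, both flags should be true.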
{Fault.enable Entity Level FStates ?B}
Enables fault detection on a given entity at a given level for a given set of fault states. An exception is raised if a fault is detected when an operation is attempted on the entity.
{Fault.disable Entity Level ?B}
Disables fault detection on a given entity at a given level.
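A minimal sketch of enabling and disabling detection at the site level follows. The port P is created locally only to keep the fragment self-contained; in practice it would be a reference to a port owned by another site.

```oz
declare S P B1 B2 in
P = {NewPort S}   % stand-in for a port reference obtained from a remote site
{Fault.enable P site [tempFail permFail] B1}
try
   {Send P hello}
catch system(dp(conditions:FStates ...) ...) then
   % Raised only if one of the enabled fault states is detected
   {System.show sendFailed(FStates)}
end
{Fault.disable P site B2}
```

To restrict detection to the current thread, the level 'thread'(this) can be used in place of site.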
{Fault.install Entity Level FStates HandlerProc ?B}
Installs a handler for fault detection on a given entity at a given level for a given set of fault states. The handler {HandlerProc Entity AFStates OP} is called if a fault is detected when an operation is attempted on the entity, where AFStates is the set of currently active fault states. A modification of the current release with respect to the specification is that handlers installed on variables always retry the operation after they return.
{Fault.deInstall Entity Level ?B}
Deinstalls a handler for fault detection on a given entity at a given level.
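The handler route might look like the sketch below. LogHandler is a hypothetical name, and the locally created port again stands in for a remote one. Because a handler replaces the attempted operation, the message is simply dropped here; a real handler could forward it elsewhere instead.

```oz
declare S P B1 B2 LogHandler in
P = {NewPort S}   % stand-in for a port reference obtained from a remote site
proc {LogHandler _ FStates OP}
   % The first argument (the entity) is ignored in this sketch
   case OP
   of send(Msg) then {System.show dropped(msg:Msg states:FStates)}
   else {System.show faulted(op:OP states:FStates)}
   end
end
{Fault.install P site [permFail] LogHandler B1}
{Send P hello}   % LogHandler runs in place of the send if permFail is detected
{Fault.deInstall P site B2}
```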
{Fault.installWatcher Entity FStates WatcherProc ?B}
Installs a watcher for fault detection on a given entity for a given set of fault states. Any number of watchers can be installed on an entity. It is always possible to install a watcher, so this always succeeds. The watcher {WatcherProc Entity AFStates} is called in its own thread as soon as the site detects a fault, where AFStates is the set of currently active fault states.
{Fault.deInstallWatcher Entity WatcherProc ?B}
Deinstalls the given watcher on a given entity. This call succeeds if WatcherProc was installed on the entity. If there is more than one instance of WatcherProc installed on the entity, then exactly one is deinstalled.
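A watcher can be sketched in the same style. Watch is a hypothetical name, and the same procedure value is passed to both calls so that the deinstall can identify it.

```oz
declare S P B1 B2 Watch in
P = {NewPort S}   % stand-in for a port reference obtained from a remote site
proc {Watch _ FStates}
   % Runs in its own thread, independently of any attempted operation
   {System.show watcherSaw(FStates)}
end
{Fault.installWatcher P [permFail] Watch B1}
{Fault.deInstallWatcher P Watch B2}
```

Unlike a handler, a watcher takes no level argument and does not replace any operation.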
On a given entity at the global level, at most one enable can be done or one handler installed. For a given entity, the site level can have at most one fault detection per site. The 'thread'(T) level can have at most one fault detection per thread. To have another fault detection, it is necessary to do a disable or deInstall first.
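The one-detection-per-level rule can be probed with a fragment like the following. It is only a sketch: the cell is local, and the comments describe what the rules above imply rather than observed output.

```oz
declare C B1 B2 B3 B4 B5 in
C = {NewCell 0}   % stand-in for a cell shared with another site
{Fault.enable C site [permFail] B1}            % first detection at site level
{Fault.enable C site [tempFail] B2}            % second at the same level: should fail
{Fault.enable C 'thread'(this) [tempFail] B3}  % a different level is independent
{Fault.disable C site B4}                      % free the site level
{Fault.enable C site [tempFail] B5}            % now the site level can be reused
{System.show [B1 B2 B3 B4 B5]}
```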
The current release has the following limitations and modifications with respect to the failure model specification. A limitation is an operation that is specified but not possible in the current release. A modification is an operation that is specified but behaves differently in the current release.
Most of the limitations and modifications listed here will be removed in future releases.
The limitations are:
The fault state tempFail is indicated only after a long delay. In future releases, the delay will be very short and based on adaptive observation of actual network behavior.
If an exception is raised or a handler or watcher is invoked for an object, then the Entity argument is undefined. For handlers and watchers, this limitation can be bypassed by giving the handler and watcher procedures a reference to the object.
If an exception is raised or a handler is invoked for an object, then the attempted object operation cannot be retried.
The modifications are:
A handler installed on a variable will retry the operation (i.e., bind or wait) after it returns. That is, the handler is inserted before the operation instead of replacing the operation.
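The variable modification can be illustrated with a handler that only logs. RetryLogger is a hypothetical name, and X stands for a logic variable shared with another site; on a purely local variable the install does nothing.

```oz
declare X B RetryLogger in
proc {RetryLogger _ FStates OP}
   case OP
   of bind(T) then {System.show bindInterrupted(T FStates)}
   [] wait then {System.show waitInterrupted(FStates)}
   else skip
   end
   % In the current release the bind or wait is retried after this returns
end
{Fault.install X site [tempFail] RetryLogger B}
```

A handler written for the specified behavior, where it replaces the operation, would have to perform or abandon the bind itself. Under the current release it should not do so, because the operation is attempted again afterwards.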
[1] Because thread is already used as a keyword in the language, it has to be quoted to make it an atom.