
17.3.1 Node Failure
A node failure occurs when the node's process exits or when the machine that the process runs on dies (for example, because of
a power failure). The following steps are performed after a node failure:
• The cluster detects that the node has not been active (see footnote 1) and votes for new cluster membership.
• The new membership is decided and the new primary component is formed.
• Any FT activities being processed on that node are failed over to another node.
Figure 17.2 shows the cluster configuration after the failure of node 1. The primary component has a membership of nodes 2
and 3.
[Figure 17.2: Node one has failed. Machine A hosts Rhino node 1, Machine B hosts Rhino node 2, and Machine C hosts Rhino node 3. The primary component is Cluster [2,3].]
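The three steps performed after a node failure can be sketched as a small toy model. This is an illustration only and not Rhino's implementation: the class Cluster, its fields, and the round-robin placement of failed-over activities are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a primary component (hypothetical, not a Rhino API)."""
    members: set[int]
    # Maps an FT activity name to the node currently processing it.
    ft_activities: dict[str, int] = field(default_factory=dict)

    def handle_node_failure(self, failed: int) -> None:
        # Steps 1 and 2: the remaining nodes agree on a new membership
        # that excludes the failed node and form the new primary component.
        survivors = sorted(self.members - {failed})
        if not survivors:
            raise RuntimeError("no surviving nodes to form a primary component")
        self.members = set(survivors)
        # Step 3: fail over every FT activity that was on the failed node.
        # Round-robin placement is an assumption made for this example.
        orphaned = sorted(a for a, n in self.ft_activities.items() if n == failed)
        for i, activity in enumerate(orphaned):
            self.ft_activities[activity] = survivors[i % len(survivors)]


cluster = Cluster(
    members={1, 2, 3},
    ft_activities={"call-a": 1, "call-b": 2, "call-c": 1},
)
cluster.handle_node_failure(1)
print(sorted(cluster.members))   # [2, 3]
print(cluster.ft_activities)     # {'call-a': 2, 'call-b': 2, 'call-c': 3}
```

The resulting membership [2, 3] matches the configuration in Figure 17.2.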
17.3.2 Node Restart
A node that boots and joins a cluster is handled in the same way as a failed node that restarts. Figure 17.3 shows the two-stage
process for a restarting node. In the first stage, the node boots and forms a non-primary cluster component whose only member
is itself.
In the second stage, the node becomes a member of the primary component and synchronizes its working memory with the
primary component. The node can perform work only once it is a member of the primary component and has synchronized state
with the rest of the cluster.
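The restart rule can be illustrated with a minimal sketch: a node may perform work only when it is both a member of the primary component and synchronized. The class RestartingNode and its method names are hypothetical and are not part of Rhino.

```python
from dataclasses import dataclass


@dataclass
class RestartingNode:
    """Toy model of a restarting node (hypothetical names, not a Rhino API)."""
    node_id: int
    in_primary: bool = False     # after boot: alone in a non-primary component
    synchronized: bool = False   # state not yet copied from the primary component

    def join_primary(self) -> None:
        # The node is admitted to the primary component.
        self.in_primary = True

    def synchronize(self) -> None:
        # State can only be synchronized from inside the primary component.
        if not self.in_primary:
            raise RuntimeError("cannot synchronize outside the primary component")
        self.synchronized = True

    def can_perform_work(self) -> bool:
        # Both conditions from the text must hold.
        return self.in_primary and self.synchronized


node = RestartingNode(node_id=1)
print(node.can_perform_work())   # False: booted, non-primary component only
node.join_primary()
print(node.can_perform_work())   # False: member of primary, not yet synchronized
node.synchronize()
print(node.can_perform_work())   # True
```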
17.3.3 Network Failure
In a distributed system, the connection between computers can fail. Examples include the physical network cable being cut and
the network interface card failing. Figure 17.4 shows the two stages that occur in a network failure.
The first stage, shown in the top portion of the diagram, is the formation of two components: a non-primary component whose
only member is node 1, and the primary component, whose members are nodes 2 and 3.
Once a node has transitioned from the primary component to a non-primary component, it logs an error message, ensures that
outstanding transactions will not commit, and finally terminates.
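The reaction of a node that leaves the primary component can be sketched as follows. This is a toy model under stated assumptions: the class Node, the transaction state names, and the choice to represent "will not commit" by marking a transaction as aborted are all hypothetical and do not describe Rhino's internals.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("partition-sketch")


@dataclass
class Node:
    """Toy node (hypothetical names, not a Rhino API)."""
    node_id: int
    in_primary: bool = True
    running: bool = True
    # Transaction id -> "outstanding", "committed" or "aborted" (assumed states).
    transactions: dict[str, str] = field(default_factory=dict)

    def on_membership_change(self, now_in_primary: bool) -> None:
        left_primary = self.in_primary and not now_in_primary
        self.in_primary = now_in_primary
        if not left_primary:
            return
        # 1. Log an error message.
        log.error("node %d is no longer in the primary component", self.node_id)
        # 2. Ensure that outstanding transactions will not commit.
        for tx, state in list(self.transactions.items()):
            if state == "outstanding":
                self.transactions[tx] = "aborted"
        # 3. Terminate.
        self.running = False


node1 = Node(1, transactions={"tx-1": "committed", "tx-2": "outstanding"})
node1.on_membership_change(now_in_primary=False)
print(node1.running, node1.transactions)
```

A node that stays in the primary component, such as node 2 or node 3 in Figure 17.4, takes the early return and keeps running.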
17.4 Configuration Parameters
Rhino SLEE clustering software can be configured for various purposes. The default configuration should be suitable for the
majority of environments.
1 The cluster detects that the node has aborted using properties defined in the file $RHINO_NODE_HOME/config/savanna/settings-cluster.xml.
Open Cloud Rhino 1.4.3 Administration Manual v1.1 102