High Availability FailOver/iX Manual > Chapter 2 Product Description

How Failover Works

	E0803 Edition 2
	E1100 Edition 1

Once installed and configured, the HAFO utilities continually monitor SCSI reply messages for failed SCSI data path components. This monitoring adds minimal overhead to I/O subsystem operation.

HAFO event information and data structures are memory resident, so no disk file access is needed to perform a high availability failover. This is especially important if a SCSI path failure to LDEV 1 (the system disk) occurs, because failover can proceed even though that disk is temporarily unreachable.

SYSGEN has a new section for HAFO, called ha. To enter it, run SYSGEN, type io at the SYSGEN prompt, and then type ha at the io prompt. No failover action is taken until the ldevs are configured for HAFO in the ha section of SYSGEN. Device data path and alternate data path information is entered and saved in SYSGEN's HAFOCONF configuration file. HAFOCONF configurations are read and validated during each system boot.

Specific configuration information is provided in Chapter 4 "Configuration".

Figure 2-1 "Normal System Layout" illustrates a sample configuration. This figure can be compared against Figure 2-2 "Failover", which illustrates a failover of the same system.

Figure 2-1 Normal System Layout

Triggering a Failover

HAFO acts only on specific error types that indicate a data path failure. Any of the following three conditions triggers a failover:

Hung I/O
Failed high availability array controller (communicated by a SCSI reply status)
Failed host device adapter card

If any other type of error occurs (such as a data transmission error or a device error), the I/O subsystem handles the error and takes corrective action, and HAFO remains idle. HAFO also remains idle when errors are reported by devices that are not high availability devices.
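The trigger rules above amount to a simple filter. The following is a minimal Python sketch of that decision, not HAFO's actual implementation: the ErrorType names and the should_failover() function are hypothetical, and real MPE/iX status codes are different. It also reflects the earlier rule that no failover action is taken for an ldev that has not been configured in the ha section of SYSGEN.

```python
from enum import Enum, auto


class ErrorType(Enum):
    """Hypothetical error categories; real MPE/iX status codes differ."""
    HUNG_IO = auto()
    ARRAY_CONTROLLER_FAILED = auto()   # reported via a SCSI reply status
    HOST_ADAPTER_FAILED = auto()
    DATA_TRANSMISSION = auto()
    DEVICE_ERROR = auto()


# The three conditions that trigger a failover.
TRIGGERS = {
    ErrorType.HUNG_IO,
    ErrorType.ARRAY_CONTROLLER_FAILED,
    ErrorType.HOST_ADAPTER_FAILED,
}


def should_failover(error: ErrorType, is_ha_device: bool, ha_configured: bool) -> bool:
    """Return True if HAFO would act on this error, False if it stays idle.

    is_ha_device:  the error came from a high availability device.
    ha_configured: the ldev was configured for HAFO in SYSGEN's ha section.
    """
    if not (is_ha_device and ha_configured):
        return False          # HAFO stays idle; normal I/O error handling applies
    return error in TRIGGERS  # any other error type is left to the I/O subsystem


print(should_failover(ErrorType.HUNG_IO, True, True))               # True
print(should_failover(ErrorType.DEVICE_ERROR, True, True))          # False
print(should_failover(ErrorType.HOST_ADAPTER_FAILED, False, True))  # False
```

Keeping the trigger set explicit mirrors the manual's point that HAFO is deliberately narrow: everything outside those three conditions is still handled by the ordinary I/O subsystem recovery logic.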

Executing a Failover

When a trigger status is received, HAFO immediately begins the failover sequence. This sequence activates the alternate data path and reroutes I/O to it. Failover occurs on a per-ldev basis: each device manager (the part of the I/O subsystem that manages a specific ldev) learns of the data path failure during a subsequent I/O to its ldev. For example, if three ldevs on a fast-wide SCSI bus experience an array controller failure, the three associated device managers each perform an HAFO failover independently.
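The per-ldev behavior can be illustrated with a small model. This Python sketch is hypothetical: the DeviceManager class, its do_io() method, the path names "ctrlA" and "ctrlB", and the ldev numbers are illustrative only, and the alternate path is assumed to be healthy. It shows that a shared controller failure does not move every ldev at once; each device manager switches paths only when it next issues an I/O.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceManager:
    """Hypothetical model of the device manager for one ldev."""
    ldev: int
    primary_path: str
    alternate_path: str
    active_path: str = field(init=False)
    failed_over: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        self.active_path = self.primary_path

    def do_io(self, failed_paths: set) -> str:
        """Issue one I/O; a failed path is discovered only when I/O is attempted."""
        if self.active_path in failed_paths and not self.failed_over:
            # Trigger status seen: activate the alternate path and reroute I/O.
            # (This sketch assumes the alternate path itself is healthy.)
            self.active_path = self.alternate_path
            self.failed_over = True
        return self.active_path


# Three ldevs share array controller A, with controller B as the alternate.
managers = [DeviceManager(ldev, "ctrlA", "ctrlB") for ldev in (10, 11, 12)]
failed = {"ctrlA"}            # array controller A has failed
managers[0].do_io(failed)     # only ldev 10 has issued an I/O so far
print([m.active_path for m in managers])  # ['ctrlB', 'ctrlA', 'ctrlA']
for m in managers[1:]:
    m.do_io(failed)           # the other device managers fail over on their own I/O
print([m.active_path for m in managers])  # ['ctrlB', 'ctrlB', 'ctrlB']
```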

Figure 2-2 "Failover" illustrates a failover event in a sample system.

Figure 2-2 Failover

Applications and higher-level MPE/iX operating system components (for example, the file subsystem, database management, or the memory management subsystem) experience no abnormal event. All I/Os complete normally using the alternate data path and alternate array controller.

User Notification of Failover

The I/O and system logs record the failover event. In addition, the following error message is displayed on the console at failover and every five minutes thereafter. The repeating message can be turned off with a [CTRL-A] reply.


  HIGH AVAILABILITY FAILOVER IS STARTED FOR LDEV# IN DISK ARRAY.
  NO DATA LOSS OR CORRUPTION.
  SYSTEM OPERATION WILL CONTINUE.
  PLEASE PLACE SERVICE CALL SOON.

  ACKNOWLEDGE HAFO FAILOVER IN DISK ARRAY FOR LDEV# (Y/N)?
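The reminder cadence described above can be modeled as a simple schedule. This Python sketch is only an illustration: reminder_times() and its parameters are hypothetical, times are in whole minutes, and "acknowledged" stands for the operator's [CTRL-A] reply. The message is shown once at failover and then every five minutes until it is acknowledged.

```python
from typing import Optional


def reminder_times(failover_min: int, horizon_min: int,
                   ack_min: Optional[int] = None, interval_min: int = 5) -> list:
    """Return the minutes at which the HAFO console message would be shown.

    The message is shown once at failover, then every interval_min minutes
    until the operator acknowledges it (ack_min) or horizon_min is reached.
    """
    times = [failover_min]                  # always shown at failover
    t = failover_min + interval_min
    while t <= horizon_min and (ack_min is None or t < ack_min):
        times.append(t)
        t += interval_min
    return times


print(reminder_times(0, 30))              # [0, 5, 10, 15, 20, 25, 30]
print(reminder_times(0, 30, ack_min=12))  # [0, 5, 10]
```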

Once a Failover Has Occurred

After an HAFO event, system I/O activity resumes via the alternate data path. There are no limits to the kinds of normal I/O that can be processed on the alternate data path. Throughput may be reduced, however, because I/O on the alternate path shares the bus with I/O from the other ldevs configured on that bus.

For additional information, see Chapter 6 "Recovering From a Failover".
