High Availability FailOver/iX Manual > Chapter 6 Recovering From a Failover

Special Considerations for Failed Paths


E0803 Edition 2
E1100 Edition 1 ♥

HAFO is not active during system boot. When the system boots, it always mounts all disk volumes using ONLY the primary path. This means that if a primary path is broken, the ldevs on that path WILL NOT MOUNT, and will not be available for access after the system is up.

Rebooting With a Failed Primary Path


In order to mount these ldevs when their primary path is broken, the user must use the io> section of SYSGEN (NOT the ha> section) to specify the alternate path as the actual path for these ldevs. The system must then be rebooted.

Consider the following sample output from the HASTAT command:

  :HASTAT
  
    High Availability Failover Device Status
    
   LDEV PRIMARY   ALTERNATE PRIMARY        ALTERNATE
        PATH      PATH      STATUS         STATUS
  ----- --------- --------- -------------  --------------
     80 8.0.0     16        Ready          Ready
     90 8.0.1     16        Ready          Ready
    100 8.0.2     16        Ready          Ready
    110 8.0.3     16        Ready          Ready
    120 8.0.4     16        Ready          Ready
    210 32.0.3    40        Ready          Ready
    220 40.0.4    32        Ready          Ready
    520 40.0.5    32        Ready          Ready

If there were an array failure on the path for ldev 210, the output of the HASTAT command would be as follows:

  :HASTAT
  
    High Availability Failover Device Status

   LDEV PRIMARY   ALTERNATE PRIMARY        ALTERNATE
        PATH      PATH      STATUS         STATUS
  ----- --------- --------- -------------  --------------
     80 8.0.0     16        Ready          Ready
     90 8.0.1     16        Ready          Ready
    100 8.0.2     16        Ready          Ready
    110 8.0.3     16        Ready          Ready
    120 8.0.4     16        Ready          Ready
    210 32.0.3    40        Array Failure  Ready
    220 40.0.4    32        Ready          Ready
    520 40.0.5    32        Ready          Ready

This indicates that ldev 210 has failed over to path 40. Because path 32 has suffered a hardware failure, ldev 210 will NOT mount if the system is rebooted before the hardware is repaired. If the system must be rebooted before then, ldev 210 must be reconfigured onto path 40 in the io> section of SYSGEN. For example:

  io> mdev 210 path=40.0.6
  io> hold
  io> e
  sysgen> keep

Now when the system is rebooted, ldev 210 will mount on path 40 and will be available for access.


NOTE: Near the end of the boot process, there will be an error message stating:

  Invalid Primary Path for this LDEV. - (HAFOERR 500)


This is because ldev 210 is still configured in the HAFOCONF file with 32 as its primary path. In this special case the error can be ignored. However, ldev 210 is no longer covered by HAFO, and path 32 should be repaired as soon as possible. (Note that ldevs 220 and 520 are also not covered by HAFO, since their alternate path, 32, is physically broken.)
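The reasoning above (which ldevs will not mount after a reboot, and which ones lose HAFO coverage) can be sketched as a small script that reads captured HASTAT output. This is only an illustration, not part of HAFO: the function names are hypothetical, the column layout is assumed from the sample listings in this section, and any primary status other than "Ready" is assumed to mean that the whole adapter path has failed.

```python
import re

# One data row: LDEV, primary path, alternate path, two status columns.
# A status may contain a single space ("Array Failure"), so the two status
# columns are split on a run of two or more spaces (assumed layout).
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s{2,}(.+?)\s*$")

SAMPLE = """\
   LDEV PRIMARY   ALTERNATE PRIMARY        ALTERNATE
        PATH      PATH      STATUS         STATUS
  ----- --------- --------- -------------  --------------
     80 8.0.0     16        Ready          Ready
    210 32.0.3    40        Array Failure  Ready
    220 40.0.4    32        Ready          Ready
    520 40.0.5    32        Ready          Ready
"""


def parse_hastat(text):
    """Return one dict per device row; header and ruler lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ldev, primary, alternate, p_status, a_status = match.groups()
            rows.append({
                "ldev": int(ldev),
                "primary": primary,
                "alternate": alternate,
                "primary_status": p_status,
                "alternate_status": a_status,
            })
    return rows


def adapter(path):
    """'32.0.3' -> '32'; an alternate path listed as '40' stays '40'."""
    return path.split(".")[0]


def reboot_risks(rows):
    """Return (ldevs that will not mount on reboot, ldevs without HAFO cover)."""
    broken = [r for r in rows if r["primary_status"] != "Ready"]
    failed_adapters = {adapter(r["primary"]) for r in broken}
    wont_mount = [r["ldev"] for r in broken]
    # An ldev whose alternate path is on a failed adapter has nowhere to fail over.
    unprotected = [r["ldev"] for r in rows
                   if adapter(r["alternate"]) in failed_adapters]
    return wont_mount, unprotected


if __name__ == "__main__":
    print(reboot_risks(parse_hastat(SAMPLE)))  # ([210], [220, 520])
```

For the abbreviated listing in SAMPLE, the script reports ldev 210 as needing an io> change before a reboot and ldevs 220 and 520 as unprotected, which matches the discussion above.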

Rebooting With a Failed Primary Path for ldev 1


Ldev 1 can be configured for HAFO just as any other ldev in the XP256. It is, however, a very special situation when the system needs to be rebooted while the primary path for ldev 1 is broken.

The user will need to make adjustments at the ISL prompt (ISL>) before booting the system. If the primary path for ldev 1 is broken, the system primary path must be changed to the alternate path for ldev 1.

The following example illustrates how to handle ldev 1.

Suppose that HASTAT shows the following on your system:

  :HASTAT

    High Availability Failover Device Status

   LDEV PRIMARY   ALTERNATE PRIMARY        ALTERNATE
        PATH      PATH      STATUS         STATUS
  ----- --------- --------- -------------  --------------
      1 8.0.0     15        Ready          Ready
     90 15.0.1     8        Ready          Ready
    100 15.0.2     8        Ready          Ready
    110 15.0.3     8        Ready          Ready
    120 15.0.4     8        Ready          Ready

Now suppose that there is a hardware failure on path 8. The system would continue to function, and HASTAT would show something like:

  :HASTAT

    High Availability Failover Device Status

   LDEV PRIMARY   ALTERNATE PRIMARY        ALTERNATE
        PATH      PATH      STATUS         STATUS
  ----- --------- --------- -------------  --------------
      1 8.0.0     15        Array Failure  Ready
     90 15.0.1     8        Ready          Ready
    100 15.0.2     8        Ready          Ready
    110 15.0.3     8        Ready          Ready
    120 15.0.4     8        Ready          Ready

If for some reason the system must be rebooted before path 8 can be repaired, special action must be taken before entering the start command.

Each MPE system has what is known as the system primary path, which has nothing to do with the primary path concept in HAFO. Ldev 1 is always on the system primary path. Since, in this example, the system primary path is broken, ldev 1 will not be found and the system will not boot.

In order to remedy this, you must change the system primary path from 8 to 15. Please refer to the System Startup, Configuration, and Shutdown Reference Manual for information on changing the system primary path.

After the system primary path is changed to 15.0.0, the system can find ldev 1 and boot.


NOTE: In the previous example, there was only one ldev on the broken system primary path. If there had been other ldevs on path 8, then in order to ensure that all the volumes mounted, the actions described in "Rebooting With a Failed Primary Path" would also have to be taken.

It is recommended that the user create and maintain an alternate configuration group in which the alternate paths for the system volume ldevs are configured as their primary paths. This is done in the io> section of SYSGEN and NOT in the ha> section. If the system volume set's path goes bad and you need to reboot, you can use this alternate configuration group to ensure that the system volume set mounts properly. (This assumes that all the system volumes are on the same path. If they are not, you may want to create multiple alternate configuration groups, one for each path that contains system volumes.)
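As a planning aid, the "one alternate configuration group per path" rule can be sketched as follows. The function name and the device list are hypothetical (ldevs 2 and 3 and all of the full alternate paths except 15.0.0 are invented for illustration), and the script only produces a worksheet; the groups themselves still have to be built in the io> section of SYSGEN.

```python
from collections import defaultdict


def plan_alternate_groups(system_volumes):
    """Group system-volume ldevs by the adapter of their primary path.

    system_volumes: iterable of (ldev, primary_path, alternate_path).
    Returns {adapter: {ldev: alternate_path}}; each key corresponds to one
    alternate configuration group to prepare for a failure of that adapter.
    """
    plan = defaultdict(dict)
    for ldev, primary_path, alternate_path in system_volumes:
        adapter = primary_path.split(".")[0]  # "8.0.0" -> "8"
        plan[adapter][ldev] = alternate_path
    return dict(plan)


if __name__ == "__main__":
    # Hypothetical system volume set spread over adapters 8 and 15.
    volumes = [
        (1, "8.0.0", "15.0.0"),
        (2, "8.0.1", "15.0.1"),
        (3, "15.0.2", "8.0.2"),
    ]
    for adapter, remap in sorted(plan_alternate_groups(volumes).items()):
        print("group for a failure of path", adapter, "->", remap)
```

With the hypothetical volumes shown, two groups are needed: one that moves ldevs 1 and 2 onto path 15, and one that moves ldev 3 onto path 8.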

Performance Considerations


It is recommended that the user understand the performance characteristics of the current system before making any non-HAFO configuration changes (i.e., changes in the io> section of SYSGEN) to accommodate HAFO. These changes may need to be carefully planned in order to maintain system performance.

Also be aware that a HAFO event can greatly increase the I/O load on a given path and can cause significant performance degradation. Actual performance depends on the capabilities of the host device adapter card and on the load. We normally recommend no more than eight ldevs on a host device adapter card for "best" performance. If the load on the host device adapter card is relatively low, the difference between eight ldevs and fourteen ldevs will not affect performance. If the load on a host device adapter is "high", performance degradation could occur when other ldevs fail over to that adapter.
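To see how a failover shifts ldevs between adapters, the count per adapter can be estimated from the path and status columns of HASTAT. The sketch below is an assumption-laden illustration: the names are hypothetical, the limit of eight is simply the guideline quoted above, and an ldev is assumed to be running on its primary path whenever the primary status is "Ready" (which may not hold after a repair, until the ldev is rerouted to its primary path).

```python
from collections import Counter

RECOMMENDED_MAX_LDEVS = 8  # guideline quoted in the text, not a hard limit


def active_adapter(primary_path, alternate_path, primary_status):
    """Return the adapter assumed to be carrying the ldev's I/O right now."""
    path = primary_path if primary_status == "Ready" else alternate_path
    return path.split(".")[0]  # "8.0.0" -> "8"


def ldevs_per_adapter(devices):
    """devices: iterable of (ldev, primary_path, alternate_path, primary_status)."""
    return Counter(active_adapter(p, a, s) for _ldev, p, a, s in devices)


def overloaded(devices, limit=RECOMMENDED_MAX_LDEVS):
    """Adapters carrying more ldevs than the recommended maximum."""
    counts = ldevs_per_adapter(devices)
    return sorted(adapter for adapter, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Hypothetical layout: five ldevs on adapter 8, five on adapter 16,
    # each using the other adapter as its alternate path.
    normal = [(10 + i, "8.0.%d" % i, "16", "Ready") for i in range(5)]
    normal += [(20 + i, "16.0.%d" % i, "8", "Ready") for i in range(5)]
    print(overloaded(normal))  # []

    # Adapter 8 fails: its five ldevs move to adapter 16 (ten in total).
    failed = [(l, p, a, "Array Failure" if p.startswith("8.") else s)
              for l, p, a, s in normal]
    print(overloaded(failed))  # ['16']
```

A count above eight does not by itself mean that performance will suffer; as noted above, the effect depends on how heavily the adapter is loaded.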



