olrad(1M)

HP-UX 11i Version 2: December 2007 Update

NAME

olrad - command for online addition/replacement of PCI I/O cards

SYNOPSIS

Adding Card Commands

/usr/bin/olrad [-f] -a slot_id

/usr/bin/olrad -A slot_id

Replacing Card Commands

/usr/bin/olrad [-f] -r slot_id

/usr/bin/olrad -R slot_id

Other Commands

/usr/bin/olrad -n|-q

/usr/bin/olrad [-F] -q

/usr/bin/olrad [-F] -h|-c slot_id

/usr/bin/olrad [-F] -v interface_hw_path

/usr/bin/olrad -g device_hw_path|slot_hw_path

/usr/bin/olrad -I|-P|-p flag slot_id

/usr/bin/olrad -C|-e slot_id

DESCRIPTION

The olrad command provides the ability to perform on-line addition and replacement of I/O cards.

olrad performs critical resource analysis of the system before performing any OLA/R operation. This is to ensure that the system is not left in an inconsistent state after a PCI card is added/replaced.

Only users with root privileges may use this command.

On systems with the capability to handle certain PCI hardware errors during the operation of PCI I/O cards, the olrad command provides the option to attempt recovery from such errors. The availability of this feature is dependent on the platform and operating system environment.

Arguments

The following arguments are used in the olrad command.

slot_id

Slot ID of an OLA/R capable slot. A slot ID is a list of one or more numbers separated by dashes. Each number represents a component of the physical location of the slot, so the slot ID can be used to locate the slot. The sequence of numbers in the slot ID is platform dependent. On N and L class systems, the slot ID contains only the slot number. On all other platforms, including Superdome, the format of the slot ID is:

Cabinet#-Bay#-Chassis#-Slot#

slot_hw_path

Hardware path of an OLA/R capable slot.

interface_hw_path

Hardware path of an interface under an OLA/R capable slot.

device_hw_path

Any hardware path under an OLA/R capable slot.
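As a rough illustration of the slot_id layout described above, the following shell sketch splits a slot ID into its components. The function name describe_slot_id is hypothetical and is not part of olrad. The sketch accepts only the one-component and four-component layouts named in this section and rejects anything else, which may be too strict on platforms that use a different sequence.

```shell
#!/bin/sh
# Hypothetical helper (not part of olrad): label the components of a slot ID.
# Assumes either a bare slot number or Cabinet#-Bay#-Chassis#-Slot#.
describe_slot_id() {
    slot_id=$1
    # Reject empty input, non-digit characters, and stray or doubled dashes.
    case $slot_id in
        ''|*[!0-9-]*|-*|*-|*--*)
            echo "invalid slot ID: $slot_id" >&2
            return 1 ;;
    esac
    # Split on dashes into the positional parameters.
    old_ifs=$IFS
    IFS=-
    set -- $slot_id
    IFS=$old_ifs
    case $# in
        1) echo "slot=$1" ;;
        4) echo "cabinet=$1 bay=$2 chassis=$3 slot=$4" ;;
        *) echo "unexpected number of components in: $slot_id" >&2
           return 1 ;;
    esac
}
```

For example, describe_slot_id 0-1-3-2 prints the four labelled components on one line.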

Options

The following options are supported.

-a slot_id

Prepare to add a card to the system at the specified slot. Critical resource analysis (CRA) is run to ensure that adding the card will not disrupt the functioning of the system. The driver scripts (pref_replace and prep_replace) for any affected slots are run, and the drivers associated with the affected slots are suspended. The slot power is turned OFF, and the attention LED at the corresponding slot is set to BLINK mode.

If the -f option is specified, it overrides critical resource analysis (CRA) results. See the description of the -f option.

-A slot_id

Post add phase. The slot power is turned ON and the drivers associated with all affected slots are resumed. ioscan is run and, if the card is claimed, the driver scripts (post_add for the current slot and post_replace for any affected slots) are run and the attention LED at the corresponding slot is turned OFF.

-r slot_id

Prepare to replace a card on the system at the specified slot. Critical resource analysis (CRA) is run to ensure that replacing the card will not disrupt the functioning of the system. The driver scripts (pref_replace and prep_replace) for the current slot and any affected slots are run. The drivers associated with the current slot and affected slots are suspended. The target slot is powered OFF and the attention LED at the corresponding slot is set to BLINK.

If the -f option is specified, it overrides critical resource analysis (CRA) results. See the description of the -f option.

-R slot_id

Post Replace phase. The target slot power is turned ON. The suspended drivers are resumed and the driver scripts (post_replace) for the current slot and the affected slots (if any) are run. The attention LED at the corresponding slot is set to OFF.

On systems with the capability to handle certain PCI hardware errors during the operation of PCI I/O cards, the post replace phase can be used to attempt recovery of the PCI card and corresponding I/O slot from such errors.

-f

The -f option, if specified, overrides the "data critical" errors returned by CRA. It is important to note that olrad will not allow "critical" errors to be overridden and that olrad automatically overrides "warnings".

Irrespective of whether -f is specified or not, critical resource analysis routines are run before an OLA/R operation, to ensure that the current OLA/R operation does not interrupt the normal working of the system; in other words, to identify "critical" errors.

The "data critical" errors are typically not critical to the system, but they may be critical to the user. Hence, the user needs to decide whether or not to use the -f option to override these types of errors.
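The override rules for -f can be summarized in a small decision function. This is only a sketch of the rules as stated above, not olrad's implementation. The function name may_proceed is hypothetical, the severity names are taken from the -C description and the RETURN VALUE section of this page (both CRA_WARNINGS and CRA_WARNING spellings are accepted because both appear), and treating "critical" as CRA_SYS_CRITICAL and refusing on CRA_ERROR are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: may_proceed SEVERITY FORCE
#   SEVERITY  a CRA severity name as listed on this page
#   FORCE     "yes" if -f was given, anything else otherwise
# Returns 0 if the rules described for -f would let the operation continue.
may_proceed() {
    case $1 in
        CRA_SUCCESS)
            return 0 ;;
        CRA_WARNING|CRA_WARNINGS)
            return 0 ;;              # warnings are overridden automatically
        CRA_DATA_CRITICAL)
            [ "$2" = yes ] ;;        # overridden only when -f is given
        *)
            return 1 ;;              # assumption: CRA_SYS_CRITICAL, CRA_ERROR,
                                     # and unknown values are never overridden
    esac
}
```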

-F

Displays the output in machine readable format. It can be used with the following options: -q, -c, -h, and -v.

-n

Display the number of OLA/R capable slots in the system.

-q

Displays the status of all OLA/R capable slots in the system. In the output, slots with the same bus number share the same PCI Bus. Output fields are detailed below; some descriptions are platform dependent.

On systems with OLA/R capable PCI-Express slots, the output fields differ slightly. See the PCI Express Based Slots section for a detailed description of the fields displayed for such slots.

Slot displays the slot_id.

Path displays the slot_hw_path.

Bus Number identifies the I/O Bus corresponding to the slot.

Max Spd displays the maximum operating speed of the PCI Bus attached to the slot.

Spd displays the current operating speed of the PCI Bus attached to the slot. The card inserted into the slot determines the current operating speed, together with the capability of the slot's PCI Bus.

Pwr displays the slot power status.

Occu displays whether the slot is occupied or not.

Susp displays whether the card in the slot is suspended or not.

Driver(s) Capable displays the OL* capability of the interface driver(s) that claimed the PCI device(s) present in the slot. The OLAR field displays whether the interface driver(s) are capable of OnLine Add/Replace operations. The OLD field displays whether the interface driver(s) are capable of OnLine Deletion operations.

Max mode displays the maximum operating mode of the PCI Bus attached to the slot.

Mode displays the current operating mode of the PCI Bus attached to the slot. The card inserted into the slot determines the current operating mode, together with the capability of the slot's PCI Bus. PCI and PCI-X are examples of different operating modes.

PCI Express Based Slots:

On systems with OLA/R capable PCI-Express slots, the output fields differ slightly. The fields displayed for such slots are described below:

  • Max Link Spd (expressed in gigabits per second) indicates the maximum link speed possible for the PCI-Express Link at the slot.

  • Link Spd (expressed in gigabits per second) indicates the negotiated link speed of the PCI-Express Link at the slot.

  • Max Link Width indicates the maximum link width supported by the PCI-Express link at the slot.

    For example: x8 means the maximum link width supported by a PCI-Express link at the slot is 8 lanes.

  • Link Width indicates the negotiated width of the PCI-Express Link at the slot.

  • Mode indicates the current operating mode of the slot. For PCI-Express slots, the mode is displayed as "PCIe".

-I flag slot_id

Controls the state of the Attention LED. The valid values for this flag option are: ATTN and OFF. Based on the flag value, the Attention LED at the corresponding slot is set to the appropriate state. The flags are not case-sensitive.

-P flag slot_id

Controls the state of the power indicator. Currently, the only valid value for this flag option is RAIL, which sets the power indicator to follow the specified slot's power state: the indicator is solid ON if the slot power is ON, and OFF if the slot power is OFF. The flag is not case-sensitive.

-p flag slot_id

Controls the slot power. The valid values for this flag option are ON and OFF. The slot power status changes according to the flag value. The flags are not case-sensitive.
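A wrapper script may want to validate a flag value before passing it to -I, -P, or -p. The sketch below checks the option and flag pairs listed above, ignoring case as olrad is described as doing. The function name normalize_flag is hypothetical and not part of olrad.

```shell
#!/bin/sh
# Hypothetical helper: normalize_flag OPTION FLAG
# Prints FLAG in upper case if it is a valid value for OPTION
# (-I: ATTN or OFF, -P: RAIL, -p: ON or OFF); otherwise returns 1.
normalize_flag() {
    flag=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case "$1:$flag" in
        -I:ATTN|-I:OFF|-P:RAIL|-p:ON|-p:OFF)
            printf '%s\n' "$flag" ;;
        *)
            echo "invalid flag '$2' for option $1" >&2
            return 1 ;;
    esac
}

# Possible use on a system with olrad installed (not executed here):
#   flag=$(normalize_flag -p "$user_input") && /usr/bin/olrad -p "$flag" "$slot_id"
```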

-C slot_id

Runs the critical resource analysis (CRA) routine only on the specified slot_id and displays the results. It checks for critical resources on all affected hardware paths associated with the specified slot. It analyzes file systems, volumes, processes, networking, swap, and dump, and generates a report of affected resources. The severity levels and their meanings are:

CRA_SUCCESS

no affected resources in use.

CRA_WARNINGS

resources in use on affected device(s) but none are deemed critical.

CRA_DATA_CRITICAL

probable data loss, only proceed with the user's permission.

CRA_SYS_CRITICAL

likely to bring down the user's system.

CRA_ERROR

some internal CRA error encountered.

Users are advised to use this option first to check whether the intended OLA/R operation is safe and will not disrupt the functioning of the system.
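The advice above can be turned into a pre-check step in a script. The following is a minimal sketch that runs olrad -C and interprets the exit status using the numeric values given in the RETURN VALUE section. The function name cra_precheck and the OLRAD variable are hypothetical conveniences; OLRAD only exists so that the command path can be changed for testing.

```shell
#!/bin/sh
# Path to olrad; overridable so the sketch can be exercised without the tool.
OLRAD=${OLRAD:-/usr/bin/olrad}

# Hypothetical helper: cra_precheck SLOT_ID
# Returns 0 when CRA reports success (0) or warnings (1),
# 2 when it reports data critical resources, and 1 for anything else.
cra_precheck() {
    "$OLRAD" -C "$1"
    status=$?
    case $status in
        0|1)
            return 0 ;;
        2)
            echo "slot $1: data critical resources in use; -f would be required" >&2
            return 2 ;;
        *)
            echo "slot $1: CRA returned status $status; do not proceed" >&2
            return 1 ;;
    esac
}
```

A script could call cra_precheck before olrad -a or olrad -r and stop if it returns a nonzero status.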

-c slot_id

Displays the device information (Device_ID, Vendor_ID, Revision_ID, etc.) of all the interface devices at the indicated slot. Output fields are detailed below; some descriptions are platform dependent.

Path displays the hardware path of the device.

Name displays the interface driver name that claimed the device.

Device_ID displays PCI Device ID of the device.

Vendor_ID displays PCI Vendor ID of the device.

Subsystem_ID displays PCI Subsystem ID of the device.

Subsystem_Vendor_ID displays PCI Subsystem Vendor ID of the device.

Revision_ID displays the PCI Revision ID of the device.

Class displays the PCI Class of the device.

Status displays the device status register.

Command displays the device command register.

Multi-func displays whether this is one of multiple functions on the PCI device.

Bridge displays whether the device is a PCI-to-PCI bridge device.

Capable_66Mhz displays whether the device is capable of operating at a frequency of 66 MHz.

Power_Consumption displays the power consumption of the device.

Capable_Frequency displays the bus frequency at which the device is capable of running.

-e slot_id

Lists the affected slot IDs for the specified slot.

-h slot_id

Displays the hardware paths of the interface node(s) for the specified slot.

-g device_hw_path|slot_hw_path

Displays the slot ID for the specified hardware path.

-v interface_hw_path

Displays driver information, such as current state, time-out, etc. Output fields are detailed below.

Name displays the interface driver name.

State displays the interface driver state. State is RUNNING if the driver is active and SUSPENDED if the driver is suspended. When the driver is in a transition state (for example, from RUNNING to SUSPENDED), this field indicates that a state change is in progress. In the rare event of an internal error during a driver state transition, this field indicates that the operation timed out.

Suspend time displays the approximate time required to suspend the interface driver. The value displayed accounts for worst case scenarios, and the time taken would normally be less than this.

Resume time displays the approximate time required to resume the interface driver. The value displayed accounts for worst case scenarios, and the time taken would normally be less than this.

Remove time displays the approximate time required to delete the driver instance. The value displayed accounts for worst case scenarios, and the time taken would normally be less than this. This field will be valid only if the target operating environment supports OnLine Deletion.

Error time field is for future enhancements.

When performing an OL* operation on a slot, olrad runs pref_replace and prep_replace scripts in the pre-OL* phase and post_add and post_replace driver scripts in post-OL* phase.

There are no preface and prepare driver scripts for OLA (online add).

For a given OL* operation on a slot, pref_replace driver scripts are run for the affected slots (if any) irrespective of the type of operation being performed on the given slot.

An audit trail is logged to the NetTL log file whenever an OLA/R operation is initiated. This information is also written to standard output.

PCI Error Handling

Some systems have the capability to handle certain PCI hardware errors during the operation of PCI I/O cards. When such errors occur, the operating system will suspend the corresponding card and I/O slot. The software states of the components in error will be marked ERROR in ioscan(1M) output. If this scenario occurs, the following sequence can be tried from the olrad command to attempt a recovery of the slot:

1)

If the slot remains powered ON, power it OFF using:

olrad -p OFF slot_id

2)

If the power OFF succeeds, try a post replace operation at the slot using:

olrad -R slot_id

If the card/slot is recovered from the error and the post replace operation succeeds, the software states of the recovered components will be restored to CLAIMED in ioscan(1M) output. If the post replace operation fails and the error persists, one possible reason is that the card has failed. The card in error can be replaced with another card of the same type, and a post replace operation can be tried with the replacement card.

A complete description of PCI Error Handling is not covered here. Refer to documents on PCI Error Handling for details, available at the http://docs.hp.com website. Note that the sequence mentioned here for PCI Error Handling is generic and is subject to change depending on the platform and operating system release.
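The two-step recovery sequence above can be sketched as a shell function. This is an illustration only. The function name recover_slot and the OLRAD variable are hypothetical, and the sketch assumes the slot is still powered ON, because checking the power state would require parsing olrad -q output, whose exact layout is not specified on this page.

```shell
#!/bin/sh
# Path to olrad; overridable so the sketch can be exercised without the tool.
OLRAD=${OLRAD:-/usr/bin/olrad}

# Hypothetical helper: recover_slot SLOT_ID
# Step 1 powers the slot off; step 2 attempts a post replace,
# but only if step 1 succeeded.
recover_slot() {
    slot_id=$1
    if ! "$OLRAD" -p OFF "$slot_id"; then
        echo "slot $slot_id: power off failed; post replace not attempted" >&2
        return 1
    fi
    if ! "$OLRAD" -R "$slot_id"; then
        echo "slot $slot_id: post replace failed; the card may need replacing" >&2
        return 1
    fi
    echo "slot $slot_id: post replace succeeded; check ioscan output for CLAIMED"
}
```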

Logging

olrad uses the NetTL subsystem to log errors and an audit trail for all OLA/R operations performed on the various slots.

olrad makes use of the sysadmin subsystem formatter to format the log messages.

The following details are not logged:

  • CRA report when performing OLA/R,

  • CRA report when using the -C option,

  • Output of view information options such as -v, -c, -g, -h, -q, and -n.

EXAMPLES

Adding a New Card

1.

Get information about all the OLA/R capable slots. Make note of the Slot field, which displays the slot_id:

/usr/bin/olrad -q

2.

Prepare to add:

/usr/bin/olrad -a slot_id

3.

Physically insert the card into the slot.

4.

Post add:

/usr/bin/olrad -A slot_id

Replacing a Card

1.

Get information about all the OLA/R capable slots. Make note of the Slot field, which displays the slot_id:

/usr/bin/olrad -q

2.

Prepare to replace:

/usr/bin/olrad -r slot_id

3.

Replace the faulty card in the slot with a working card. The new card must be identical to the card being replaced.

4.

Post Replace:

/usr/bin/olrad -R slot_id
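The add and replace procedures above share the same shape: a prepare command, a physical step, and a post command. The sketch below wraps that shape in one function. The function name olar_card and the OLRAD variable are hypothetical, and waiting for the operator to press Enter is only one possible way to pause for the physical step.

```shell
#!/bin/sh
# Path to olrad; overridable so the sketch can be exercised without the tool.
OLRAD=${OLRAD:-/usr/bin/olrad}

# Hypothetical helper: olar_card add|replace SLOT_ID
olar_card() {
    case $1 in
        add)     prep=-a; post=-A ;;
        replace) prep=-r; post=-R ;;
        *)       echo "usage: olar_card add|replace slot_id" >&2
                 return 2 ;;
    esac
    slot_id=$2

    # Prepare phase; stop here if olrad reports a failure.
    "$OLRAD" "$prep" "$slot_id" || return 1

    printf 'Slot %s is prepared. Insert or swap the card, then press Enter: ' "$slot_id"
    # If no input can be read, give up; the slot is left in the prepared state.
    read -r _answer || return 1

    # Post phase.
    "$OLRAD" "$post" "$slot_id"
}
```

For example, olar_card replace slot_id would run olrad -r, wait for the operator, and then run olrad -R for that slot.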

RETURN VALUE

olrad returns the CRA return values when invoked with the -C (CRA only) option. The valid values are as follows:

0

CRA_SUCCESS

1

CRA_WARNING

2

CRA_DATA_CRITICAL

3

CRA_SYS_CRITICAL

4

CRA_ERROR
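A script that calls olrad -C may find it convenient to translate the numeric status into the names listed above. The following sketch does only that; the function name cra_status_name is hypothetical and not part of olrad.

```shell
#!/bin/sh
# Hypothetical helper: cra_status_name STATUS
# Maps an olrad -C exit status to the name given in this section.
cra_status_name() {
    case $1 in
        0) echo CRA_SUCCESS ;;
        1) echo CRA_WARNING ;;
        2) echo CRA_DATA_CRITICAL ;;
        3) echo CRA_SYS_CRITICAL ;;
        4) echo CRA_ERROR ;;
        *) echo "unknown CRA status: $1" >&2
           return 1 ;;
    esac
}

# Possible use on a system with olrad installed (not executed here):
#   /usr/bin/olrad -C "$slot_id"; cra_status_name $?
```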

For all other options olrad returns the following:

0

Successful completion.

-1

Failure. olrad also logs a message to the NetTL log file and to standard error.
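When checking these values from a shell script, note that the shell reports an exit status as an unsigned 8-bit number, so a process that exits with -1 is normally seen as 255 in $?. The sketch below therefore treats any nonzero status as failure rather than comparing against -1. The function name run_checked is hypothetical, and the sketch assumes the command being run is not olrad -C, whose nonzero statuses are CRA results rather than failures.

```shell
#!/bin/sh
# Hypothetical helper: run_checked COMMAND [ARGS...]
# Runs the command and reports any nonzero exit status on standard error,
# passing the status back to the caller.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with status $status: $*" >&2
        return "$status"
    fi
}

# Possible use on a system with olrad installed (not executed here):
#   run_checked /usr/bin/olrad -A "$slot_id" || exit 1
```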

FILES

NetTL

log file containing olrad audit trail and errors.