Multi-System Management Tools for HP-UX

 

 

Robert J. Bury
Hewlett-Packard Company
3404 East Harmony Road
Fort Collins, CO 80528
(970) 898-3066 (Phone)
(970) 898-2838 (Fax)
Bob_Bury@hp.com

 

 

 

August 1999

 



Table of Contents

Introduction

Intended Audience

System Management Challenges

System Management Goals and Metrics

HP Service Control: A Suite of System Management Products

Fault Management and Monitoring

Workload Management: HP Process Resource Manager

Centralized System Access and Administration

Centralized Multi-System Management with HP Service Control for HP-UX

Goals of Multi-System Management

Multi-System Management Paradigms

Service Control Multi-System Management Features in Support of Management Paradigms

Central Management Station

Nodes

Node Grouping

Distributed Task Facility and Integrated Tools

Role-Based Security

Tasks and Logging

Integrated Multi-System Management Solutions

Multi-System Software Management

System Configuration Repository

Initial System Deployment and Recovery

EMS Monitor Deployment

Integration with Enterprise Frameworks

Summary


 

Introduction

Data center IT managers are under increasing pressure to deliver IT services with high availability, low cost, and excellent performance and resource utilization.  As the capacity and scale of IT operations have grown, so has the complexity of managing the IT environment.  System administrators today use various approaches to provide consistency and control over their IT resources and operations.  A systems management paradigm is needed that offers a systematic and controllable approach to balancing three objectives: improving systems availability, facilitating systems growth, and reducing costs.

This paper will outline some of the system management challenges facing IT managers and metrics that are important in the data center.  An overview of HP’s system management suite of products will be given.  HP’s multi-system management tools for consistently and effectively managing groups of HP-UX systems will be described.

Intended Audience

This paper is aimed at system administrators and IT managers responsible for data center environments.

 

System Management Challenges

Systems administrators and data center personnel are faced with a number of system management challenges, among them:

·      Increasing complexity of the IT infrastructure

·     Proliferation of underutilized servers, negatively impacting return on investment (ROI)

·      Escalating costs in managing the IT infrastructure (measured as the ratio of systems administrators to servers)

·      Lack of controls to ensure quality of service (performance, response time, and availability)

·      User error and communication breakdowns that lead to system problems

·      Slow application deployment, which impedes competitiveness

 

System Management Goals and Metrics

Systems management goals can be summarized as improving performance, increasing availability, and reducing cost. These goals can be achieved by enhanced systems administration tools and capabilities, decreasing the number of personnel required to achieve and meet service level objectives, and improving the processes surrounding the operation of the mission-critical systems and applications.


 

HP Service Control: A Suite of System Management Products

 

HP’s systems management portfolio includes a suite of tools called HP ServiceControl. The suite includes HP Central Web Console, the Central Management Station, and systems management software that allows systems administrators to centrally configure, deploy, and manage multiple HP-UX systems. HP ServiceControl 1.0 also includes tools that enhance productivity and application availability by providing automatic status monitoring and control of critical system components anywhere on the enterprise network. Other components of HP ServiceControl 1.0 manage key system resources that affect application availability, performance, and response time.

Hewlett-Packard’s HP-UX operating environment brings powerful new systems management capabilities to enterprise customers. These include:

 

Features and Benefits

·    Feature: Single control point for simplified and unified multi-system administration, deployment, and management
     Benefit: Reduced management complexity, lower administrative overhead, and faster deployment of new applications

·    Feature: Dynamic system resource allocation and control, controlled by business policies
     Benefit: Business and IT are able to define and meet service level objectives (SLOs) for performance, response time, and availability; maximum resource utilization improves return on investment

·    Feature: Centralized and comprehensive, proactive fault-detection and avoidance for server, software, networks, and storage
     Benefit: Higher availability by providing a new, lower level of granularity for instant detection, notification, and avoidance of critical problems

·    Feature: Centralized and automated deployment of multiple operating system images and patches
     Benefit: Faster, more efficient deployment of new systems and applications
 


Fault Management and Monitoring

As organizations depend increasingly on distributed systems, they become more exposed to system outages caused by a failure of an individual component. Fault management methodologies and tools are necessary to help handle and prevent these outages. A fault management system reduces the response time to system faults by automating the mapping between a troubled resource and the appropriate troubleshooting action.

HP’s Event Monitoring Service (EMS), included in the ServiceControl solution suite, is a system monitoring application designed to facilitate real-time monitoring and error detection for HP products in the enterprise environment. This framework provides centralized management of software and hardware devices and system resources, and it provides immediate notification of real or potential problems and system status. HP EMS can receive data on unusual activity, add information on the problem’s source, and provide recommendations on problem resolution.

HP EMS consists of a set of system and network monitors within a monitoring environment. This monitoring framework has an easy-to-use interface and provides a mechanism for monitoring resources, registering monitoring requests, and sending notification when resources reach user-defined critical values. All EMS monitors come with preset threshold values defined by the monitor developers. EMS monitors are also designed for flexibility, so users can pre-configure event-monitoring thresholds to desired levels.

The monitors can also poll hardware, disks, clusters, network interfaces, and system resources and send information to the framework. An event can be simply defined as something the administrator wants to know about—for example, a disk failure or file space dropping below a predefined level.
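The monitoring flow described above (register a request, poll a resource, and send a notification when a user-defined threshold is crossed) can be illustrated with a minimal Python sketch. It does not use the EMS API, which is not shown in this paper; the class, the function names, and the resource name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class MonitorRequest:
    """Hypothetical monitoring request: one resource watched against one threshold."""
    resource: str                    # illustrative name, not a real EMS resource path
    threshold: float                 # user-defined critical value
    below_is_critical: bool = True   # e.g. free file space dropping below a level


def check(request: MonitorRequest, value: float) -> Optional[str]:
    """Return an event message if the polled value crosses the threshold, else None."""
    if request.below_is_critical:
        crossed = value < request.threshold
    else:
        crossed = value > request.threshold
    if not crossed:
        return None
    direction = "below" if request.below_is_critical else "above"
    return f"{request.resource}: value {value} is {direction} threshold {request.threshold}"


def poll_once(requests: Iterable[MonitorRequest],
              read_value: Callable[[str], float],
              notify: Callable[[str], None]) -> List[str]:
    """Poll every registered request once and send one notification per event."""
    events = []
    for request in requests:
        message = check(request, read_value(request.resource))
        if message is not None:
            notify(message)
            events.append(message)
    return events
```

In this sketch the `notify` callable stands in for whichever notification method is configured (console, e-mail, or a management tool), and `read_value` stands in for the monitor that polls the resource.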

HP EMS complements HP’s industry-leading high availability solutions by providing the most comprehensive set of monitors on the market today. The EMS framework allows users to tailor the monitoring devices to suit specific needs, and it provides:

·    Real-time notification—Just-in-time notification gives users immediate visibility of the problem so they can prevent unplanned downtime through remedy and repair.

·    Several notification methods—HP EMS can notify users of an event by way of the system console, by e-mail, or by generating pager notification through systems management tools such as OpenView IT/O. These notification methods can help reduce total cost of ownership, as the IT staff doesn’t need to be physically present to be made aware of an event. Event messages can be integrated with Enterprise Management applications such as HP OpenView and CA Unicenter for centralized systems management.

·    Logging capabilities—EMS allows the administrator to store messages in syslog or textlog, which provides logging for future trend analysis.

APIs are available so that system managers can write their own custom monitors that will plug into the EMS framework and take full advantage of the EMS capabilities.

HP Multi-Computer/ServiceGuard (MC/ServiceGuard) and Multi-Computer/LockManager (MC/LockManager) are specialized facilities for protecting mission-critical applications from a wide variety of hardware and software failures. With MC/ServiceGuard or MC/LockManager, multiple nodes (systems) are organized into an enterprise cluster that delivers highly available application services to LAN-attached clients.

Both HP MC/ServiceGuard and MC/LockManager use the EMS monitors to determine the health of monitored resources, such as disks or networking devices, and may fail over application packages upon receiving notification from EMS that a component has failed. All of the EMS resource monitors can be integrated with MC/ServiceGuard and MC/LockManager through the EMS framework. By writing custom EMS monitors for designated resources, administrators can extend the fail-over capability of MC/ServiceGuard and MC/LockManager to include those monitored resources. EMS can also provide information to the leading enterprise management platforms from HP OpenView, Computer Associates, BMC Software, PLATINUM technology®, and IBM Tivoli.

 

 

Figure 1.

HP’s Event Monitoring Service (EMS) Framework


Workload Management: HP Process Resource Manager

HP’s ServiceControl systems management solution includes HP Process Resource Manager (PRM). With PRM, IT managers are assured that performance, response time, and availability objectives for critical business applications are consistently maintained. By focusing the appropriate amount of system resources (CPU, memory, and I/O) exactly where the business needs them, PRM’s ability to deliver consistent performance and response time under heavy loads prevents an application’s peak processing from impacting the performance of other applications on the same HP-UX system. PRM’s ability to maintain consistent service levels among competing applications allows users to run multiple application workloads on the same HP-UX system with control and confidence. The result is fewer systems to manage, reduced management complexity and cost, and an improved return on IT investment by maximizing the utilization of each HP-UX system. HP’s Web QoS provides similar control and service consistency for Web-enabled applications.

When PRM is enabled, groups of users or applications are guaranteed a specified portion of the central processing unit (CPU) processing cycles or available real-memory or I/O resources. PRM effectively creates a “soft partition” which isolates the selected resources from other applications running on the HP-UX system.
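As a rough illustration of how a guaranteed portion of a resource might be derived, the following Python sketch converts relative shares per group into percentage entitlements. This is not PRM's configuration syntax or algorithm; expressing entitlements as shares, the function name, and the group names are all assumptions made for this example.

```python
from typing import Dict


def entitlements(shares: Dict[str, int]) -> Dict[str, float]:
    """Convert relative shares per group into percentage entitlements.

    `shares` maps a hypothetical group name to a positive share count.
    """
    if any(count <= 0 for count in shares.values()):
        raise ValueError("every group needs a positive share count")
    total = sum(shares.values())
    if total == 0:
        raise ValueError("at least one group is required")
    return {group: 100.0 * count / total for group, count in shares.items()}
```

For example, `entitlements({"orders": 3, "reports": 1})` gives the `orders` group 75 percent and the `reports` group 25 percent of the resource. Changing a share count and recomputing is one simple way to picture adapting allocations to changed business priorities.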

With PRM the system administrator is able to:

·    Manage CPU, I/O, and real-memory allocation

·    Optimize CPU, I/O, and real-memory usage

·    Allocate CPU, I/O, and real-memory resources to business-critical applications in a way that reflects real business needs

·    Prioritize workgroups based on business importance

·    Guarantee each group of users their specified CPU, I/O, or real-memory entitlement

·    Dynamically modify CPU, I/O, or real-memory allocations to adapt them to changes in business priorities

Hewlett-Packard PRM allows customers to run mission-critical applications in less time by matching processing needs to CPU resources. System administrators can make intelligent decisions regarding computing resources that optimize the processing power available. This in turn increases the productivity of HP 9000 users and systems. With HP PRM, customers can meet the service level agreements dictated by the business needs of workgroups by providing specific CPU resource allocations. The result is an overall improvement in the quality of information technology (IT) service delivery and an improvement in end-user satisfaction.


Centralized System Access and Administration

 

HP’s new Central Web Console provides a single, unified control point for accessing the console of multiple HP-UX systems. The Central Web Console eliminates the need for the individual system consoles that have proliferated—saving floor space and simultaneously easing administration.

The Central Web Console provides a scalable and secured Web-based single point of access to HP server console ports. This product answers one of the needs of the system consolidation market. It replaces the traditional “ASCII” terminal, attached to each server via the console port located on the service processor card, with a virtual terminal accessible via any Web browser. The solution scales to as many as 228 servers. A conceptual view of the console is shown below:

 

Figure 2.

HP’s Central Web Console


Centralized Multi-System Management with HP Service Control for HP-UX

Goals of Multi-System Management

IT system administrators and data center personnel who are responsible for managing multiple systems are aiming to increase their efficiency and reduce their costs, while improving the availability of their systems and services.  Many have found that user error is the root of many system management and availability problems.  The goals of multi-system management solutions are:

·         To promote consistency and order among multiple systems so as to gain efficiency and reduce user error

·         To gain efficiency by operating across multiple systems at once

·         To centralize access to multiple systems in order to operate on them more efficiently

·         To manage multiple systems securely and with confidence in what is being done

Multi-System Management Paradigms

The Service Control multi-system management solutions are built around tested system management paradigms or strategies that have proven to be effective in managing multiple systems.  These paradigms are centralization of management, consistency and replication, and administration by distribution.

·         Centralization of Management:  Administrative actions and access to the systems should be done from one centralized location via a common user interface.  This centralized point of access should contain tools and all configuration information about the multiple systems being managed.

·         Consistency and Replication:  Complexity and user errors are reduced when multiple systems are managed and configured consistently.  Consistent systems can be grouped together for common management operations, and replicated to expand capacity.

·         Administration by Distribution:  Ongoing, incremental administrative changes are done from the point of centralized management by distributed tasks to systems or groups of systems in order to effect changes, gather information, or distribute files and information.

 

 

 

 

 


 


Figure 3.

Service Control Multi-System Management

Service Control Multi-System Management Features in Support of Management Paradigms

Central Management Station

A Central Management Station is used to provide administrative access to a set of systems to be managed.  Software called the Service Control Supervisor runs on the Central Management Station.  Management tools are centralized and can be applied to multiple systems.  The Central Management Station and the Supervisor that runs on it maintain a description of the systems in the management domain and keep this information in persistent storage.  The Central Management Station provides the user interface for administrative action.  All tasks performed on the set of managed systems are done from the Central Management Station, through either its Graphical User Interface (GUI) or its Command Line Interface (CLI).  The GUI is web-based, so the Central Management Station can be accessed from remote systems via a web browser.

Nodes

In Service Control Multi-System Management, the HP-UX systems being managed are referred to as “managed nodes” or simply as “nodes”.  A node represents a single instance of HP-UX running on some hardware.  A node can be a single HP-UX server or system, or a partition or protection domain running a single instance of the HP-UX operating system.  A node in a Service Control Management Domain can be any kind of HP-UX system, from low-end A-Class to high-end multi-node V-Class systems.  Membership in the Service Control Management Domain is very flexible, and is not constrained to any fixed or specified replicated hardware configuration.  The Central Management Station itself is a managed node in the Service Control Management Domain.

All nodes are explicitly added to the Service Control Management Domain via either a command line or GUI operation.  The process of adding a node to the Service Control Management Domain involves installing software components on the managed node and performing an “add node” operation on the Supervisor.  Managed nodes run agent software that performs the management tasks sent to them from the Supervisor.  In order to participate in the management domain, the nodes must have a copy of the public key of the Supervisor for network authentication.

The user interface runs only on the Central Management Station.  No other node in the management domain is capable of executing remote tasks, accessing the centralized repository of information, or performing any other Service Control multi-system management operations.

Node Grouping

Nodes within a Service Control Management Domain can be grouped together into node groups.  These node groups can be used to bring order and organization to the management domain, and directly support the management paradigm of employing consistency to reduce complexity.  Node groups can be formed to reflect your environment.  For example, systems can be grouped by application type, by operating system revision level, by locality or geography, by use, by hardware class, or in any other way that makes sense to the customer.

Node groups can have overlapping memberships, such that a given node can be a member of more than one group.  Node groups provide a convenient way to specify multiple targets for distributed tasks and actions.
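Because groups may overlap, a target list built from several groups has to count each node only once. The Python sketch below shows one way to picture this; the function, the group names, and the node names are hypothetical and do not represent the Supervisor's actual interfaces.

```python
from typing import Dict, Iterable, List, Set


def resolve_targets(groups: Dict[str, Set[str]],
                    selected_groups: Iterable[str],
                    selected_nodes: Iterable[str] = ()) -> List[str]:
    """Expand selected groups and individual nodes into a de-duplicated target list."""
    targets = set(selected_nodes)
    for name in selected_groups:
        # A KeyError here means the caller named a group that was never defined.
        targets.update(groups[name])
    return sorted(targets)
```

With `groups = {"database": {"n1", "n2"}, "hpux11": {"n2", "n3"}}`, selecting both groups yields `["n1", "n2", "n3"]`: node `n2` belongs to both groups but is targeted once.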

Distributed Task Facility and Integrated Tools

At the heart of Service Control Multi-System Management is the distributed task facility.  A central part of the Supervisor is the ability to execute various management commands or applications on one or more nodes.  Single nodes, multiple selected nodes, or groups can be selected for distributed actions.

Commands, scripts, or applications are integrated into the Service Control Management Domain as Supervisor tools.  The collection of tools essentially provides a toolbox for system administrators and operators to manage their HP-UX management domain.  Tools can be defined that run simple HP-UX commands (like ps, bdf, or vginfo), launch single system interactive applications (like SAM), or launch multi-system aware applications (like Ignite-UX).  HP has integrated and tested a set of management tools that are standard with the Service Control Management Station.  In addition, end customers can integrate their own custom applications and scripts, which are pervasive in nearly all data center environments.

There are two general types of tools: local and distributed.  Local tools, also called single-system aware tools, run on a node and only affect the operation of that node.  To run a local tool on multiple target nodes, the Service Control Supervisor executes the tool on each target node.   In addition to executing commands and launching applications, local tools are able to copy files from the Central Management Station to the target nodes.  This provides the ability to push out copies of configuration files that need to be kept consistent across a specific set of nodes.

Distributed tools, also called multi-system aware tools, run on a single node but are able to operate on multiple other nodes.  To run a distributed tool on multiple target nodes, the Supervisor executes the tool on a single node and passes it a list of target nodes.  These applications execute on a single node but contact other nodes to accomplish their work via their own mechanisms, outside the control of the Supervisor.
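The difference between the two tool types comes down to where the tool runs and what it is told about the targets. The following Python sketch plans the invocations for each type without executing anything. The `Tool` class, the `plan_invocations` function, and the idea of passing the target list as command arguments are assumptions for illustration; the paper does not describe how the Supervisor actually passes the list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool definition: a command plus a flag for its type."""
    name: str
    command: str
    distributed: bool = False   # False = local (single-system aware) tool


def plan_invocations(tool: Tool, targets: List[str],
                     execution_node: str) -> List[Tuple[str, List[str]]]:
    """Return (node to run on, argument vector) pairs for one tool run.

    A local tool is run once on every target node.  A distributed tool is run
    once on `execution_node` and receives the list of target nodes.
    """
    if tool.distributed:
        return [(execution_node, [tool.command, *targets])]
    return [(node, [tool.command]) for node in targets]
```

A local tool such as one wrapping `bdf` produces one invocation per target node, while a distributed tool produces a single invocation that carries all target names.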

Tools can have arguments that are specified by the Supervisor when the tool is run.  In addition, tools can be grouped into categories as an organizing method to facilitate finding the appropriate tool in the overall toolbox. 

Role-Based Security

Data centers with large numbers of systems typically employ a team of people with varying responsibilities who together operate on the systems.  The “many hands” that interact with a system perform different roles.  For example, a typical data center may have the following roles:

·         Help desk people who receive requests for change and handle a small percentage of them, and pass the majority on

·         Operators who perform specific, well-defined tasks

·         Mainline system administrators who plan and execute the majority of system changes

·         Expert system administrators and system planners who plan and execute major system changes and to whom the most complicated tasks are escalated

Unfortunately, many Unix environments are managed in an “all-or-nothing” security model, in which people either operate as superuser and are all-powerful, or do not know the root password and are very limited in their capabilities on the system.  This security model does not support the reality of “many hands” with varying skill levels and responsibility levels operating on the system.  In addition, this security model invites user errors by tempting organizations to give the superuser password to more people than are skilled enough to use it.

Service Control Multi-System Management provides a solution for this situation in the form of secure role-based delegation of tasks to specific individuals and roles.  Users are first authenticated through the standard HP-UX authentication mechanism of login and password.  Users are then assigned the ability to run some set of tools on some set of nodes.  An association is formed to link a user to a role on either a node or a group of nodes.  The role defines what the user is able to do on the associated node or node group.  Each role can have one or more tools that are associated with it, and each tool can belong to one or more roles.  When users are given the authority to perform some restricted set of functionality on one or more nodes, the authorization is done based upon the role definition and the user’s membership in that role.  The role concept allows the sum total of functionality represented by all the tools to be divided into logical sets that correspond to the responsibilities that are given to the various administrators and operators in the data center.
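The user, role, tool, and node relationships described above can be modeled in a few lines of Python. This is only a sketch of the authorization check, assuming that node groups have already been expanded into individual nodes; the class name, method names, and sample roles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Authorizations:
    """Hypothetical in-memory model of role-based authorization."""
    role_tools: Dict[str, Set[str]] = field(default_factory=dict)       # role -> tools
    assignments: Set[Tuple[str, str, str]] = field(default_factory=set)  # (user, role, node)

    def define_role(self, role: str, tools: Iterable[str]) -> None:
        """Associate a set of tools with a role; a tool may appear in many roles."""
        self.role_tools.setdefault(role, set()).update(tools)

    def grant(self, user: str, role: str, nodes: Iterable[str]) -> None:
        """Link a user to a role on each of the given nodes."""
        for node in nodes:
            self.assignments.add((user, role, node))

    def may_run(self, user: str, tool: str, node: str) -> bool:
        """True if some role held by the user on this node includes the tool."""
        return any(
            assigned_user == user
            and assigned_node == node
            and tool in self.role_tools.get(role, set())
            for assigned_user, role, assigned_node in self.assignments
        )
```

An operator granted a role containing only `bdf` on node `n1` can run `bdf` there, but cannot run it on `n2` and cannot run any other tool on `n1`, which is the opposite of the all-or-nothing superuser model.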

Tasks and Logging

Data centers need an accurate and complete record of changes that have been made in their environment in order to execute an effective change management policy.  Not only do they need to know what changes have been made, but also by whom, and the status of changes in progress.

In Service Control Multi-System Management, when a Supervisor runs a tool, the result is a Supervisor task.  A task represents the invocation of a specific tool on a specific set of target nodes at a specific time.  For every target node, a task goes through a set of states as it progresses.  Information about a task can be tracked from the Central Management Station in the Task View.

An integral part of the Supervisor is the ability to record and maintain a history of events, both Supervisor configuration changes and task execution events.  Details such as launch time, target nodes, and results are logged.  Task execution events include details and intermediate events associated with running a tool.  The details include who launched the task, the task identifier, the task start time, the actual tool and command line with arguments, and the list of target nodes.  Exit codes, stdout and stderr are logged, and tools and applications can create their own log file entries.
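A task record along these lines can be sketched in Python as follows. The field names, the state names ("pending", "complete", "failed"), and the log line layout are hypothetical; the paper states which details are logged but not how the Supervisor stores or formats them.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

_next_task_id = itertools.count(1)  # simple in-memory source of task identifiers


@dataclass
class Task:
    """Hypothetical record of one tool invocation on a set of target nodes."""
    user: str
    tool: str
    command_line: List[str]
    targets: List[str]
    task_id: int = field(default_factory=lambda: next(_next_task_id))
    started: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    states: Dict[str, str] = field(default_factory=dict, init=False)
    results: Dict[str, dict] = field(default_factory=dict, init=False)

    def __post_init__(self) -> None:
        # Every target node starts in the same initial state.
        for node in self.targets:
            self.states[node] = "pending"

    def record_result(self, node: str, exit_code: int,
                      stdout: str = "", stderr: str = "") -> None:
        """Store the outcome for one node and advance that node's state."""
        self.states[node] = "complete" if exit_code == 0 else "failed"
        self.results[node] = {"exit_code": exit_code, "stdout": stdout, "stderr": stderr}

    def log_lines(self) -> List[str]:
        """Render one header line for the task plus one line per target node."""
        lines = [
            f"task {self.task_id} user={self.user} tool={self.tool} "
            f"cmd={' '.join(self.command_line)} start={self.started.isoformat()} "
            f"targets={','.join(self.targets)}"
        ]
        for node in self.targets:
            line = f"task {self.task_id} node={node} state={self.states[node]}"
            if node in self.results:
                line += f" exit={self.results[node]['exit_code']}"
            lines.append(line)
        return lines
```

Keeping the state per target node, rather than per task, reflects the point above that a task progresses independently on each node.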

Integrated Multi-System Management Solutions

Several system management solutions are tightly integrated with the Service Control Management Station: their functionality appears in the Tools View of the Central Management Station and is part of the “toolbox” from which system administrators can draw in a secure, controlled way.  From a customer’s point of view, Service Control Multi-System Management provides an integrated solution for centrally managing multiple HP-UX systems.  From the Central Management Station, customers can:

·         Logically and flexibly group systems for consistency, as needed

·         Authorize access to management capabilities based on user-defined roles

·         Create and distribute custom tasks and commands for execution in a secure way with an audit trail

·         Launch applications on nodes or groups of nodes

·         Replicate data files to groups of systems

·         Install and configure the HP-UX operating system

·         Upgrade and patch software on multiple systems

·         Archive and recover the operating system state

·         Snapshot and compare system configuration data and inventory

·         Configure monitors and monitor events on systems simply

·         Access the console on multiple systems

·         Interface via a Web-based GUI

These capabilities are accomplished by integrating HP’s system management solutions with the Service Control Management Station.


Multi-System Software Management

Service Control Multi-System Management integrates software management capabilities using the Software Distributor product of HP-UX.  The integration focuses on software deployment and management for the HP-UX operating system and layered application software and patches across nodes of the Service Control Management Domain.  The Software Management component of Service Control provides integrated functionality for software management of single or multiple systems in one task.  It provides depot management and software deployment for HP-UX software and patches.  The software management component uses the secure role-based authorization model of Service Control, and integrates target and tool selection.  Service Control software management can be used seamlessly with the standard software package creation, patch acquisition, patch bundle creation, and other Software Distributor-based products.

Service Control Software Management integrates the core functionality in Software Distributor, preserving the powerful depot and software selection capabilities found in the HP OpenView Software Distributor product, while supporting the distributed, task-based management model of Service Control.  Service Control Software Management fully supports the Nodes and Roles features.  The tools are integrated into the Node, Node Group, and Tools views for consistent tool execution and target node selection.

System Configuration Repository

The System Configuration Repository provides functionality to snapshot and capture configuration and inventory information about a system at points in time, and then to compare snapshots of different systems or of the same system at different points of time.

Service Control Multi-System Management contains integrated System Configuration Repository functionality.  From the Central Management Station, users can compare the configuration snapshots of two systems, see what changed on a system between the two most recent snapshots, and change the frequency at which snapshots are taken on all managed nodes.  Node selection and role-based authorization are provided through this integration.
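Comparing two snapshots amounts to finding what was added, removed, or changed between them. The Python sketch below assumes, purely for illustration, that a snapshot can be represented as a flat mapping from configuration item names to values; the actual repository format is not described in this paper, and the item names in the example are made up.

```python
from typing import Dict


def compare_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, dict]:
    """Report items added, removed, or changed between two snapshots.

    The two snapshots can come from the same system at different times or
    from two different systems.
    """
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

For example, comparing `{"os_release": "11.00", "patch_a": "installed"}` with `{"os_release": "11.00", "patch_b": "installed"}` reports `patch_b` as added, `patch_a` as removed, and nothing as changed.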

Initial System Deployment and Recovery

The Ignite-UX product in HP-UX provides a powerful tool for distributed installation of operating system images to multiple systems from a central server.  Standard, reusable configurations can be created and deployed in a replicated fashion to create a consistent operating environment.  Ignite-UX also provides a very capable mechanism to archive the operating system configuration of a system over the network, and to boot and restore the operating system state.

As an integrated tool in the Service Control Multi-System Management solution, Ignite-UX integrates with the role-based security, grouping, and distributed task features of Service Control.

EMS Monitor Deployment

EMS monitors, as described above, can be installed and configured across multiple systems from the Service Control Central Management Station.

Integration with Enterprise Frameworks

The focus of Service Control Multi-System Management is on HP-UX and the administrative tasks that are specific to and closely coupled with HP-UX and HP systems.  Service Control integrates with enterprise management frameworks, which provide multi-platform solutions, for more generic and common system management functionality.  Service Control integrates with enterprise event monitoring products and consoles, and passes events in standard formats to these frameworks.

Summary

Service Control Multi-System Management provides an integrated solution for managing multiple HP-UX systems consistently and effectively, improving the key metrics of availability and cost.


 

©Copyright 1999 Interex. All rights reserved.