Troubleshooting in NT Environments with an Internet-based Diagnostic Tool

Introduction

As Windows NT is deployed more widely in enterprise environments, support tools are needed that resolve problems quickly.  Physical distance from the malfunctioning machines, the reliability of information obtained from end users, and the determination of appropriate troubleshooting steps are among the many factors that make supporting desktops and servers a complex proposition.  A number of tools available today attempt to ease the burden on IT professionals by addressing some of these factors.

 

E-Diagnostics for Windows, an Internet-based NT support tool offered as part of an NT software support contract with Hewlett Packard's Customer Support Organization, provides an end-to-end solution that aims to resolve problems as quickly as possible by:

 

·         Guiding customers through a simple question and answer dialog that launches diagnostic probes throughout their environment in an attempt to solve the problem at the customer site.

·         Electronically escalating problems that cannot be solved locally to a Hewlett Packard Response Center, automatically transferring system configuration and diagnostic data.

·         Frequently updating the diagnostic capabilities of the onsite tools through an automated download mechanism, maximizing the customer's ability to troubleshoot locally.

 

The remainder of this paper explores the problem of supporting NT, gives examples of currently available point products that partially address these issues, and details the end-to-end solution that HP provides.


Problem Description

A Typical NT Deployment
Most NT environments of significant size share a similar architectural layout.


 


A typical setup has a central IT group maintaining the company-wide networking services such as user accounts (via one or many Primary Domain Controllers) and TCP/IP addressing and name resolution (via some combination of DNS, DHCP, and WINS).  Field offices, connected to the central office through some kind of WAN, have a Backup Domain Controller to speed the authentication of the desktop machines and provide productivity services like file and print sharing, email, or database access using BackOffice applications.  Finally, the end user desktops running NT Workstation, Windows 95, or Windows 98 interact with the local services through a LAN and offer productivity applications.

 

There are a multitude of variations on this scheme; for example, some companies have only one site, while others have too many people for a single PDC.  Nonetheless, the basic layout is similar across the vast majority of businesses.

 

Support Problem in the Typical Environment

 

Given this environment, there are many support problems that arise:

 

·         Physical distance from machines - Understanding a defect with a machine almost always requires obtaining data from it.  If a central office is in Phoenix and a field office is in Orlando, it is not convenient to walk over to a machine with a problem.

·         Reliability of information obtained from users - When an end user describes a problem, if specifics are not correctly relayed to the IT professional trying to determine a solution, the time spent crafting an answer can increase dramatically.

·         Determination of diagnostic steps - Not all problem solvers are created equal.  Often, the personnel at the field offices do not have as much knowledge as those at the central office.  Based on knowledge of the environment and of NT in general, a more experienced person may take a dramatically different set of steps than a less knowledgeable person.

·         Time taken to perform various diagnostic tests - Manually testing different things in an effort to get a clear picture of a problem takes a significant amount of time.

·         Analysis of diagnostic test results - Once tests are run, their results must be analyzed and compared to some expected state.  Like knowing what steps to perform, the success factor here is linked to the knowledge of the person performing the tests.

·         Interaction with a support provider call coordinator - If the problem cannot be solved with local resources, a call can be logged with a support provider like Hewlett Packard.  Most support providers have non-technical people answering the phone whose job it is to get information about the problem and then route the call to the appropriate team.  Often crucial information can be lost as reported facts are translated into the call tracking system.  Not only does this increase the chance that the call will be incorrectly routed, but information often has to be repeated when the correct support engineer gets involved.

·         Data collection by the support engineer - Here, the support engineer capable of solving the problem might have to ask some of the same questions the call coordinator did because of the data loss when the problem description was entered.  Even when that does not happen, more data is typically needed to get an accurate description of the environment.

·         Multi-level internal support - The support picture is further complicated if a company has its own internal helpdesk that attempts to solve problems before the central IT personnel get involved.  Typically, helpdesk personnel are not authorized to escalate the problem to the support provider, and any problem data that has already been gathered and stored in the local call tracking system is not preserved as part of the escalation.

 


Partial Solutions

A number of tools offer partial solutions to the problems discussed above.

 

Diagnostic Tools

These tools are typically based on a reasoning system and present the user with a sequential list of questions retrieved from a local database.

The questions can be answered either by direct input from the user or by programmatically querying the system.  Based on the answers received, an attempt is made to determine the nature of the problem.  Examples of such tools include First Aid from McAfee and System Wizard from Systemsoft.
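The question-driven approach described above can be sketched in a few lines of Python.  This is only an illustration and is not taken from any of the products named: the `Question` class, the `diagnose` function, and the sample questions are all hypothetical.  Each question is answered either by a probe that queries the system or, when no probe exists, by asking the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Question:
    """One node of a hypothetical diagnostic decision tree."""
    text: str
    # A probe answers the question by querying the system.
    # None means the question must be put to the user.
    probe: Optional[Callable[[], bool]]
    if_yes: Union["Question", str]  # next question, or a conclusion
    if_no: Union["Question", str]


def diagnose(node: Union[Question, str], ask_user: Callable[[str], bool]) -> str:
    """Walk the tree until a conclusion (a plain string) is reached."""
    while isinstance(node, Question):
        answer = node.probe() if node.probe is not None else ask_user(node.text)
        node = node.if_yes if answer else node.if_no
    return node


# Illustrative tree; the probe is a stand-in for a real system query.
tree = Question(
    text="Is the network cable plugged in?",
    probe=None,
    if_yes=Question(
        text="Does the machine have an IP address?",
        probe=lambda: False,
        if_yes="Check name resolution settings.",
        if_no="Renew the DHCP lease.",
    ),
    if_no="Plug in the network cable.",
)
```

Calling `diagnose(tree, ask_user)` with a function that prompts the user returns the conclusion reached.  Because every session walks the same tree, the steps taken do not depend on who is running the tool.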

 

The main advantage of these tools is that they are extremely easy to use and give consistency to the problem solving process.  These products are typically most useful for diagnosing problems surrounding the use of a productivity desktop machine and are often deployed to all end user machines in an effort to reduce the number of "simple" calls that come into a corporate helpdesk.

 

The downside of such tools, however, is that the knowledge they contain is limited to solving problems on a single machine.  In the typical NT environment, an end user can be experiencing a problem because of a root cause on a machine that is not their own.  Additionally, these tools do not integrate with the other parts of the support chain.  If the knowledge they contain does not solve the problem and a call must be logged to escalate the issue, the data collected during the end user's aided diagnosis is lost.  Finally, because these tools must be deployed in their entirety on each machine in an environment, the total disk space they consume across the company is relatively high.

 

Still, for simple desktop problems that are isolated to a single machine, these tools are a good buy.

 

Remote Access Tools

Typically, a remote access tool allows a system administrator to launch an application on his or her own system that makes a network connection to another machine in the computing environment.  On the system administrator's machine, a window shows a live picture of the remote machine's screen, which can be used to interact with the remote system as if it were sitting there locally.  Examples of such tools are pcAnywhere from Symantec and Remote Desktop from McAfee.

 

Advantages of these tools are obvious.  A system administrator can get access to any machine in the environment, run diagnostic tests, and change configurations without having to walk over to the machine, which is not always possible.

 

Access, however, is all that is provided.  There is no help in determining what steps to take in diagnosing the problem, all data must be collected manually, and no data collected can easily be sent on to the next escalation point.  Remote access tools are essential to any IT environment, but on their own they are not a complete enough solution to make support easier.

 

In summary, for remote access tools like pcAnywhere:

·         Pros: Provide access to machines remotely

·         Cons: No help in diagnosing, data collection must be done manually, no escalation path

 

Electronic Call Logging

Many support providers offer the ability to log a call through a web interface instead of picking up the phone.  Typically such interfaces allow the customer to provide a detailed description of their problem along with information regarding how to contact them with a solution or a request for more specific data.  An example of this is HP's Software Call Manager, which is part of the Electronic Support Center.

 

This effectively circumvents the role of the call coordinator in the call logging process.  The description of the problem can be as specific and as technical as the customer wants it to be.  The contents of the description are parsed for key phrases and electronically routed to the appropriate team.  Because no accuracy or detail is lost, the amount of follow-up data needed by the support engineer is reduced and the time needed to resolve the issue is far less than when the call is logged by phone.
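As a rough illustration of key-phrase routing, the sketch below counts phrase matches per team and picks the team with the most hits.  It does not describe how Software Call Manager actually works: the team names, the phrase lists, and the `route_call` function are assumptions made for the example.

```python
# Hypothetical routing table: team name -> key phrases that suggest it.
ROUTING_RULES = {
    "printing": ("printer", "print queue", "spooler"),
    "networking": ("dns", "dhcp", "wins", "tcp/ip"),
    "security": ("permission", "access denied", "logon"),
}


def route_call(description: str, default: str = "general") -> str:
    """Return the team whose key phrases occur most often in the description.

    Falls back to `default` when no phrase matches at all.
    """
    text = description.lower()
    best_team, best_hits = default, 0
    for team, phrases in ROUTING_RULES.items():
        hits = sum(text.count(phrase) for phrase in phrases)
        if hits > best_hits:
            best_team, best_hits = team, hits
    return best_team
```

Plain substring matching is crude, and a production system would need something more robust, but it shows why a precise, technical description from the customer routes more reliably than a paraphrase entered by a third party.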

 

All data provided, however, must be entered manually by the customer.  Additionally, that data sometimes reflects the customer's perception of the problem rather than an unbiased account of it.  Nor does electronically logging a call provide any help in solving the customer's problem before a call has to be logged.

 


End to End Solution

E-Diagnostics for Windows, an HP support product currently available to NT support contract customers and soon purchasable through the web for non-contract customers, combines elements of the partial solutions to provide a complete, end to end solution to the NT support problem.  It consists of:

 

·         Diagnostics that can involve more than one machine in an environment

·         Remote, automated data collection

·         Integration with Software Call Manager for problems that the software does not solve

 

Installation

Once the product has been downloaded from Hewlett Packard, installation occurs in two phases.  On an NT Server machine running Internet Information Server 4.0 or later, the Management Server software is installed.  This is the central E-Diagnostics hub in the customer environment.  The logical components for all diagnostics reside there as do the modules that communicate information back to HP should the problem have to be escalated. 

 

All machines that could potentially be diagnosed with E-Diagnostics are called Nodes.  In the second phase of installation, each Node must have at least the Diagnostic Installer software installed on it.  The Diagnostic Installer enables dynamic installation of data collection components while a diagnostic session is being executed.  This module is provided because it is unlikely that all problems will occur on all Nodes, so customers are not forced to install all data collection components on all machines ahead of time.  The Diagnostic Installer on the Node being diagnosed communicates with the logical components on the Management Server to ensure that the needed data collection components are present before a diagnosis takes place.
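The install-on-demand idea can be sketched as follows.  This is a minimal illustration under stated assumptions, not the product's implementation: the `ensure_components` function, the component names, and the `install` callback are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def ensure_components(
    required: Iterable[str],
    installed: Set[str],
    install: Callable[[str], None],
) -> List[str]:
    """Install whichever required components a Node is missing.

    `installed` is the set of components already on the Node and is
    updated in place.  `install` is a callback that performs the
    installation of one component.  Returns the components that had
    to be installed, in the order they were requested.
    """
    missing = [name for name in required if name not in installed]
    for name in missing:
        install(name)
        installed.add(name)
    return missing
```

A Node that has already been diagnosed for the same kind of problem needs nothing installed, so a second call with the same arguments returns an empty list.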

 

Process Flow

The process of using E-Diagnostics starts when a System Administrator launches a browser from any machine in the environment and points it to the Management Server and the E-Diagnostics URL.  From this Management Client, the System Administrator will select a diagnostic to run from the list provided.

 


 


If none of the available diagnostics fits the problem that the System Administrator is experiencing, the problem can immediately be escalated to HP by logging a call electronically.  In this scenario, however, assume that a printing problem is being experienced.  After selecting the printing diagnostic, the System Administrator must enter the name of the machine on which the problem is being experienced.

 


 


Since configurations can vary greatly between different machines in an environment, it is important  to diagnose the problem from the perspective of the machine that is experiencing the problem. 

 

After this first question is answered, the logical components on the Management Server first ensure that the needed data collection components have been installed on the Node selected.  If not, they are installed using the Diagnostic Installer.  Then, the list of currently configured printers is collected from the Node in question.  This particular machine has several printers configured on it, so the System Administrator must choose the printer in question.

 



 

 


 

 


At this point in this diagnostic, no more input is required from the System Administrator.  The logical components on the Management Server collect whatever data is needed from the data collection components that have been installed on the Node.  The logical components follow a specific set of steps in attempting to find the root cause of the problem.  When the diagnosis is complete, the results are displayed.

 


 

 


The results give a cause as well as details regarding the exact set of steps that the logical components performed during the course of the diagnostic.

 

Suppose that this information did not solve the problem and a call should be logged with HP.  The question at the bottom of the results page "Did this solve your problem?" should be answered with "No, log a call with Hewlett Packard."  Selecting this link causes three things to occur.  First, the customer is programmatically logged in to the Electronic Support Center behind the scenes.  Next, all data that has been collected during the current problem solving session is sent up to the Electronic Support Center so that it can be incorporated into the new call.  Finally, a new instance of the browser on the Management Client is launched and pointed directly to the Software Call Manager screen. 
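To make the data hand-off concrete, the sketch below gathers everything from a diagnostic session into a single serialized payload that could accompany an escalated call.  The paper does not describe the actual transfer format, so the `DiagnosticSession` fields, the use of JSON, and the `build_escalation_payload` function are assumptions chosen purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DiagnosticSession:
    """Hypothetical record of one problem solving session."""
    node: str                                      # machine being diagnosed
    diagnostic: str                                # diagnostic that was run
    answers: dict = field(default_factory=dict)    # administrator's answers
    collected: dict = field(default_factory=dict)  # data gathered from the Node
    steps: list = field(default_factory=list)      # steps performed, in order


def build_escalation_payload(session: DiagnosticSession, solved: bool) -> Optional[str]:
    """Serialize the whole session for escalation.

    Returns None when the problem was solved and nothing needs to be sent.
    """
    if solved:
        return None
    return json.dumps(asdict(session), sort_keys=True)
```

Sending the session as one unit means the support engineer sees the same answers, collected data, and diagnostic steps that the System Administrator saw, without anyone having to retype them.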

 



 


There, the System Administrator can enter more information about the problem and submit the call.  The data that was sent up to the Electronic Support Center by E-Diagnostics is appended to the call so that the HP support engineer has access to it while attempting to determine a solution.

 

Benefits

·         Easy to Use - System Administrators are guided through a simple question and answer dialog during the course of the diagnosis.

·         Consistent Diagnosis and Data Collection - Regardless of the skill of the person attempting the diagnosis, the logical components run the same way.  This allows a less experienced person to solve problems they otherwise could not.  For a more experienced person, data is collected and analyzed automatically, reducing the time spent on a particular problem.

·         Integrated Call Escalation - Problems that cannot be solved locally by E-Diagnostics are easily escalated to a Hewlett Packard Response Center, automatically transferring system configuration and diagnostic data.

 

 

In summary, E-Diagnostics for Windows:

·         Determines which diagnostic steps to perform

·         Performs diagnostic tests on end user systems remotely

·         Diagnoses from the perspective of the machine experiencing the problem

·         Analyzes the results

·         If the problem is not solved, saves all data and escalates it to Software Call Manager

The current problem set covers the top three problem areas on which HP receives NT calls:

·         Network connectivity

·         Network security

·         Printing

Planned for September:

·         Blue screen

·         SQL



©Copyright 1999 Interex. All rights reserved.