
The HP 3000--for Complete Novices
Part 18: An Introduction to IMAGE/SQL

Feature by George Stachnik
Most COBOL compilers (including HP/COBOL) support extensions to the ANSII standard COBOL language. These extensions allow you to do things that are proprietary to specific platforms. Extensions to ANSII COBOL make it possible to do things on the HP 3000 that you cannot do on other platforms. For example, last time we explored ways of calling HP 3000 intrinsics. In this installment, we're going to begin looking at the HP 3000's most important proprietary feature--the IMAGE/SQL database management system.

Extensions are what make languages like HP/COBOL proprietary. Of course, in this sense, so-called open systems such as UNIX and NT are proprietary too. All of these platforms support a number of different COBOL dialects, virtually every one of which consists of an ANSI standard core surrounded by proprietary extensions. And COBOL is not unique in this regard. There are proprietary extensions to virtually all computer languages, including C, C++, and even the most popular versions of Java.

During the last 10 years or so, the industry has moved toward an approach to programming that is driven by industry standards. If you're writing software, it seems like a no-brainer that you should design your code so that it can be easily ported to other platforms, should the need ever arise. For this reason, many programmers seem to have lost interest in proprietary platforms like the HP 3000, and especially in the extensions that make them proprietary. After all, if nobody's using them anymore, why would anybody want to learn about them?

In spite of all the hoopla surrounding the move to open systems, the fact remains that most applications used in the real world make extensive use of proprietary, non-standard extensions to the languages that they were written in. There are a variety of reasons why. Some are technical, although many are marketing related.

In the past, platform vendors used proprietary extensions to differentiate their platforms from those of their competitors. For example, when I first began working for HP, HP 3000 sales literature extolled the unique functions of the HP 3000, its operating system, even its spooler--saying that they were "unique in the industry."

In today's open systems world, such a claim would be marketing suicide. But in the 1980s, software vendors that failed to take advantage of all the features of the platforms their products ran on (especially the proprietary ones) risked losing sales to their competitors who did. Customers wanted to be assured that their independent software vendors (or ISVs) were taking full advantage of the hardware that they were buying.

But by the 1990s, customers were asking for different things. Hardware was less expensive. Customers began to realize that their biggest expenses were no longer the boxes that sat in the data center: the largest line items in their budgets were now software-related. Their concerns shifted to getting the most bang for their software buck. Hence, the idea of designing applications to be platform portable took off in the 1990s.

HP 3000 Applications in the 1990s

Most HP 3000 applications were written in the 1980s (some even date back to the '70s). Most of them fully exploit the proprietary features of the HP 3000. And once again, this is a sword that cuts in at least two different ways:
  • On one hand, MPE/iX does provide strong support for industry standards--in much the same way that UNIX and NT do. This means that applications that were created to be compliant with applicable standards can be ported between HP 3000s, UNIX machines, and even Microsoft's Windows NT operating system.
  • But on the other hand, typical HP 3000 applications predate today's emphasis on standards. And the HP 3000's proprietary nature is precisely what makes it so difficult to port typical HP 3000 applications to other platforms. Keep in mind that taking advantage of that proprietary functionality (like calling intrinsics explicitly) ties your application code to the HP 3000 platform.
MPE/iX's robust and reliable (if proprietary) architecture is one of the things that make the HP 3000 such an attractive platform. In this context, proprietary simply means that the HP 3000 does things that other platforms don't do. Applications that take advantage of the proprietary features of the HP 3000 can potentially reap great rewards in performance and reliability.

Database Management

At the time the HP 3000 was introduced, file access (opening, closing, reading, and writing files) was largely standardized. You could use ANSI standard COBOL to store and access data on your HP 3000, and as long as the data was stored in ordinary files, your COBOL code could be ported easily to other ANSI-standard COBOL platforms.

However, we're now going to turn our attention to an area of programming functionality that was not standardized at the time the HP 3000 was designed: database management.

When the HP 3000 was introduced in 1972, database technology was something brand new in the computer industry. Most minicomputers were used, in those days, for scientific applications--which tended to revolve around complex calculations, not around complex data structures or files.

When Hewlett-Packard introduced the HP 3000, they did something that (at the time) was unique in the minicomputer marketplace. They bundled a database management system with every HP 3000. Called IMAGE/3000, it gained quick acceptance in the nascent minicomputer marketplace, and won awards for the platform and for Hewlett-Packard. More importantly, it marked the HP 3000 as a minicomputer targeted specifically to the business marketplace, rather than to the scientific community. Many new applications were written for the HP 3000 in the years following its introduction. Virtually all of them were (and are) based on HP's IMAGE/3000 architecture.

IMAGE/3000 was made up of the following two components:
  1. A collection of intrinsics that application programs use to access IMAGE/3000 databases.
  2. A collection of utility programs used to manage IMAGE/3000 databases.
In the 1970s, there were no agreed-upon standards for database management. This meant that IMAGE/3000 intrinsics could not be called implicitly. That is, you couldn't access an IMAGE database using an industry standard COBOL verb such as READ or WRITE. If you wanted to take advantage of IMAGE/3000, you had to call the proprietary intrinsics explicitly. This meant that any HP 3000 application that used IMAGE/3000 (and virtually all HP 3000 applications did) was locked into the HP 3000 platform. It couldn't be ported to another platform without some fairly major rework.
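
To make "explicitly" concrete, here is a minimal HP COBOL sketch of the difference. The database name STORE, the password, and the data names are all hypothetical; the intrinsic names (DBOPEN, DBGET) and their parameter lists are the standard IMAGE ones, but treat details such as the open mode as assumptions to be checked against the IMAGE manual.

     * With a flat file, the ANSI standard verb is enough:
           READ CUSTOMER-FILE INTO CUST-RECORD
               AT END MOVE "Y" TO EOF-FLAG.

     * With IMAGE/3000, the program calls the intrinsics explicitly.
     * The two leading blanks in the base name are required; DBOPEN
     * stores the base ID there.
       01  BASE-NAME    PIC X(12)      VALUE "  STORE;".
       01  DB-PASSWORD  PIC X(8)       VALUE "READER;".
       01  OPEN-MODE    PIC S9(4) COMP VALUE 5.
       01  GET-MODE     PIC S9(4) COMP VALUE 2.
       01  SET-NAME     PIC X(12)      VALUE "CUSTOMERS;".
       01  ITEM-LIST    PIC X(2)       VALUE "@;".
       01  DUMMY-ARG    PIC X(2)       VALUE SPACES.
       01  CUST-ENTRY   PIC X(128).
       01  DB-STATUS.
           05  CONDITION-WORD  PIC S9(4) COMP.
           05  FILLER          PIC X(18).

           CALL "DBOPEN" USING BASE-NAME, DB-PASSWORD, OPEN-MODE,
                               DB-STATUS.
     *     Mode 2 = serial read; the list "@;" asks for all items.
           CALL "DBGET" USING BASE-NAME, SET-NAME, GET-MODE,
                              DB-STATUS, ITEM-LIST, CUST-ENTRY,
                              DUMMY-ARG.
           IF CONDITION-WORD NOT = 0
               DISPLAY "IMAGE CALL FAILED".

Every one of those CALLs is meaningless on a platform that doesn't have IMAGE--which is exactly why such code couldn't simply be recompiled somewhere else.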

This was almost the kiss of death for the HP 3000 in the open-systems-obsessed 1990s. In fact, many platforms did "go under" in the UNIX shakeout that took place in the early part of the decade. Many industry observers expected that Hewlett-Packard would choose to jettison its proprietary HP 3000 platform in favor of its faster growing younger brother, the UNIX-based HP 9000. Fortunately, these observers did not understand a very basic fact about the company.

HP was (and is) very focused on protecting its customers' investments. Instead of jettisoning the HP 3000 platform, the company chose to invest in it. They removed many of the restrictions that had pushed developers away from it, making it possible to access the HP 3000's features (including its database management system) through new industry standard interfaces, while continuing to support the older proprietary interfaces. In the final months of the 20th century, interest in the IMAGE database management system and sales of the HP 3000 platform are both on the rise.

What's a DBMS, and Why Do We Have One?

The term database has been used to describe a wide variety of products and technologies. In the PC world, many products sold as database management systems are in fact little more than keyed access methods (like KSAM). To understand IMAGE/SQL, we must first understand what makes a database different from a file.

On the HP 3000, data stored in ordinary "flat" files can be accessed in any of three ways (sketched in COBOL after this list):
  1. Sequentially: the application program reads the records one at a time, in the order in which the records are stored in the file.
  2. Directly: the application program can select the record it wants from the file by specifying its relative record number. For example, it can read record number 1,234 without first having to read the 1,233 records that precede it.
  3. Keyed: the application program can select the record that it wants from the file by specifying a key value. For example, it can read the record containing the value "Sam Jones" in a prespecified field.
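Here is a hedged COBOL sketch of those three access methods--fragments rather than a complete program, with invented file and data names (imagine the file declared as each comment describes):

     * 1. Sequential: read the next record in stored order.
           READ MASTER-FILE NEXT RECORD
               AT END MOVE "Y" TO EOF-FLAG.

     * 2. Direct: the file is declared with ORGANIZATION IS
     *    RELATIVE, ACCESS MODE IS RANDOM, RELATIVE KEY IS REC-NUM.
           MOVE 1234 TO REC-NUM.
           READ MASTER-FILE
               INVALID KEY DISPLAY "NO RECORD 1234".

     * 3. Keyed: a KSAM/indexed file declared with ORGANIZATION IS
     *    INDEXED, ACCESS MODE IS RANDOM, RECORD KEY IS CUST-NAME.
           MOVE "SAM JONES" TO CUST-NAME.
           READ MASTER-FILE
               INVALID KEY DISPLAY "NO SUCH CUSTOMER".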
When the term database was first coined in the 1970s, it was assumed that databases could be accessed just like files--in any of the three ways I just mentioned. So keyed access is not what makes a database unique. To understand what is special about a database, one must understand something about how to maintain applications that were built around files.

At the time the HP 3000 was designed, most application programs were built around a central data repository called a master file. A master file was an ordinary flat file that might be accessed by a number of different programs. A typical application's master file would be accessed by at least three major application programs:
  1. A reporting program that extracts data from the master file in order to generate reports
  2. An update program that modifies existing records and inserts new records into the master file
  3. A maintenance program that deletes old or obsolete records.
One of the biggest expenses involved in managing and maintaining these kinds of applications was maintenance programming. The high cost of maintenance programming stemmed from a fundamental principle of file-based programming, which is:
"When the structure of a file changes, that change must be reflected in every piece of software that touches that file."
In the 1970s, most companies managed their business using just such applications. File-based applications work pretty well in a static environment. But when the rules of doing business change, then these applications must be changed to reflect the business changes. And that's when things begin to break down.

For example, suppose the format of one of the fields in your master file needs to change. This might be forced by a simple change in the business environment--one of your suppliers that had been using 4-digit part numbers might decide to start using 6-digit part numbers.

In the master file, you'd need to change the 4-digit field that's used to hold part numbers to a 6-digit field. That would make the record size of the file 2 bytes longer. And that, in turn, would force you to change (or at the very least recompile) every program that accesses the master file. Even programs that don't touch part numbers would have to change.
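
In COBOL terms, the ripple effect looks like this (a hypothetical layout):

     * Every program that touches the master file carries a copy
     * of its exact physical layout:
       01  MASTER-RECORD.
           05  PART-NUMBER  PIC 9(4).
           05  DESCRIPTION  PIC X(30).
           05  QTY-ON-HAND  PIC 9(6).

     * After the supplier's change, PART-NUMBER must become
     * PIC 9(6). The record grows by 2 bytes, and DESCRIPTION and
     * QTY-ON-HAND physically move--so even a program that never
     * looks at part numbers must be edited (or at least
     * recompiled) to pick up the new offsets.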

In the 1970s, computers were rapidly proliferating throughout large organizations. Applications that were originally designed to be used by one department (or even by one individual) were suddenly being integrated with other applications that were used by other departments and other individuals in other parts of the company. Master files that were originally meant to be accessed by a single program suddenly found themselves being touched by other applications elsewhere in the company.

By the 1970s, many large companies that had jumped into the information age found themselves facing a kind of information gridlock. The term application backlog entered the IT manager's lexicon, as software developers were overwhelmed by the increasingly complex task of software maintenance. By the late 1970s, many corporate IT departments were reporting that they were facing a 7-year backlog of application enhancements.

The problem was not hard to understand. Virtually every change (no matter how trivial) to any application in the company forced programmers to make corresponding changes to other applications (in extreme cases, to every other application). Information Systems visionaries saw that it was becoming impossible to change anything, because changing anything forced you to change everything. Something had to give.

Database Technology Saves the Day

Early database management systems such as IMAGE/3000 brought a very simple and powerful tool to the table. They isolated application programs from the physical structure of the data that they were accessing.

For example, imagine a database containing records made up of the following 10 fields:
  1. Customer Account Number
  2. Customer Account Balance
  3. Company Name
  4. Customer Name
  5. Job Title
  6. Shipping Street Address
  7. City
  8. State
  9. Zip or Postal Code
  10. Country
Furthermore, let's assume that this database is being accessed by a total of 10 different programs. Of these 10 programs, let's assume that only two make use of the account balance, which is a numeric field that contains values in dollars and cents.

If this were a file-based application, any change in any of these fields would force a corresponding change to all ten application programs. But since this is a database application, we can avoid this headache. Here's how it works.

When a database application accesses a database, it specifies the specific fields that it's interested in. So, for example, suppose one of our application programs only used information in the following fields:
  1. Country
  2. Company Name
  3. City
  4. Shipping Street Address
  5. State
  6. Zip or Postal Code
  7. Customer Account Number
A database application program would contain a definition of the record that it wanted to read--a sort of "virtual record layout." In other words, the record layout that appears in the database application program would be made up of only the fields that it's interested in, and in the order that it wants them to appear. Contrast this with the corresponding file-based application program, which must contain a record layout made up of all the fields in the file, in the order that they are physically stored in the file.
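
With IMAGE, this "virtual record layout" is expressed through the list parameter of the intrinsic calls. Here is a hedged sketch, reusing the hypothetical names from the earlier fragment (the comma-separated, semicolon-terminated list is standard IMAGE usage, but the item names are invented):

     * The list names only the items this program wants, in the
     * order it wants them; IMAGE builds the buffer to match,
     * however the items are physically stored.
       01  SHIP-LIST   PIC X(60)  VALUE
           "COUNTRY,COMPANY,CITY,SHIP-STREET,STATE,ZIP,ACCT-NO;".
       01  SHIP-RECORD.
           05  S-COUNTRY  PIC X(12).
           05  S-COMPANY  PIC X(30).
           05  S-CITY     PIC X(20).
           05  S-STREET   PIC X(30).
           05  S-STATE    PIC X(2).
           05  S-ZIP      PIC X(10).
           05  S-ACCT-NO  PIC X(8).

           CALL "DBGET" USING BASE-NAME, SET-NAME, GET-MODE,
                              DB-STATUS, SHIP-LIST, SHIP-RECORD,
                              DUMMY-ARG.

Because SHIP-LIST never mentions the account balance, a change to that item leaves this program's buffer--and the program itself--untouched.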

The value of this scheme to the IT manager is a giant reduction in maintenance programming costs. In the above example, suppose that the size of the field customer account balance changed (possibly because our customers were doing so much business with us that they now owed us more money than they did before). If this were a file-based application, all 10 programs would have to be changed, regardless of whether or not they actually used the data in the customer account balance field. But since this is a database application, any program that doesn't reference the customer account balance field can remain unchanged.

IMAGE/3000, TurboIMAGE, and IMAGE/SQL

IMAGE/3000 was Hewlett-Packard's effort to solve the maintenance programming backlog in the 1970s. Using IMAGE/3000, an application programmer could build programs that accessed data in a database, without being tied to (or even aware of) the way the data was physically stored in the database.

The IMAGE/3000 software was bundled with every HP 3000 system sold, making it easy for ISVs to create applications that were based on this early database management platform. It was one of the most attractive features of the early 3000 platform, and many ISVs wrote software to complement it and extend its capabilities.

In the 1980s, HP beefed up its IMAGE/3000 DBMS in preparation for the migration to the PA-RISC architecture. The new, more powerful DBMS was named TurboIMAGE. It made it possible to create databases with larger capacities, and also improved performance.

In the 1990s, HP put its DBMS through another evolutionary change. Up until this time, the only way to access IMAGE databases had been through HP's proprietary intrinsic interface. In a client-server world, this approach was rapidly becoming a problem.

In the 1970s, IBM developed a language called the "Structured Query Language," or SQL for short. Application programs used this language to manipulate data in a database. The language had originally been developed for use with IBM's own database management systems, but by the mid-80s it had been adopted by a number of other DBMSs, and quickly evolved into a de facto standard for database access.

Consequently, most new client-server applications that were developed in the 1980s made extensive use of the SQL language. In order to make it possible for these applications to work with the HP 3000, HP taught TurboIMAGE a new language--ANSI standard SQL.

The resulting DBMS was named IMAGE/SQL--which is the name that is used today. IMAGE/SQL databases can be accessed in two ways: either using the traditional proprietary interfaces (thus protecting customers' investments in proprietary software) or using the new industry standard SQL interface (thus enabling standard client-server database tools to access the data stored on HP 3000s).

Today's IMAGE/SQL databases are used by tens of thousands of customers worldwide to house data that continues to be maintained by so-called legacy applications. These older applications are based on languages such as COBOL, and proprietary technologies such as HP's intrinsic interface to IMAGE/SQL databases. This intrinsic interface is very fast, and very reliable.

But--and this is the unique value proposition of the HP 3000--at the same time IMAGE/SQL databases can also be accessed by new client-server applications. These new applications could run on the HP 3000 or on other platforms such as UNIX or NT. They might be based on new programming languages such as Visual Basic or Java. And they might take advantage of new technologies such as the World Wide Web. The key value proposition of the HP 3000 is that you can take advantage of all these new technologies without having to walk away from your investment in the older, proprietary technologies.

The Components of an IMAGE/SQL Database

We will now begin to drill down a bit and look at how an IMAGE/SQL database is created, and at the main components of every IMAGE/SQL database.

IMAGE/SQL databases are made up of the following parts:
  1. The Schema
  2. The Root File
  3. Detail Datasets
  4. Master Datasets
The creation of an IMAGE/SQL database begins with the creation of a schema. A schema is an ordinary ASCII file in which a database designer defines the structure of the database. We will look at the exact format of an IMAGE/SQL schema in next month's article. For now, it's enough to know that a schema contains a description of every field in the database. The schema contains security information, which will be used to control which users or programs are allowed to access which fields. The schema also groups the fields together into files, or as IMAGE/SQL refers to them, datasets.

In IMAGE/SQL terminology, there are four kinds of datasets:
  1. The Root File (one per database)
  2. Detail Datasets (one or more per database)
  3. Master Datasets (one or more per database), which come in two kinds: Manual Masters and Automatic Masters
The root file contains the same information as the schema, only it is encoded in a binary format. The schema is an ordinary ASCII file that can be viewed and edited using an ordinary editor. The root file is created from the schema. In a sense, the schema is compiled using a utility program called DBSCHEMA, thus creating the root file. Once this has been done, the schema is no longer needed--in much the same way that you don't need a program's source code once the program has been compiled. However, just as you need to keep the source code for a program so that you can modify the program at a later date, you will also want to keep your schema in case you need to re-create the database or change its structure at some later time. The root file, on the other hand, is part of the database itself. You cannot access the database without it. The root file cannot be changed or edited except by going through the DBMS, i.e., by recompiling the schema.

Most of the data in an IMAGE database is stored in detail datasets. A detail dataset is structured like an ordinary file--with the data stored in fields. IMAGE/SQL calls them "data items" and organizes them into records that are called "data entries" in IMAGE terminology. Remember, though, that no application program can access the information stored in detail datasets without going through the IMAGE/SQL DBMS. It's the DBMS that manages the physical structure of the detail datasets, provides an additional level of security, and isolates application programs from having to know what that physical structure is.

Some of the items in a detail dataset are designated as search items. A search item is one that can be used to perform keyed access to the data. So, for example, if a detail dataset contains a search item called Name, then an application program can specify that it wants to read the data entry with Name equal to Joe Smith and IMAGE/SQL will find it.

For each search item, IMAGE/SQL will create a corresponding master dataset. A master dataset contains a list of all the valid values of its search item. For each value, the master dataset entry is used to lead the application program to the corresponding detail dataset entry (or entries).

There are two kinds of master datasets: manual masters and automatic masters. An automatic master dataset is updated automatically when the corresponding detail datasets are updated. For example, suppose a detail dataset contains a search item called Name. When you insert a new entry (i.e., a new record) into the detail dataset with the name John Smith, IMAGE/SQL will check to see if any John Smith records already exist. If so, then there will already be a John Smith entry in the master dataset (or datasets). If there is no John Smith entry, then IMAGE/SQL will automatically create one in the appropriate automatic master dataset.

A manual master dataset works slightly differently. Suppose you insert a new entry into a detail dataset that contains a search item called Part-Number. Let's assume that the new entry has a part number of 123. If the master dataset associated with Part-Number is an automatic master, then IMAGE will create a new entry for the master dataset if necessary, as we've just seen. But if the master associated with Part-Number is a manual master, then IMAGE/SQL will check to see if there is already an entry in the manual master with a Part-Number of 123. If so, then it will allow you to insert the entry into the detail dataset, and update the master to point to it. But if not, then IMAGE/SQL will not allow you to put the entry into the detail dataset until you have explicitly created a corresponding entry in the manual master.
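
From a program's point of view, the difference between the two kinds of masters shows up only in whether the insertion succeeds. Here is a hedged sketch (DBPUT and its parameter list are the standard intrinsic, and mode 1 is the normal DBPUT mode; the data names are hypothetical):

       01  PUT-MODE  PIC S9(4) COMP VALUE 1.

           CALL "DBPUT" USING BASE-NAME, DETAIL-SET, PUT-MODE,
                              DB-STATUS, ITEM-LIST, NEW-ENTRY.
           IF CONDITION-WORD NOT = 0
     *         With a manual master, a part number that has no
     *         master entry causes DBPUT to fail; with an automatic
     *         master, IMAGE would have created the entry itself.
               DISPLAY "DBPUT REJECTED".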

The two different kinds of masters allow IMAGE to handle two different kinds of keys. Manual masters are appropriate when the master contains an entry for every possible valid key. Most companies have a certain number of valid part numbers, and you wouldn't want to create a new one just because some clerk miskeyed one. Manual masters are perfect for this kind of key.

Automatic masters make sense for more free-form kinds of keys. You wouldn't typically want a table with every possible customer name in it. When a new customer is ready to do business with you, you want to be able to add that customer's name to your database quickly and automatically. Automatic masters are appropriate in this case.

Each entry in a master dataset (whether it's an automatic master or a manual master) contains a pointer to the corresponding data in one or more detail datasets. IMAGE/SQL allows duplicate keys. For example, suppose a master dataset contains an entry in which the Name field contains a value of John Smith. The master will contain a pointer to a John Smith entry in a detail dataset. If the detail dataset contains more than one John Smith entry, each one will in turn contain a pointer to the next one.

A group of entries in a detail dataset that share the same search item (such as a common Name value of John Smith) are referred to in IMAGE/SQL terminology as a chain.
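
Putting the pieces together, reading a chain is a two-step affair: DBFIND (mode 1) locates the chain head for a given search-item value, and repeated DBGETs in mode 5 (chained read, forward) walk the chain until the status area reports the end. A hedged sketch, again reusing the hypothetical names from the earlier fragments:

       01  SEARCH-ITEM   PIC X(6)       VALUE "NAME;".
       01  SEARCH-VALUE  PIC X(20)      VALUE "JOHN SMITH".
       01  FIND-MODE     PIC S9(4) COMP VALUE 1.
       01  CHAIN-MODE    PIC S9(4) COMP VALUE 5.

           CALL "DBFIND" USING BASE-NAME, DETAIL-SET, FIND-MODE,
                               DB-STATUS, SEARCH-ITEM, SEARCH-VALUE.
           PERFORM UNTIL CONDITION-WORD NOT = 0
               CALL "DBGET" USING BASE-NAME, DETAIL-SET, CHAIN-MODE,
                                  DB-STATUS, ITEM-LIST, ENTRY-BUFFER,
                                  SEARCH-VALUE
     *         process ENTRY-BUFFER here; a nonzero condition word
     *         (end of chain) ends the loop
           END-PERFORM.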

Summary

In this issue, we've learned a few basic facts about IMAGE/SQL, including some terminology that will be important. Next time, we'll see how to put what you've learned here to work, by coding a schema and using it to create a simple IMAGE/SQL database.


George Stachnik works in technical training in HP's Network Server Division.