by George Stachnik
Most COBOL compilers (including HP/COBOL)
support extensions to the ANSI standard COBOL language. These
extensions are proprietary to specific platforms: extensions to
ANSI COBOL make it possible to do things on the HP 3000
that you cannot do on other platforms. For example,
last time we explored ways of calling HP 3000 intrinsics. In
this installment, we're going to begin looking at the HP 3000's
most important proprietary feature--the IMAGE/SQL database management
system.
Extensions are what make languages like HP/COBOL proprietary.
Of course, in this sense, so-called open systems such
as UNIX and NT are proprietary too. All of these platforms support
a number of different COBOL dialects, virtually every one of
which consists of an ANSI standard core surrounded by proprietary
extensions. And COBOL is not unique in this regard. There are
proprietary extensions to virtually all computer languages, including
C, C++, and even the most popular versions of Java.
During the last 10 years or so, the industry has moved toward
an approach to programming that is driven by industry standards.
If you're writing software, it seems like a no-brainer that you
should design your code so that it can be easily ported to other
platforms, should the need ever arise. For this reason, many
programmers seem to have lost interest in proprietary platforms
like the HP 3000, and especially in the extensions that make
them proprietary. After all, if nobody's using them anymore,
why would anybody want to learn about them?
In spite of all the hoopla surrounding the move to open systems,
the fact remains that most applications used in the real world
make extensive use of proprietary, non-standard extensions to
the languages that they were written in. There are a variety
of reasons why. Some are technical, although many are marketing
related.
In the past, platform vendors used proprietary extensions
to differentiate their platforms from those of their competitors.
For example, when I first began working for HP, HP 3000 sales
literature extolled the unique functions of the HP 3000, its
operating system, even its spooler--saying that they were "unique
in the industry."
In today's open systems world, such a claim would be marketing
suicide. But in the 1980s, software vendors that failed to take
advantage of all the features of the platforms
their products ran on (especially the proprietary ones) risked
losing sales to their competitors who did. Customers wanted to
be assured that their independent software vendors (or ISVs)
were taking full advantage of the hardware that they were buying.
But by the 1990s, customers were asking for different things.
Hardware was less expensive. Customers began to realize that
their biggest expenses were no longer the boxes that sat in the
data center: the largest line items in their budgets were now
software-related. Their concerns shifted to getting the most
bang for their software buck. Hence, the idea of designing applications
to be platform portable took off in the 1990s.
HP 3000 Applications in the 1990s
Most HP 3000 applications were written in the 1980s (some
even date back to the '70s). Most of them fully exploit the proprietary
features of the HP 3000. And once again, this is a sword that
cuts in at least two different ways:
- On one hand, MPE/iX does provide strong support for industry
standards--in much the same way that UNIX and NT do. This means
that applications that were created to be compliant with
applicable standards can be ported between HP 3000s, UNIX machines,
and even Microsoft's Windows NT operating system.
- But on the other hand, typical HP 3000 applications predate
today's emphasis on standards. And the HP 3000's proprietary
nature is precisely what makes it so difficult to port typical
HP 3000 applications to other platforms. Keep in mind the fact
that taking advantage of that proprietary functionality (like
calling intrinsics explicitly) ties your application code to
the HP 3000 platform.
MPE/iX's robust and reliable (if proprietary) architecture
is one of the things that make the HP 3000 such an attractive
platform. In this context, proprietary simply means that
the HP 3000 does things that other platforms don't do. Applications
that take advantage of the proprietary features of the HP 3000
can potentially reap great rewards in performance and reliability.
Database Management
At the time the HP 3000 was introduced, file access (opening,
closing, reading, and writing files) was largely standardized.
You could use ANSI standard COBOL to store and access data on
your HP 3000, and as long as the data was stored in ordinary
files, your COBOL code could be ported easily to other ANSI-standard
COBOL platforms.
However, we're now going to turn our attention to an area
of programming functionality that was not standardized
at the time the HP 3000 was designed: database management.
When the HP 3000 was introduced in 1972, database technology
was something brand new in the computer industry. Most minicomputers
were used, in those days, for scientific applications--which
tended to revolve around complex calculations, not around
complex data structures or files.
When Hewlett-Packard introduced the HP 3000, they did something
that (at the time) was unique in the minicomputer marketplace.
They bundled a database management system with every HP 3000.
Called IMAGE/3000, it gained quick acceptance in the nascent
minicomputer marketplace, and won awards for the platform and
for the Hewlett-Packard company. More importantly, it marked
the HP 3000 as being a minicomputer targeted specifically to
the business marketplace, rather than to the scientific community.
Many new applications were written for the HP 3000 in the years
following its introduction. Virtually all of them were (and are)
based on HP's IMAGE/3000 architecture.
IMAGE/3000 was made up of the following two components:
- A collection of intrinsics that application programs use
to access IMAGE/3000 databases.
- A collection of utility programs used to manage IMAGE/3000
databases.
In the 1970s, there were no agreed-upon standards for database
management. This meant that IMAGE/3000 intrinsics could not
be called implicitly. That is, you couldn't access an IMAGE database
using an industry standard COBOL verb such as READ or WRITE.
If you wanted to take advantage of IMAGE/3000, you had to call
the proprietary intrinsics explicitly. This meant that any HP
3000 application that used IMAGE/3000 (and virtually all HP 3000
applications did) was locked into the HP 3000 platform. It couldn't
be ported to another platform without some fairly major rework.
This was almost the kiss of death for the HP 3000 in the open-systems-obsessed
1990s. In fact, many platforms did "go under" in the
UNIX shakeout that took place in the early part of the decade.
Many industry observers expected that Hewlett-Packard would choose
to jettison its proprietary HP 3000 platform in favor of its
faster growing younger brother, the UNIX-based HP 9000. Fortunately,
these observers did not understand a very basic fact about the
company.
HP was (and is) very focused on protecting its customers'
investments. Instead of jettisoning the HP 3000 platform, the
company chose to invest in it. They removed many of the restrictions
that had pushed developers away from it, making it possible to
access the HP 3000's features (including its database management
system) through new industry standard interfaces, while continuing
to support the older proprietary interfaces. In the final months
of the 20th century, interest in the IMAGE database management
system and sales of the HP 3000 platform are both on the rise.
What's a DBMS, and Why Do We Have Them?
The term database has been used to describe a wide
variety of products and technologies. In the PC world, many products
sold as database management systems are in
fact little more than keyed access methods (like KSAM). To
understand IMAGE/SQL, we must first understand what makes a database
different from a file.
On the HP 3000, data stored in ordinary "flat" files
can be accessed in any of three ways:
- Sequentially: the application program reads the records
one at a time, in the order in which the records are stored in
the file.
- Directly: the application program can select the record
it wants from the file by specifying its relative record number.
For example, it can read record number 1,234 without first having
to read the 1,233 records that precede it.
- Keyed: the application program can select the record
that it wants from the file by specifying a key value. For example,
it can read the record containing the value "Sam Jones"
in a prespecified field.
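The three access methods above can be sketched in a few lines of Python. This is only an illustration, not HP 3000 code: the record layout, the field widths, and every function name here are assumptions made up for the example, and records are numbered from 0 rather than 1.

```python
import io

NAME_WIDTH = 12   # hypothetical width of the key field
DEPT_WIDTH = 4    # hypothetical width of a second field
RECORD_SIZE = NAME_WIDTH + DEPT_WIDTH

def make_flat_file(rows):
    """Build an in-memory 'flat file' of fixed-length records."""
    data = b"".join(
        (name.ljust(NAME_WIDTH) + dept.ljust(DEPT_WIDTH)).encode("ascii")
        for name, dept in rows)
    return io.BytesIO(data)

def decode(raw):
    """Split one raw record into its two fields."""
    text = raw.decode("ascii")
    return text[:NAME_WIDTH].rstrip(), text[NAME_WIDTH:].rstrip()

def read_sequential(f):
    """Sequential access: read every record in the order it is stored."""
    f.seek(0)
    records = []
    while True:
        raw = f.read(RECORD_SIZE)
        if len(raw) < RECORD_SIZE:
            break
        records.append(decode(raw))
    return records

def read_direct(f, record_number):
    """Direct access: jump straight to a relative record number."""
    f.seek(record_number * RECORD_SIZE)
    return decode(f.read(RECORD_SIZE))

def build_key_index(f):
    """Keyed access needs an index from key value to record number."""
    return {name: number
            for number, (name, _dept) in enumerate(read_sequential(f))}

def read_keyed(f, index, key):
    """Keyed access: look up the record number, then read directly."""
    return read_direct(f, index[key])

flat_file = make_flat_file(
    [("Ann Lee", "ENG"), ("Sam Jones", "MKT"), ("Joe Smith", "FIN")])
name_index = build_key_index(flat_file)
```

Calling read_keyed(flat_file, name_index, "Sam Jones") finds the Sam Jones record without reading the records in front of it, which is all that keyed access really promises.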
When the term database was first coined in the 1970s,
it was assumed that databases could be accessed just like files--in
any of the three ways I just mentioned. So keyed access is not
what makes a database unique. To understand what is special about
a database, one must understand something about how to maintain
applications that were built around files.
At the time the HP 3000 was designed, most application programs
were built around a central data repository called a master
file. A master file was an ordinary flat file that might
be accessed by a number of different programs. A typical application's
master file would be accessed by at least three major application
programs:
- A reporting program that extracts data from the master file
in order to generate reports
- An update program that modifies existing records and inserts
new records into the master file
- A maintenance program that deletes old or obsolete records.
One of the biggest expenses involved in managing and maintaining
these kinds of applications was maintenance programming. The
high cost of maintenance programming stemmed from a fundamental
principle of file-based programming, which is:
"When the structure of a file changes, that
change must be reflected in every piece of software
that touches that file."
In the 1970s, most companies managed their business using
just such applications. File-based applications work pretty well
in a static environment. But when the rules of doing business
change, then these applications must be changed to reflect the
business changes. And that's when things begin to break down.
For example, suppose the format of one of the fields in your
master file needs to change. This might be forced by a simple
change in the business environment--one of your suppliers that
had been using 4-digit part numbers might decide to start using
6-digit part numbers.
In the master file, you'd need to change the 4-digit field
that's used to hold part numbers to a 6-digit field. That would
make the record size of the file 2 bytes longer. And that, in
turn, would force you to change (or at the very least recompile)
every program that accesses the master file. Even programs that
don't touch part numbers would have to change.
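To see why even an unrelated program breaks, here is a small Python sketch of the part-number change. The three-field layout and the names OLD_LAYOUT, NEW_LAYOUT, and report_quantity are hypothetical; the point is only that a program compiled against the old record layout cannot read the new records, even though it never looks at the part number.

```python
import struct

# Hypothetical master-file record: part number, description, quantity.
OLD_LAYOUT = struct.Struct("4s10s4s")   # 4-digit part number, 18 bytes total
NEW_LAYOUT = struct.Struct("6s10s4s")   # 6-digit part number, 20 bytes total

def report_quantity(record, layout=OLD_LAYOUT):
    """A reporting program that only cares about the quantity field."""
    _part, _description, quantity = layout.unpack(record)
    return int(quantity.decode("ascii"))

old_record = OLD_LAYOUT.pack(b"1234", b"WIDGET    ", b"0042")
new_record = NEW_LAYOUT.pack(b"001234", b"WIDGET    ", b"0042")

def run_old_program_on_new_file():
    """Run the unchanged program against a record in the new format."""
    try:
        return report_quantity(new_record)   # still assumes OLD_LAYOUT
    except struct.error:
        return "layout mismatch"
```

The old program works on old_record and fails on new_record; it works again only once it is rebuilt with NEW_LAYOUT, which is exactly the maintenance cost described above.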
In the 1970s, computers were rapidly proliferating throughout
large organizations. Applications that were originally designed
to be used by one department (or even by one individual) were
suddenly being integrated with other applications that were used
by other departments and other individuals in other parts of
the company. Master files that were originally meant to be accessed
by a single program suddenly found themselves being touched by
other applications elsewhere in the company.
By the 1970s, many large companies that had jumped into the
information age found themselves facing a kind of information
gridlock. The term application backlog entered the IT
manager's lexicon, as software developers were overwhelmed by
the increasingly complex task of software maintenance. By the
late 1970s, many corporate IT departments were reporting that
they were facing a 7-year backlog of application
enhancements.
The problem was not hard to understand. Virtually every change
(no matter how trivial) to any application in the company forced
programmers to make corresponding changes to other applications
(in extreme cases, to every other application). Information
Systems visionaries saw that it was becoming impossible to change
anything, because changing anything forced you to change
everything. Something had to give.
Database Technology Saves the Day
Early database management systems such as IMAGE/3000 brought
a very simple and powerful tool to the table. They isolated
application programs from the physical structure of the data
that they were accessing.
For example, imagine a database containing records made up
of the following 10 fields:
- Customer Account Number
- Customer Account Balance
- Company Name
- Customer Name
- Job Title
- Shipping Street Address
- City
- State
- Zip or Postal Code
- Country
Furthermore, let's assume that this database is being accessed
by a total of 10 different programs. Of these 10 programs, let's
assume that only two make use of the account balance, which is
a numeric field that contains values in dollars and cents.
If this were a file-based application, any change in any
of these fields would force a corresponding change to all 10 application
programs. But since this is a database application, we can avoid
this headache. Here's how it works.
When a database application accesses a database, it specifies
the specific fields that it's interested in. So, for example,
suppose one of our application programs only used information
in the following fields:
- Country
- Company Name
- City
- Shipping Street Address
- State
- Zip or Postal Code
- Customer Account Number
A database application program would contain a definition
of the record that it wanted to read--a sort of "virtual
record layout." In other words, the record layout that appears
in the database application program would be made up of only
the fields that it's interested in, and in the order that it
wants them to appear. Contrast this with the corresponding file-based
application program, which must contain a record layout made
up of all the fields in the file, in the order that they
are physically stored in the file.
The value of this scheme to the IT manager is a giant reduction
in maintenance programming costs. In the above example, suppose
that the size of the field customer account balance changed
(possibly because our customers were doing so much business with
us that they now owed us more money than they did before). If
this were a file-based application, all 10 programs would have
to be changed, regardless of whether or not they actually used
the data in the customer account balance field. But since this
is a database application, any program that doesn't reference
the customer account balance field can remain unchanged.
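The "virtual record layout" idea can be sketched as a toy Python class. This is not the IMAGE interface: ToyDataset, its methods, and the field widths are all assumptions invented for the illustration. The program asks for fields by name, in the order it wants them, and the toy DBMS works out where they physically live.

```python
class ToyDataset:
    """Toy stand-in for a DBMS-managed dataset (hypothetical, not IMAGE)."""

    def __init__(self, field_widths):
        # Physical layout: ordered mapping of field name to width.
        self.field_widths = dict(field_widths)
        self.rows = []

    def put(self, **values):
        # Store each entry as one fixed-width string, as a flat file would.
        self.rows.append("".join(
            str(values[name]).ljust(width)[:width]
            for name, width in self.field_widths.items()))

    def get(self, row_number, item_list):
        """Return only the requested fields, in the caller's order."""
        raw = self.rows[row_number]
        offsets, position = {}, 0
        for name, width in self.field_widths.items():
            offsets[name] = (position, position + width)
            position += width
        return [raw[slice(*offsets[name])].rstrip() for name in item_list]

    def widen(self, field, new_width):
        """Change one field's physical width and restructure stored rows."""
        names = list(self.field_widths)
        old_values = [self.get(i, names) for i in range(len(self.rows))]
        self.field_widths[field] = new_width
        self.rows = []
        for values in old_values:
            self.put(**dict(zip(names, values)))

customers = ToyDataset({"account_number": 6, "account_balance": 8,
                        "company_name": 12, "city": 10, "country": 8})
customers.put(account_number="100234", account_balance="1500.00",
              company_name="Acme Corp", city="Denver", country="USA")

def mailing_program(dataset):
    """A program that never mentions the account balance."""
    return dataset.get(0, ["country", "company_name", "city", "account_number"])

before = mailing_program(customers)
customers.widen("account_balance", 12)   # the physical structure changes
after = mailing_program(customers)       # the program does not
```

Here before and after are identical, even though every stored row changed length in between. Only a program that names account_balance in its item list would need to be revisited.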
IMAGE/3000, TurboIMAGE, and IMAGE/SQL
IMAGE/3000 was Hewlett-Packard's effort to solve the maintenance
programming backlog in the 1970s. Using IMAGE/3000, an application
programmer could build programs that accessed data in a database,
without being tied to (or even aware of) the way the data was
physically stored in the database.
The IMAGE/3000 software was bundled with every HP 3000 system
sold, making it easy for ISVs to create applications that were
based on this early database management platform. It was one
of the most attractive features of the early 3000 platform, and
many ISVs wrote software to complement it and extend its capabilities.
In the 1980s, HP beefed up its IMAGE/3000 DBMS in preparation
for the migration to the PA-RISC architecture. The new, more
powerful DBMS was named TurboIMAGE. It made it possible to create
databases with larger capacities, and also improved performance.
In the 1990s, HP put its DBMS through another evolutionary
change. Up until this time, the only way IMAGE databases could
be accessed had been through HP's proprietary intrinsic interface.
In a client-server world, this approach was rapidly becoming
a problem.
In the 1970s, IBM developed a language called the "Structured
Query Language," or SQL for short. Application programs
use this language to manipulate data in a database. The language
was originally developed for use with IBM's own database management
systems, which began shipping in the early 1980s. But by the mid-80s it had been adopted
by a number of other DBMSs, and quickly evolved into a de facto
standard for database access.
Consequently, most new client-server applications that were
developed in the 1980s made extensive use of the SQL language.
In order to make it possible for these applications to work with
the HP 3000, HP literally taught TurboIMAGE a new language--the
ANSI standard SQL.
The resulting DBMS was named IMAGE/SQL--which is the name
that is used today. IMAGE/SQL databases can be accessed in two
ways: either using the traditional proprietary interfaces (thus
protecting customers' investments in proprietary software) or
using the new industry standard SQL interface (thus enabling
standard client-server database tools to access the data stored
on HP 3000s).
Today's IMAGE/SQL databases are used by tens of thousands
of customers worldwide to house data that continues to be maintained
by so-called legacy applications. These older applications are
based on languages such as COBOL, and proprietary technologies
such as HP's intrinsic interface to IMAGE/SQL databases. This
intrinsic interface is very fast, and very reliable.
But--and this is the unique value proposition of the HP 3000--at
the same time IMAGE/SQL databases can also be accessed by new
client-server applications. These new applications could run
on the HP 3000 or on other platforms such as UNIX or NT. They
might be based on new programming languages such as Visual Basic
or Java. And they might take advantage of new technologies such
as the World Wide Web. The key value proposition of the HP 3000
is that you can take advantage of all these new technologies
without having to walk away from your investment in the older,
proprietary technologies.
The Components of an IMAGE/SQL Database
We will now begin to drill down a bit and look at how an IMAGE/SQL
database is created, and at the main components of every IMAGE/SQL
database.
IMAGE/SQL databases are made up of the following parts:
- The Schema
- The Root File
- Detail Datasets
- Master Datasets
The creation of an IMAGE/SQL database begins with the creation
of a schema. A schema is an ordinary ASCII file in which
a database designer defines the structure of the database. We
will look at the exact format of an IMAGE/SQL schema in next
month's article. For now, it's enough to know that a schema contains
a description of every field in the database. The schema contains
security information, which will be used to control which users
or programs are allowed to access which fields. The schema also
groups the fields together into files, or as IMAGE/SQL refers
to them, datasets.
In IMAGE/SQL terminology, a database is stored in four kinds of files:
- The Root File (one per database)
- Detail Datasets (one or more per database)
- Manual Master Datasets
- Automatic Master Datasets
The root file contains the same information as the
schema, only it is encoded in a binary format. The schema is
an ordinary ASCII file that can be viewed and edited using an
ordinary editor. The root file is created from the schema. In
a sense, the schema is compiled using a utility program called
DBSCHEMA, thus creating the root file. Once this has been done,
the schema is no longer needed--in much the same way that you
don't need a program's source code once the program has been
compiled. However, just as you need to keep the source code for
a program so that you can modify the program at a later date,
you will also want to keep your schema in case you need to re-create
the database or change its structure at some later time. The
root file, on the other hand, is part of the database itself.
You cannot access the database without it. The root file cannot
be changed or edited except by going through the DBMS, i.e.,
by recompiling the schema.
Most of the data in an IMAGE database is stored in detail
datasets. A detail dataset is structured like an ordinary
file--with the data stored in fields. IMAGE/SQL calls them "data
items" and organizes them into records that are called "data
entries" in IMAGE terminology. Remember, though, that no
application program can access the information stored in detail
datasets without going through the IMAGE/SQL DBMS. It's the DBMS
that manages the physical structure of the detail datasets, provides
an additional level of security, and isolates application programs
from having to know what that physical structure is.
Some of the items in a detail dataset are designated as search
items. A search item is one that can be used to perform keyed
access to the data. So, for example, if a detail dataset contains
a search item called Name, then an application program
can specify that it wants to read the data entry with Name
equal to Joe Smith and IMAGE/SQL will find it.
For each search item, IMAGE/SQL will create a corresponding
master dataset. A master dataset contains a list of all the valid
values of its search item. For each value, the master dataset entry
is used to lead the application program to the corresponding
detail dataset entry (or entries).
There are two kinds of master datasets: manual masters and
automatic masters. An automatic master dataset is updated automatically
when the corresponding detail datasets are updated. For example,
suppose a detail dataset contains a search item called Name.
When you insert a new entry (i.e., a new record) into the detail
dataset with the name John Smith, IMAGE/SQL will check
to see if any John Smith records already exist. If so,
then there will already be a John Smith entry in the master
dataset (or datasets). If there is no John Smith entry, then
IMAGE/SQL will automatically create one in the appropriate automatic
master dataset.
A manual master dataset works slightly differently.
Suppose you insert a new entry into a detail dataset that contains
a search item called Part-Number. Let's assume that the
new entry has a part number of 123. If the master dataset
associated with Part-Number is an automatic master, then
IMAGE will create a new entry for the master dataset if necessary,
as we've just seen. But if the master associated with Part-Number
is a manual master, then IMAGE/SQL will check to see if
there is already an entry in the manual master with a Part-Number
of 123. If so, then it will allow you to insert the entry
into the detail dataset, and update the master to point to it.
But if not, then IMAGE/SQL will not allow you to put the entry
into the detail dataset until you have explicitly created a corresponding
entry in the manual master.
The two different kinds of masters allow IMAGE to handle two
different kinds of keys. Manual masters are appropriate when
the master contains an entry for every possible valid key. Most
companies have a certain number of valid part numbers, and you
wouldn't want to create a new one just because some clerk miskeyed
one. Manual masters are perfect for this kind of key.
Automatic masters make sense for more free-form kinds of keys.
You wouldn't typically want a table with every possible customer
name in it. When a new customer is ready to do business with
you, you want to be able to add that customer's name to your
database quickly and automatically. Automatic masters are appropriate
in this case.
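The difference between the two kinds of masters can be sketched in Python. ToyMaster and ToyDetail are hypothetical classes written for this illustration and say nothing about how IMAGE/SQL is actually implemented; they only mimic the rule described above for inserting a detail entry.

```python
class ToyMaster:
    """Toy master dataset: the set of valid values for one search item."""

    def __init__(self, automatic):
        self.automatic = automatic
        self.keys = set()

    def add_key(self, key):
        """Explicitly add a key; the only way to populate a manual master."""
        self.keys.add(key)

class ToyDetail:
    """Toy detail dataset tied to one master through one search item."""

    def __init__(self, search_item, master):
        self.search_item = search_item
        self.master = master
        self.entries = []

    def put(self, entry):
        key = entry[self.search_item]
        if key not in self.master.keys:
            if self.master.automatic:
                # Automatic master: the missing key is created for us.
                self.master.keys.add(key)
            else:
                # Manual master: the insert is refused until the key exists.
                raise KeyError(f"no master entry for {key!r}")
        self.entries.append(entry)

names = ToyMaster(automatic=True)
customers = ToyDetail("Name", names)
customers.put({"Name": "John Smith"})        # key created automatically

parts = ToyMaster(automatic=False)
orders = ToyDetail("Part-Number", parts)
try:
    orders.put({"Part-Number": 123})         # rejected: 123 is not yet valid
    rejected = False
except KeyError:
    rejected = True
parts.add_key(123)
orders.put({"Part-Number": 123})             # accepted now
```

The miskeyed part number is caught by the manual master, while the new customer name goes straight in through the automatic one.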
Each entry in a master dataset (whether it's an automatic
master or a manual master) contains a pointer to the corresponding
data in one or more detail datasets. IMAGE/SQL allows duplicate
keys. For example, suppose a master dataset contains an entry
in which the Name field contains a value of John Smith.
The master will contain a pointer to a John Smith entry in a
detail dataset. If the detail dataset contains more than one
John Smith entry, each one will in turn contain a pointer to
the next one.
A group of entries in a detail dataset that share the same
search item (such as a common Name value of John Smith)
are referred to in IMAGE/SQL terminology as a chain.
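A chain can be pictured as a linked list that starts in the master dataset. The Python sketch below is a simplified model under my own assumptions: the class name, the dictionary layout, and the tail pointer kept alongside the head pointer are conveniences of the example, not a description of the real on-disk format.

```python
class ChainedDataset:
    """Toy model: master entries point at chains of detail entries."""

    def __init__(self):
        self.detail = []   # each entry: {"key", "data", "next"}
        self.master = {}   # key -> {"head": index, "tail": index}

    def put(self, key, data):
        index = len(self.detail)
        self.detail.append({"key": key, "data": data, "next": None})
        pointers = self.master.get(key)
        if pointers is None:
            # First entry for this key: the master points straight at it.
            self.master[key] = {"head": index, "tail": index}
        else:
            # Duplicate key: link the previous last entry to the new one.
            self.detail[pointers["tail"]]["next"] = index
            pointers["tail"] = index

    def chain(self, key):
        """Follow the pointers from the master entry through the chain."""
        found = []
        pointers = self.master.get(key)
        index = pointers["head"] if pointers else None
        while index is not None:
            found.append(self.detail[index]["data"])
            index = self.detail[index]["next"]
        return found

db = ChainedDataset()
db.put("John Smith", "order 1")
db.put("Mary Brown", "order 2")
db.put("John Smith", "order 3")
```

Here db.chain("John Smith") walks from the master entry to "order 1" and then on to "order 3", skipping the Mary Brown entry that sits between them in the detail dataset.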
Summary
In this issue, we've learned a few basic facts about IMAGE/SQL,
including some terminology that will be important. Next time,
we'll see how to put what you've learned here to work, by coding
a schema and using it to create a simple IMAGE/SQL database.
George Stachnik works in technical training in HP's Network Server Division.