←Part 20  Part 22→

The HP 3000--For Complete Novices
Part 21: creating a database Root File

Feature by George Stachnik

Welcome to part 21 of this series of beginners' articles on the HP 3000. We have been focussing on various ways that data can be stored on HP 3000 systems. Most recently, we've been studying the IMAGE/SQL database management system that comes bundled with every HP 3000 system.

In the previous two articles in this series, we learned that database designers use a schema to define the structure of an IMAGE/SQL database, and last time we showed you how to create a schema. We've said that a schema is a sort of blueprint for an IMAGE/SQL database. That is, the schema contains a description of the various components (data items, data entries, and data sets) that will make up your database. In this installment we'll begin to show how a schema is used to create an IMAGE/SQL database.

There are actually three steps to this process:
  1. Using the schema to create a root file.
  2. Using the root file to create an IMAGE/SQL database.
  3. Using a program such as QUERY to open the database and write data to it.
We will concentrate here on the first step.

Before we launch into the details, keep in mind that everything we discuss in this article is covered in the very fine IMAGE/SQL manuals available from HP. A good place to begin learning the finer points of IMAGE/SQL is with Getting Started with HP IMAGE/SQL. The latest version is available from HP (Customer Order Number 36385-90008). You should also take a look at the TurboIMAGE/XL Database Management System Reference Manual (Customer Order Number 30391-90001).

DBSCHEMA and the ROOT File

We have shown you how to create a schema. Now we'll use that schema to create a root file for your database. A root file is a binary file that contains the database design information from the schema, encoded in a binary format. In other words, the root file contains the same information that's in your schema. But unlike the schema, the root file cannot be accessed using a text editor--it can only be accessed using the IMAGE/SQL DBMS. An example will help make this clear.

In Figure 1, we see the results of using the :LISTFILE command to display information about a file containing a schema. In this example, the schema is made up of 48 lines of ASCII text (with 80 bytes in each record). If we were to use the :PRINT command, we could display the contents of this file and see the schema. It would look very much like the schema that we showed you in the last article.

Figure 1: LISTFILE of Schema File
:LISTFILE,2

ACCOUNT=  STACHNIK    GROUP=  PUB   
  

FILENAME  CODE  ---------LOGICAL RECORD----------   ----SPACE----
                SIZE  TYP      EOF      LIMIT R/B    SECTORS  #X MX

SCHEMA           80B  FA        48         48   3        16   1  2
Figure 2 shows you how you'd go about creating a root file using this schema. The command at the top of the figure is telling the HP 3000 to execute a program called "DBSCHEMA.PUB.SYS."

This program expects to read a schema as input, via $STDIN. We discussed $STDIN in an earlier article, but let's review what it is very briefly. When you log on to an HP 3000, the name $STDIN is associated with your terminal keyboard. If you run a program that opens $STDIN, the program will expect you to type data records at your terminal when you run it. For example, suppose you enter the following command:
  :RUN DBSCHEMA.PUB.SYS
The DBSCHEMA program will display a banner on your screen, and then prompt you to type the schema--line by line. If you make even one mistake, there is no way to correct it except by breaking out of the program and starting all over again.

While you could create a database this way (assuming you were a very good typist who never made any mistakes), I think we'd all agree that it would be a lot easier if we could get DBSCHEMA to read the schema not from your terminal, but from a file (in this case, the file we saw in Figure 1, which we very creatively named "SCHEMA").

This is precisely what the command shown at the top of Figure 2 does:
  :RUN DBSCHEMA.PUB.SYS;STDIN=SCHEMA
This command tells the HP 3000 to run the DBSCHMA program, but instead of reading its $STDIN input from your terminal (as it normally would), it will read it from a file named "SCHEMA" that resides in the current logon group and account.

There's nothing special about the name "SCHEMA." A file containing a schema can be assigned any name you like. To get DBSCHEMA to read it, just substitute your schema's filename for the name "SCHEMA" that we used in the example in Figure 2.

Output of DBSCHEMA

Just as programs can read data from your terminal using the name $STDIN, they can write data to your terminal using the name $STDLIST. When you run DBSCHEMA, it displays a series of messages on $STDLIST to tell you what it did, and whether or not anything went wrong. These are also shown in Figure 2.

The first message (at the top of the figure) is always the same: a banner beginning with the string "PAGE 1" and identifying the version of DBSCHEMA you're running. (This is good information to have should you ever need to call HP support about a problem with DBSCHEMA--the version is one of the first things they'll ask for.) Immediately below that, DBSCHEMA displays the schema that it is reading from $STDIN.

One way to think of DBSCHEMA is as a kind of compiler. Just as a COBOL compiler uses COBOL source code in an ASCII file and creates a binary object file from it, DBSCHEMA uses the code in your ASCII schema file and creates a binary file called a root file. And like a compiler, DBSCHEMA checks your schema for errors before writing its binary output.

For example, as we discussed in the previous two articles in this series, the ITEMS section of a schema should contain a description of each data item that you want in your database. The SETS section groups these items into datasets. If the SETS section of your schema contains a reference to an item that isn't defined in the ITEMS section, then DBSCHEMA will flag that as an error. If, on the other hand, no errors are found, DBSCHEMA will create a root file for you.

Using information contained in the schema, DBSCHEMA determines the structure of your database. This structural information will be displayed on $STDLIST immediately after the last line of the schema. In the example shown in Figure 2, the structural information appears after the END statement. It tells us that our database will have three datasets, called CUSTOMER, DATE-MASTER, and SALES. It tells us the TYPE of each dataset. For example, CUSTOMER has an "M" under the TYPE heading, telling us that this is a MANUAL MASTER. The "A" that appears next to DATE-MASTER tells us that this dataset is an AUTOMATIC MASTER and the "D" next to SALES tells us that this dataset is a detail dataset. If you check this information against the specifications in the schema, you'll see that DBSCHEMA is simply echoing the information we gave it in the schema.

The next column of this table is labelled "FLD CNT" and it is simply a counter showing the number of data items in each record in the dataset. So, for example, each data entry (or record) in the SALES dataset contains eight data items (or fields).

That brings us to the column labelled "PT CT" or "pointer count." This column tells us how many pointers there are in each record. In this context, a "pointer" is a data item that IMAGE/SQL creates for you automatically. Each pointer contains the address of a record in another dataset.

If you review the structure of this database from the schema in Figure 2, you'll recall that you can search for records in the SALES detail dataset using either of two search items: ACCOUNT and DATE. The indices for these two search items are in the master datasets (CUSTOMER and DATE-MASTER, respectively). This means along with the eight user data items shown under FLD CNT, each record in the SALES dataset will contain two pointers--one to a corresponding record in the CUSTOMER dataset and one to a record in DATE-MASTER.

The next column (labelled ENTR LGTH) tells the length (in 16-bit words) of each data entry (or record). Let's see how these numbers were calculated. Looking back at the sets section of the schema, we see that DATE-MASTER has only one data-item: DATE. The items section of the schema tells us that DATE is an "X6" (or 6-byte ASCII) item. Therefore each data entry of DATE-MASTER will be 6 bytes long.

On the HP 3000, storage is often measured not in 8-bit bytes, but in 16-bit words--two bytes to a word. This means that each data entry of DATE-MASTER is three words long, which is exactly what DBSCHEMA tells us under the heading ENTR LGTH. To make sure you understand this, figure out for yourself what the entry length should be for the CUSTOMER dataset by adding up the lengths of the CUSTOMER data items defined in the schema. (Don't forget to divide your result by 2 to convert bytes to words). Then check your results against those shown in the ENTR LGTH column in Figure 2.

The next column is labelled MED REC, which stands for "media record." We've seen that in addition to your data, the records in each dataset also contain pointers to the other datasets. The ENTR LGTH column told us how long each data entry would be, but it only reflects the data items that you defined in the schema. The MED REC column includes not only your data items, but also the pointers. In a sense, the media record is the "real" record length for each dataset.

This brings us to the MAXIMUM CAPACITY column. This tells us how many media records each dataset will hold. Most of the time, this number will be the same as the number you specified in the schema.

For example, the SETS section of the schema shown in Figure 2 contains the statement:
  CAPACITY:..........201;
By coding this statement, we specified that the CUSTOMER dataset was to be created large enough to hold 201 records. This is exactly the number that's displayed under MAXIMUM CAPACITY for the customer dataset.

Most of the time, IMAGE/SQL will create your datasets with exactly the number of records that you specify in the SCHEMA. However, there are some cases where it will find that it can make more efficient use of disk space by rounding this number up. For example, in the SCHEMA, we specified that SALES should have a capacity of 1231 data items, but the MAXIMUM CAPACITY column shows that IMAGE/SQL has determined that it can fit 1245 records in the same amount of disk space, so it has rounded it up for you.

While we're on the subject of capacities, you may have noticed that the capacities that we've specified in our schema have been a little strange looking. None of them are nice round numbers like 100 records or 1000 records. The capacities shown in Figure 2 are 201, 211, and 1231. These numbers were not selected totally at random--they are, in fact, all prime numbers. (If you remember back to your high school algebra class, you'll recall that this means that these numbers cannot be factored. So, for example, there are no two numbers that can be multiplied together to get 201--aside from 201 times 1.) It turns out that in some cases, IMAGE/SQL performs better when the master dataset capacities are chosen to be prime numbers. The reasons for this are beyond the scope of this article, but we will discuss it in more detail later on.

The remaining columns (BLK FAC, BLK LGTH, and DISC SPACE) describe the exact amount of disk space that will be allocated by IMAGE/SQL for each dataset. The total disk allocation for the entire database is shown next to the heading "TOTAL DISC SECTORS INCLUDING ROOT."

There are two caveats that must be given at this point. First of all, the example in Figure 2 shows a total of 480 disk sectors being allocated for the database. Remember that this is a theoretical total for the entire database. DBSCHEMA isn't going to create the database for you--all it's going to create is the root file. (That's why the figures don't add up--if you sum up the DISC SPACE requirements for the three datasets, they add up to 464 sectors. The remaining 16 sectors will be occupied by the root, for a total of 480 sectors.)

The second caveat is that these figures were accurate for the old 16-bit classic HP 3000 systems. In those days, disk space was extremely expensive, and so every effort was made to allocate it sparingly. It was doled out in 256-byte sectors, and DBSCHEMA (which dates back to classic systems) reflects that policy.

The PA-RISC models of the HP 3000 that are widely used today allocate storage quite differently from their 16-bit ancestors. Today, performance is much more of a critical issue than capacity. For this reason, MPE/iX allocates disk storage in 4K pages. These are larger, but they can be accessed far more efficiently than 256-byte sectors. Unfortunately, this means that there will often be cases in which the HP 3000 actually allocates more storage than you ask for in your schema. This is especially true for very small databases like the one shown in these examples.

Creating a ROOT file

Take a look at the bottom of Figure 2. DBSCHEMA has displayed the message:
  NUMBER OF ERROR MESSAGES: 0
Since DBSCHEMA found no errors, it will create a root file for your database. It will also display a variety of messages indicating the number of different data items that will reside in your database (13 in this example), and the number of datasets (3 in this example).

The last thing that DBSCHEMA displays on $STDLIST is some information about the root file, and finally the message:
  ROOT FILE ORDERS CREATED.
DBSCHEMA has assigned the filename "ORDERS" to the root file. If you look at the schema, it's not hard to understand where DBSCHEMA got this name. The schema itself contains the following line:
  BEGIN DATA BASE ORDERS; << CUSTOMER ORDERS >>
When DBSCHEMA creates a root file, it will use a filename that is the same as the database name that you specified in the schema. Figure 3 shows the output of a :LISTFILE command that we typed immediately after running DBSCHEMA. It shows that we now have two files in our logon group:
  1. SCHEMA, the ASCII file that contains the schema for our database
  2. ORDERS, the root file that was created by DBSCHEMA.

Figure 3: ORDERS File
:LISTFILE,2

ACCOUNT=  STACHNIK    GROUP=  PUB    

FILENAME  CODE     -------LOGICAL RECORD--------  ---SPACE---
                  SIZE  TYP        EOF     LIMIT R/B   SECTORS #X MX

ORDERS    PRIV     128W  FB          8         8   1       16  1  1
SCHEMA             80B  FA          48        48   3       16  1  2

:PRINT ORDERS
       ^
PRIVILEGED FILE VIOLATION  (FSERR 45)

The PRINT command failed. (CIERR 9080)
It's important to keep in mind that ORDERS is not a database. All we've created so far is the root file for the ORDERS database. The root file will never contain any user data. It only contains information about the structure of the database--information that it calculated from the information we coded in the schema.

Using the :LISTFILE command output shown in Figure 3, you can learn some interesting things about the ORDERS file in particular, and about root files in general.
  1. First of all, root files are fixed-length binary files. You can tell because the string FB appears under the TYP heading in Figure 3.
  2. Second, although the root file contains the same information as the schema, it is compressed and formatted differently. The schema contains 48 records. The root file contains only 8. The schema is made up of records that are 80 bytes long. The root file's records are 128 words (256 bytes) long.
  3. But perhaps the most important thing that the :LISTFILE command tells us is that this is a "PRIV" file. The character string "PRIV" appears under the heading "CODE." A filecode of "PRIV" tells us that this file cannot be accessed through normal file system commands or intrinsics.
For example, up until now, if we wanted to display the contents of an ASCII file on the HP 3000, we could simply use the PRINT command. But look at what happened when we tried to use the PRINT command to display the contents of the root file. As you can see in Figure 3, the :PRINT command failed with a "PRIVILEGED FILE VIOLATION". This is because IMAGE/SQL maintains its own security structure on top of the MPE security structure. If you want to look at the contents of an IMAGE database, you must go through IMAGE/SQL and its own internal security rules.

As you'll see, every part of an IMAGE/SQL database is stored in its own PRIV file. The root file, (which is all we've created so far) is a PRIV file, and thus cannot be accessed without the use of the IMAGE/SQL DBMS. Next time we're going to create the rest of the pieces of the ORDERS database--the CUSTOMER, DATE-MASTER, and SALES datasets, and as you will see, each of these will also be stored in PRIV files.


George Stachnik works in technical training in HP's Network Server Division.
  ←Part 20  Part 22→
     [3khat]3kRanger   [3khat]3kMail   Updated