
The HP 3000--For Complete Novices

Part 14: File Equations and Sizes

by George Stachnik

In the last few articles in this series, we have focused on how to create programs and build files. In particular, last month's article focused on using the :BUILD command to create files with specified characteristics. We saw how to use the ;REC= parameter of the :BUILD command to specify:

The record size (the number of bytes or words per record)

The blocking factor (the number of records per block). On PA-RISC systems this number is ignored; it can still be specified only to maintain compatibility with the older "classic" models of the HP 3000.

The record format (fixed-length records or variable-length records)

The file type (ASCII data or BINARY data)

This month, we're going to turn our attention to more advanced characteristics of MPE files. We'll begin by looking at file equations. We'll also see how to determine the size of a file and how to specify the capacity of a file. We'll wind up this month's article with a brief discussion of why file I/O on MPE/iX is so fast.
FILE Equations
Suppose you were writing a COBOL program to read data from an input file. Let's assume that when this program is placed into production, its input file is called INFILE. In COBOL, you could code the filename right into the file definition. When you run such a program on an HP 3000, it will look for a file called INFILE and attempt to read data from it.
Of course, Murphy's Law dictates that as soon as you have a program that is "locked into" a particular filename, a need will arise to have it read a file with a different name. For this reason, most commercial operating systems provide a way of assigning a temporary alias to a file. Perhaps the best example is the granddaddy of all commercial operating systems: IBM's MVS operating system. Most mainframe applications refer to files not by their filenames, but by temporary aliases called DDNAMES. On IBM mainframes, DDNAMES are assigned using DD statements in a job control language called (fittingly enough) JCL. JCL is an old (and cryptic) language, but the concept of DDNAMES is a good one. It allows mainframe application programmers a degree of flexibility.
The HP 3000 provides a similar capability. MPE allows you to assign temporary aliases called "formal file designators" using :FILE commands. Figure 1 shows an example of the simplest form of file command:
    :FILE INFILE=MYFILE
Such a command is often referred to as a "file equation" because of the equal sign ("=") in the middle of the command. In this example, the filename MYFILE has been assigned a formal file designator of INFILE. The formal file designator can be used as a temporary alias for the filename. So MYFILE can now be referenced by two names: MYFILE and INFILE.
Figure 1 shows an example of how a file equation might be used in conjunction with a program that has been coded with the filename INFILE. On the left side of Figure 1, we see the application program as it normally works--reading its input from INFILE. But the right side of the figure shows what happens when we issue the file equation before running the program. In essence, this FILE command tells MPE that any program that tries to read a file called INFILE should be redirected to a different file (in this case, MYFILE).
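To make the redirection concrete, here is a toy Python model of what a file equation does. It is only a sketch under stated assumptions: the table, `file_command`, and `resolve` are hypothetical names, not MPE intrinsics, and the upper-casing simply mirrors the fact that classic MPE filenames are not case-sensitive.

```python
# Toy model of MPE file equations (an illustration, not MPE internals).
# Names are upper-cased to mirror MPE's case-insensitive classic filenames.
file_equations: dict[str, str] = {}


def file_command(formal: str, actual: str) -> None:
    """Mimic ':FILE formal=actual' by recording an alias (hypothetical helper)."""
    file_equations[formal.upper()] = actual.upper()


def resolve(name: str) -> str:
    """Return the file that will really be opened when a program asks for `name`."""
    # A matching file equation redirects the request; otherwise the
    # name is used exactly as the program coded it.
    return file_equations.get(name.upper(), name.upper())


print(resolve("INFILE"))   # INFILE (no file equation yet)
file_command("INFILE", "MYFILE")
print(resolve("INFILE"))   # MYFILE (redirected)
print(resolve("MYFILE"))   # MYFILE (the real name still works)
```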
The scope of a file equation is limited to the session in which the file command is issued. In other words, file equations are in effect only until you log off. For example, suppose you issued the file equation shown on the right side of Figure 1, and then immediately ran a program designed to read data from INFILE. The program would be redirected to MYFILE, just as the figure shows. You could run the program repeatedly during your session, and it would be redirected to MYFILE every time.
But as soon as you log off the system, any file equations that you've issued will die with your session. The next time you log on to your HP 3000, things will be back to normal (as shown on the left side of Figure 1). If you run the program again, it will revert to looking for an input file called INFILE.
Furthermore, file equations are not global: they don't affect other users who may be logged on at the same time. So if you issue a file equation that assigns the formal file designator INFILE to the filename MYFILE, and another user runs the program at the same time, the program will behave normally for them but will be redirected when you run it.
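The scoping rules can be sketched the same way, assuming (purely for illustration) that each session carries its own private dictionary of file equations. The `Session` class and its methods are hypothetical; MPE does not expose anything like them.

```python
class Session:
    """Toy model: each logon session owns its own, private file equations."""

    def __init__(self, user: str) -> None:
        self.user = user
        self.file_equations: dict[str, str] = {}

    def file(self, formal: str, actual: str) -> None:
        # ':FILE formal=actual' changes only this session's table.
        self.file_equations[formal.upper()] = actual.upper()

    def resolve(self, name: str) -> str:
        # Redirect if this session has a file equation for the name.
        return self.file_equations.get(name.upper(), name.upper())

    def logoff(self) -> None:
        # File equations die with the session.
        self.file_equations.clear()


you = Session("YOU")
other = Session("OTHER")
you.file("INFILE", "MYFILE")
print(you.resolve("INFILE"))    # MYFILE: redirected for you
print(other.resolve("INFILE"))  # INFILE: other users are unaffected

you.logoff()
print(you.resolve("INFILE"))    # INFILE: back to normal after logging off
```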
Even if all the :FILE command could do was assign formal file designators, it would still be one of the most valuable and widely used commands on the system. But in a few moments we're going to see that assigning formal file designators is just the tip of the iceberg. The :FILE command gives MPE much of its flexibility, and we'll be returning to it again and again, not only in this article but in future parts of this series as well.

Figure 1: File Equation

How Much Data Can My File Hold?
When you create a file for use in an application, one of the first things you need to ask yourself is, "Exactly how much data can this file hold?"
On many other operating systems (UNIX, NT) this question isn't relevant. As you write data to a file, it will expand automatically, taking up as much disk space as it needs. If you write enough data to a file, it will continue to grow until your system runs out of disk space. This may or may not be what you had in mind. Under some circumstances, running out of disk space can have drastic consequences (including crashing the system). To prevent this from happening, most operating systems provide ways of capping the total amount of disk space that can be allocated.
Like UNIX and NT, MPE files grow automatically as you add data to them. Also, MPE allows you to establish disk quotas that prevent specified groups of files from growing until they soak up all the disk space on your system. But unlike many other operating systems, MPE also lets you place a limit on the size of each individual file. By default, MPE files are quite small. They can hold no more than 1,023 records. This is illustrated in Figure 2.
The :BUILD command shown in the figure creates a file named MYFILE. We have specified the size of each record (80 bytes), as we learned to do in last month's article. The :LISTFILE command shown at the bottom of the figure confirms that this file contains 80-byte records (under the heading "SIZE"), and that the file currently contains no records (under the heading "EOF," which stands for "End of File"). The maximum number of records that this file can hold is displayed under the heading "LIMIT" and in the figure this value is the default: 1,023.

Figure 2: Creating a File Using the BUILD Command

Writing Data to a File
Many applications need to be able to store considerably more than 1,023 records in their files. For this reason, MPE makes it easy to override the default number of records that a file can hold. Figure 3 shows another example of the :BUILD command. This time we have added the keyword
    DISC=10000
This keyword specifies how many records the file will (at most) be able to hold. The :LISTFILE command at the bottom of Figure 3 confirms that the file has a capacity of 10,000 records (under the heading LIMIT).
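The relationship between the SIZE, EOF, and LIMIT columns can be modeled in a few lines. This is a hypothetical sketch: `MpeFile`, `build`, and `listfile` are invented names, and the one-line output only imitates the columns the article describes, not the real :LISTFILE layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_LIMIT = 1023  # MPE's default file capacity, per the article


@dataclass
class MpeFile:
    """Toy stand-in for an MPE file's attributes; not a real MPE structure."""
    name: str
    rec_size: int                   # bytes per record (the SIZE column)
    limit: int = DEFAULT_LIMIT      # maximum records (the LIMIT column)
    records: List[bytes] = field(default_factory=list)

    @property
    def eof(self) -> int:
        # EOF is simply the number of records currently in the file.
        return len(self.records)


def build(name: str, rec_size: int, disc: Optional[int] = None) -> MpeFile:
    """Rough analogue of :BUILD; `disc` plays the role of the DISC= keyword."""
    return MpeFile(name, rec_size, DEFAULT_LIMIT if disc is None else disc)


def listfile(f: MpeFile) -> None:
    print(f"{f.name}: SIZE={f.rec_size}B EOF={f.eof} LIMIT={f.limit}")


listfile(build("MYFILE", 80))              # MYFILE: SIZE=80B EOF=0 LIMIT=1023
listfile(build("MYFILE", 80, disc=10000))  # MYFILE: SIZE=80B EOF=0 LIMIT=10000
```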

Figure 3: Using the DISC Keyword

Many HP 3000 users have wondered why Hewlett-Packard chose to spell the word DISC with the letter C, instead of the more common spelling ending in K. This HP convention goes back to the 1970s. The word DISC appeared in all of HP's manuals as well as in the software itself until the company reversed itself in the 1980s. All HP 3000 documentation was changed to use the standard DISK spelling. Unfortunately, MPE couldn't follow suit without forcing HP's customers to modify virtually every piece of HP 3000 software. So today's HP 3000 software continues to use the incorrectly spelled DISC= keyword.
Now that we've seen how to specify the capacity of a file, let's actually load a file with data. There are a variety of ways to write data to a file. In most cases, you'll use application programs (probably written in COBOL, although Java, C, and other languages are beginning to "catch on" in the HP 3000 world, just as they have on UNIX and NT). For the purposes of this article, however, we'll use the FCOPY utility that was introduced in last month's article. Figure 4 shows FCOPY being used to write records to a file, just as a COBOL application would do. In this example, each record is made up of a line of text from the keyboard.
In Figure 4, we begin by creating a file, once again called MYFILE. As before, we have specified that the records will be 80 bytes long. This time, however, we've made the file's capacity artificially small. MYFILE will now hold no more than 5 records (of 80 bytes each).

Figure 4: Writing to a File Using FCOPY

The FCOPY command shown in Figure 4 copies data from the keyboard and writes it to MYFILE. Notice that the parameter "FROM=;" has no filename specified. This forces FCOPY to fall back upon its default source file, $STDIN. When a user logs on to an HP 3000, MPE associates the name $STDIN with the session's standard input device (typically the terminal or PC keyboard). So the net effect of specifying "FROM=;" (without a filename) is to tell FCOPY to read data directly from your keyboard, just as if it were reading from a file.
If you actually try this on your HP 3000, FCOPY will display a banner line like the one shown in Figure 4. (The banner begins with "HP31900A.") After the banner is displayed, things will seem to simply stop. This is because FCOPY has called an operating system intrinsic named FREAD, which tells MPE to read a record of data from the input file. Since that input file is your terminal, FCOPY is sitting there waiting for you to type something. Whatever you type next will be read by FCOPY as if it were a record of data in a file.
In Figure 4, we have typed the character string "one." FCOPY will dutifully accept this string and write it to its output file (MYFILE). Remember that MYFILE is made up of 80-byte fixed-length ASCII records. This means that the record that's actually written to MYFILE will be made up of the three characters you typed ("one") followed by 77 ASCII blanks. FCOPY will then post another read against your terminal and wait for you to type a second record.
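Padding a typed line out to a fixed-length record is simple arithmetic. The sketch below assumes plain ASCII input and, as a labeled assumption, truncates lines longer than the record; it does not claim to reproduce FCOPY's actual handling of over-long input. `to_fixed_record` is a hypothetical helper.

```python
REC_SIZE = 80  # MYFILE's fixed record length in bytes


def to_fixed_record(line: str, size: int = REC_SIZE) -> bytes:
    """Turn one typed line into one fixed-length ASCII record."""
    data = line.encode("ascii")
    # Assumption for this sketch: longer lines are truncated;
    # shorter ones are padded on the right with ASCII blanks.
    return data[:size].ljust(size, b" ")


rec = to_fixed_record("one")
print(len(rec))           # 80
print(rec.count(b" "))    # 77
```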
If FCOPY were reading a "real" file, that is, a file stored on a disk, it would continue this way until it reached the end of the input file. But since FCOPY is reading from a terminal keyboard, the term "end of file" doesn't really have any meaning (at least not as long as you keep typing). We need some way of fooling FCOPY into thinking that it has come to the end of the "file" that it's reading.
In Figure 4, we have typed the string ":eod." Notice that the first character is a colon (":"). This string tells the file system that we have reached the end of the "file" we've been typing. The string ":eod" is not treated as a line of data: it is never passed to FCOPY, and it will not be written to the output file. Instead, it causes the file system to tell FCOPY that it has reached the end of the file it is reading.
When FCOPY receives the end-of-file condition, it goes through the same kind of processing that any application program would do. It closes whatever files it has open and writes a message telling how many records have been read. Figure 4 shows that one record has been processed, and there were no errors. Let's turn our attention now to Figure 5 to see how MYFILE has changed.
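FCOPY's read loop can be approximated as follows, with a Python list standing in for $STDIN. One simplification to note: on MPE the file system intercepts :EOD before FCOPY ever sees it, whereas this hypothetical `fcopy_from_stdin` checks for the marker inside its own loop.

```python
from typing import Iterable, List


def fcopy_from_stdin(lines: Iterable[str], rec_size: int = 80) -> List[bytes]:
    """Toy FCOPY FROM=;TO=...: copy typed lines until ':EOD' is entered."""
    written: List[bytes] = []
    for line in lines:                 # each line stands in for one FREAD
        if line.upper() == ":EOD":
            # End of data: the marker itself is never written as a record.
            break
        written.append(line.encode("ascii")[:rec_size].ljust(rec_size, b" "))
    return written


records = fcopy_from_stdin(["one", ":eod"])
print(len(records), "record(s) processed")   # 1 record(s) processed
```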
Overflowing Files
At the top of Figure 5, we use a :LISTFILE command to display the current state of MYFILE. By now you should be getting pretty adept at reading these :LISTFILE displays. Use your finger to cover the answers to the following questions, and see if you can answer them.

Figure 5: Overflowing the File

How many records are currently in MYFILE? (The answer is displayed under the heading "EOF," and in Figure 5, it's 1.)
How many records can MYFILE hold, at most? (The :LISTFILE command displays the answer under the heading "LIMIT," and in Figure 5, the file can hold no more than 5 records.)
How big is each record in MYFILE? (The :LISTFILE command displays the record size under the heading "SIZE," and in Figure 5, each record is made up of 80 bytes.)
Next we're going to see another use for the :FILE command that we learned about at the beginning of this article. Earlier, we saw that file equations can be used to assign a temporary alias called a "formal file designator" to a file. The example shown in Figure 5 is a bit more complicated. The command is:
    :file x=myfile;acc=append
This file equation does two things. First, it assigns the formal file designator "x" to MYFILE, so that a program that writes to a file named "X" will be automatically redirected to MYFILE. Second, it contains another parameter. The keyword "ACC=APPEND" (the "ACC" stands for "access") tells MPE that any data written to the formal file designator "X" should be added to the end of MYFILE. Normally, when a program writes to a file that already contains some data, MPE/iX simply writes over the data that is already there. The ACC=APPEND keyword preserves the existing data and appends the new data to the end of the file.
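The difference between ordinary write access and ACC=APPEND can be shown with lists of records. This is a simplified model, assumed for illustration: `write_records` is a hypothetical helper, and it treats "writing over" as discarding the old records entirely.

```python
def write_records(existing: list[str], new: list[str], append: bool) -> list[str]:
    """Return a file's contents after a program writes `new` records to it."""
    if append:
        # ACC=APPEND: existing records are preserved; new ones go after them.
        return existing + new
    # Ordinary write access, as modeled here: the old contents are written over.
    return list(new)


print(write_records(["one"], ["2", "3"], append=True))   # ['one', '2', '3']
print(write_records(["one"], ["2", "3"], append=False))  # ['2', '3']
```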
In order for this file equation to work, we need a program to write to an output file called "X." Of course, we could get out our trusty COBOL compiler and write one, but there's an easier way. Our old friend FCOPY can write to any formal file designator we like. All you need to do is to put an asterisk ("*") right before the output formal file designator. Returning once again to Figure 5, we see the following FCOPY command:
    :fcopy from=;to=*x
This command leaves off the input filename, just as we saw earlier, so FCOPY will read its input from $STDIN (i.e., from the keyboard). The output filename ("x") is preceded by an asterisk, or "star" ("*"). This tells FCOPY, "You are going to write your output to 'x', but 'x' isn't a filename; it's a formal file designator. Go look at the file equation that I've specified, and it will tell you which file to write to and how to write to it."
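Here is a rough sketch of how the star prefix might be interpreted, together with a minimal parser for the file equation itself. Both `parse_file_command` and `resolve_target` are hypothetical; the parser ignores most of the real :FILE syntax, and raising an error when no file equation exists is this sketch's own assumption, not documented MPE behavior.

```python
def parse_file_command(cmd: str) -> tuple[str, dict[str, str]]:
    """Toy parser for ':FILE formal=actual;KEY=VALUE;...' (no quoting support)."""
    verb, _, rest = cmd.strip().lstrip(":").partition(" ")
    if verb.upper() != "FILE":
        raise ValueError("not a :FILE command")
    first, *options = rest.split(";")
    formal, _, actual = first.partition("=")
    settings = {"FILENAME": actual.strip().upper()}
    for opt in options:
        key, _, value = opt.partition("=")
        settings[key.strip().upper()] = value.strip().upper()
    return formal.strip().upper(), settings


formal, settings = parse_file_command(":file x=myfile;acc=append")
equations = {formal: settings}


def resolve_target(target: str) -> dict[str, str]:
    """'*name' refers to a file equation; anything else is a literal filename."""
    if target.startswith("*"):
        name = target[1:].upper()
        if name not in equations:
            # Assumption: this sketch treats a missing equation as an error.
            raise KeyError(f"no file equation for {name}")
        return equations[name]
    return {"FILENAME": target.upper()}


print(resolve_target("*x"))      # {'FILENAME': 'MYFILE', 'ACC': 'APPEND'}
print(resolve_target("myfile"))  # {'FILENAME': 'MYFILE'}
```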
Putting the :FCOPY command above together with the file equation in Figure 5, we are telling the HP 3000 to read data from our terminal keyboard and append it to the end of MYFILE.
At the top of Figure 5, we saw that there was one record already in MYFILE when we started. The figure shows four additional records being added to the file. Record number 2 contains the number 2 (followed by 79 blanks). Record 3 contains the number 3, and so on. After we've added record number 5, FCOPY dutifully returns to our terminal keyboard for a sixth record. Watch what happens.
I've typed the character string "This one won't work!" Ordinarily FCOPY would have written a sixth record containing this string to our output file. But as you can see, something has gone horribly wrong. The error message "*134*FOUND EOF IN TOFILE" has been displayed instead, and FCOPY has terminated.
The error message isn't as cryptic as it looks. FCOPY is simply trying to tell us that MYFILE has a capacity of five records and we just tried to append a sixth record to the file. When FCOPY tried to write record six, the file system reported an end-of-file condition: the file was already full (its EOF had reached its LIMIT), and nothing can be written past that point.
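The overflow in Figure 5 comes down to a single comparison between EOF and LIMIT. In this sketch, `append_record` and `FileFullError` are invented for illustration, and the error text is not the real MPE or FCOPY message.

```python
class FileFullError(Exception):
    """Raised when a write would go past the file's LIMIT (hypothetical)."""


def append_record(records: list[str], limit: int, data: str) -> None:
    if len(records) >= limit:          # EOF has already reached LIMIT
        raise FileFullError(f"file full: EOF={len(records)} LIMIT={limit}")
    records.append(data)


myfile = ["one"]                       # EOF=1, as at the top of Figure 5
for line in ["2", "3", "4", "5", "This one won't work!"]:
    try:
        append_record(myfile, limit=5, data=line)
    except FileFullError as err:
        print("write failed:", err)    # write failed: file full: EOF=5 LIMIT=5
        break
print(len(myfile))                     # 5
```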
How Big Is My File?
Next we're going to turn our attention to the physical size of MYFILE. Most operating systems report the size of files in bytes. This is because most of these operating systems (UNIX, NT) grew up in environments where small files and small disk capacities have historically been the rule. So a file that contains 80 bytes of data is reported as being 80 bytes long. (When you have only 20 MB of storage on the whole system, which was the rule not so very many years ago, it is important not to waste a single byte.)
MPE, on the other hand, was designed and built to be a server operating system from the beginning. Large disk capacities are the rule, and always have been. For this reason, MPE reports the sizes of files not in bytes, but in larger units called "sectors." Each sector is made up of 256 bytes (or 128 16-bit words).
Let's see how much disk space is being taken up by the files in the figures. Referring back to Figure 4, the :LISTFILE command at the top of the figure shows the characteristics of MYFILE, which at this point contains no data. Looking under the heading EOF, we can see that no records have been written to this file. A file that contains no data should occupy no disk space, and this is indeed the case. The number of sectors occupied by a file is reported under the heading SECTORS, and in Figure 4, MYFILE currently takes up 0 sectors.
Of course, as soon as you write some data to a file, you can expect the number of sectors it occupies to rise. Sure enough, the :LISTFILE command at the top of Figure 5 shows the characteristics of MYFILE after we wrote a single 80-byte record to it. What may be surprising is the amount of space that MPE allocated. On a byte-oriented operating system, such a file would be reported as 80 bytes long. But MPE/iX is not byte-oriented. Looking under the SECTORS heading, we can see that MYFILE is currently taking up 16 sectors (16 sectors of 256 bytes each, or 4,096 bytes).
It may seem wasteful to allocate 4 KB to hold a mere 80 bytes of data. But Figure 6 will help us illustrate that the system used by MPE is actually very efficient. The :LISTFILE command shown at the top of Figure 6 shows the characteristics of MYFILE after we've written all five records to the file (filling it). Note that it still occupies only 16 sectors.
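The arithmetic behind Figures 4 through 6 can be checked with a small function. It assumes, as a simplification, that space is allocated one 4-KB page at a time and ignores any file label or other overhead; `sectors_used` is a hypothetical helper.

```python
SECTOR_BYTES = 256                              # one MPE sector
PAGE_BYTES = 4096                               # one MPE/iX page
SECTORS_PER_PAGE = PAGE_BYTES // SECTOR_BYTES   # 16


def sectors_used(num_records: int, rec_size: int) -> int:
    """Disk space in sectors, assuming allocation in whole pages."""
    data_bytes = num_records * rec_size
    pages = -(-data_bytes // PAGE_BYTES)        # ceiling division; 0 bytes -> 0 pages
    return pages * SECTORS_PER_PAGE


print(sectors_used(0, 80))   # 0   (Figure 4: empty file)
print(sectors_used(1, 80))   # 16  (Figure 5: one 80-byte record)
print(sectors_used(5, 80))   # 16  (Figure 6: five records, still one page)
```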
Why MPE File Access Is So Fast
Large commercial applications (especially database applications) spend most of their time reading data from files and writing data to them. (As we'll see in a future article in this series, even databases are constructed from files on MPE/iX.) For this reason, MPE/iX was designed and built to make access to files (and the databases they contain) as fast as possible. One way that it does this is through the wedding of MPE/iX's memory management system and its file system. Most operating systems manage memory in units called pages. MPE/iX is no exception. Both physical memory and virtual memory are managed in 4-KB units called "pages."
MPE/iX's file system is tightly integrated with its memory manager: file access is also handled in 4-KB pages. We've seen some evidence of this in Figures 5 and 6. When we wrote the first record to MYFILE (in Figure 4), a page in virtual memory was allocated for the file. (One page is 4,096 bytes, or 16 sectors.) That's why 16 sectors of disk space were allocated for a file containing only 80 bytes of data. As we added more 80-byte records to MYFILE, the operating system did not allocate any additional storage, because all five records (400 bytes in total) still fit in the one-page piece of virtual storage that represented MYFILE.
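A related question is how much room MYFILE's single page leaves. Assuming fixed-length records are packed end to end with no per-record overhead (an assumption, not something the article states), the numbers work out as follows. `records_per_page` and `pages_needed` are hypothetical helpers.

```python
PAGE_BYTES = 4096


def records_per_page(rec_size: int) -> int:
    """Whole fixed-length records that fit in one page (no overhead assumed)."""
    return PAGE_BYTES // rec_size


def pages_needed(num_records: int, rec_size: int) -> int:
    """Pages touched by fixed-length records packed end to end."""
    return -(-(num_records * rec_size) // PAGE_BYTES)   # ceiling division


print(records_per_page(80))   # 51
print(pages_needed(5, 80))    # 1: MYFILE's five records fit in one page
print(pages_needed(52, 80))   # 2: a 52nd record would spill into a second page
```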

Figure 6: File Sizes

The MPE/iX operating system automatically maps all files into pages of virtual storage. Every file begins on a page boundary. As files grow larger, additional disk space will be allocated in extents made up of one or more pages. Therefore the size of each file will always be a multiple of 16 sectors. This has a disadvantage. If you have a lot of very small files, you could potentially wind up wasting a significant amount of disk space. But this system also has a very important benefit.
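To put a rough number on the disadvantage, consider a hypothetical system with 1,000 files that each hold a single 80-byte record. Assuming every file is rounded up to whole 16-sector pages and ignoring all other overhead, the sketch below compares allocated space with actual data. `allocated_sectors` is a hypothetical helper.

```python
SECTOR_BYTES = 256
SECTORS_PER_PAGE = 16   # one 4-KB page


def allocated_sectors(file_bytes: int) -> int:
    """Round a file's size up to whole pages, expressed in sectors (simplified)."""
    pages = -(-file_bytes // (SECTORS_PER_PAGE * SECTOR_BYTES))
    return pages * SECTORS_PER_PAGE


# Hypothetical workload: 1,000 tiny files holding one 80-byte record each.
n_files, file_bytes = 1000, 80
used = n_files * allocated_sectors(file_bytes) * SECTOR_BYTES
data = n_files * file_bytes
print(used, data)             # 4096000 80000
print(f"{used / data:.1f}x")  # 51.2x
```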
Most operating systems have two entirely separate and distinct subsystems that interact with the system's disk drives: the memory management subsystem and the file management subsystem. Memory management is usually the more efficient subsystem, in part because it can assume that it will always be dealing in large contiguous blocks of data--always aligned on page boundaries. Today's disk drives can handle these kinds of workloads very efficiently.
The file management subsystem, on the other hand, is often much less efficient, in part because it cannot make such sweeping assumptions about the data it will handle. Most operating systems allow files to be as small as a few bytes, and they do not align files on page boundaries. An application that's doing a lot of file input/output (I/O) can seriously bog down the disk drives with large numbers of small operations. Overall performance can suffer.
MPE/iX, by contrast, has only one subsystem that interacts with the disk drives: the memory manager. The file system is built on top of the memory manager, which allows it to map its files into virtual storage. The result is that any request that the file system makes to read or write a file is handled through the same very simple (and very fast) paging operations that the memory manager uses. This streamlines disk I/O and simplifies the operating system. As a result, many of the kinds of application programs that tend to get bottlenecked around disk I/O on other operating systems can run blazingly fast on MPE/iX.
There's a name for this concept of mapping files directly into virtual storage: it is called "mapped access." Mapped access isn't unique to MPE/iX. Most modern operating systems provide ways of mapping files into virtual storage. But on most operating systems, mapped access requires complex, low-level programming, which usually means knowing OS internals and using a system programming language such as C.
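For comparison, here is what explicit mapped access looks like on a UNIX- or NT-style system, using Python's standard `mmap` module rather than C. This is not MPE/iX code; it only illustrates the idea of reading a file through virtual memory instead of through explicit read calls. The record layout (five 80-byte records) is chosen to mirror MYFILE.

```python
import mmap
import os
import tempfile

REC_SIZE = 80

# Build a small temporary file of five fixed-length 80-byte records ("1".."5").
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for n in range(1, 6):
        f.write(str(n).encode("ascii").ljust(REC_SIZE, b" "))

# Map the whole file into virtual memory and read record 3 by slicing:
# there is no explicit read() call; the OS pages the data in when touched.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        third = m[2 * REC_SIZE:3 * REC_SIZE]

os.remove(path)
print(third.rstrip())   # b'3'
```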
By contrast, all files are mapped into virtual storage by MPE/iX, regardless of whether they are being accessed through the file system or through a low-level language such as C. Every request to read or write a file is handled by the memory manager, where it can be handled most efficiently. And for those of you with an interest in system programming in C, it is possible to read or write files on the HP 3000 while bypassing the file system altogether. This is, at least theoretically, an even more efficient way to access files on an HP 3000. With the file system out of the way, a file I/O can require as little as a single machine instruction to complete. But it's also easy to shoot yourself in the foot. The file system can do a lot to optimize file access (especially sequential access) and if you bypass it, you lose this optimization. Bypassing the file system requires some knowledge of MPE/iX internals, and a proficiency in a system programming language such as C. A detailed explanation of this kind of access is beyond the scope of these articles.
George Stachnik works in technical training in HP's Network Server Division.
