The HP 3000--For Complete Novices
Part 14: File Equations and Sizes
by George Stachnik
In the last few articles in this series, we have focused on how to create
programs and build files. In particular, last month's article focused
on using the :BUILD command to create files with specified characteristics.
We saw how to use the ;REC= parameter of the :BUILD command to specify:
- The record size (the number of bytes or words per record)
- The blocking factor (the number of records per block). On PA-RISC systems
this number is ignored; it can still be specified only to maintain
compatibility with the older "classic" models of the HP 3000.
- The record format (fixed-length records or variable-length records)
- The file type (ASCII data or BINARY data)
This month, we're going to turn our attention to more advanced characteristics
of MPE files. We'll begin by looking at file equations. We'll
also see how to determine the size of a file and how to specify the capacity
of a file. We'll wind up this month's article with a brief discussion
of why file I/O on MPE/iX is so fast.
FILE Equations
Suppose you were writing a COBOL program to read data from an input
file. Let's assume that when this program is placed into production,
its input file is called INFILE. In COBOL, you could code the filename
right into the file definition. When you run such a program on an
HP 3000, it will look for a file called INFILE and attempt to read data
from it.
Of course, Murphy's Law dictates that as soon as you have a program
that is "locked into" a particular filename, a need will arise
to have it read a file with a different name. For this reason, most
commercial operating systems provide a way of assigning a temporary alias
to a file. Perhaps the best example is the granddaddy of all commercial
operating systems: IBM's MVS operating system. Most mainframe applications
refer to files not by their filenames, but by temporary aliases called
DDNAMES. On IBM mainframes, DDNAMES are assigned using DD statements
in a job control language called (fittingly enough) JCL. JCL is
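an old (and cryptic) language, but the concept of DDNAMES is a good one.
Because programs refer to files indirectly, the file a program actually uses
can be changed at run time without modifying the program itself, which gives
mainframe application programmers a useful degree of flexibility.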
The HP 3000 provides a similar capability. MPE allows you to
assign temporary aliases called "formal file designators" using
:FILE commands. Figure 1 shows an example
of the simplest form of the :FILE command:
:FILE INFILE=MYFILE
Such a command is often referred to as a "file equation" because
of the equal sign ("=") in the middle of the command. In
this example, the filename MYFILE has been assigned a formal file designator
of INFILE. The formal file designator can be used as a temporary
alias for the filename. So MYFILE can now be referenced by two names:
MYFILE and INFILE.
Figure 1 shows an example of how a file equation might be used
in conjunction with a program that has been coded with the filename INFILE.
On the left side of Figure 1, we see the application program as
it normally works--reading its input from INFILE. But the right
side of the figure shows what happens when we issue the file equation before
running the program. In essence, this FILE command tells MPE that
any program that tries to read a file called INFILE should be redirected
to a different file (in this case, MYFILE).
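For readers who think in code, the redirection in Figure 1 boils down to a table lookup. The sketch below is a hypothetical model written in Python, not anything MPE actually exposes. The function name resolve and the dictionary standing in for the session's file equations are assumptions made for illustration. The sketch also treats names as case-insensitive, which matches the lowercase :file command used later in Figure 5.

```python
def resolve(name: str, equations: dict) -> str:
    """Return the file a program actually opens when it asks for `name`.

    `equations` maps formal file designators to filenames, standing in
    for the file equations issued in the current session (hypothetical).
    """
    key = name.upper()               # treat MPE-style names as case-insensitive
    return equations.get(key, key)   # no equation: use the name as coded


# No file equation: the program reads INFILE, as coded (left side of Figure 1).
print(resolve("INFILE", {}))                     # INFILE
# After :FILE INFILE=MYFILE, the same request goes to MYFILE (right side).
print(resolve("INFILE", {"INFILE": "MYFILE"}))   # MYFILE
```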
The scope of a file equation is limited to the session in which the
file command is issued. In other words, file equations are in effect
only until you log off. For example, suppose you issued the file
equation shown on the right side of Figure 1, and then immediately
ran a program designed to read data from INFILE. The program would
be redirected to MYFILE, just as the figure shows. You could run
the program repeatedly during your session, and it would be redirected
to MYFILE every time.
But as soon as you log off the system, any file equations that you've
issued will die with your session. The next time you log on to your
HP 3000, things will be back to normal (as shown on the left side of
Figure 1). If you run the program again,
it will revert to looking for an input file called INFILE.
Furthermore, the impact of a file equation is limited to your own session.
It is not global: it doesn't affect other users who may be
logged on at the same time. So if you issue a file equation
that assigns the formal file designator INFILE to the filename MYFILE,
and another user runs the same program while you are logged on, the program
will behave normally for that user. It will be redirected only when you run it.
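These scoping rules can be modelled by giving each session its own private table of file equations. Again, this is a hypothetical Python sketch. The Session class and its methods are illustrative names, not part of MPE. Only the aliasing behaviour described so far is modelled.

```python
class Session:
    """Hypothetical model: each logged-on session owns its file equations."""

    def __init__(self, user: str):
        self.user = user
        self.equations = {}   # formal designator -> filename, private to this session

    def file(self, formal: str, actual: str) -> None:
        """Rough analogue of :FILE formal=actual (aliasing only)."""
        self.equations[formal.upper()] = actual.upper()

    def open_name(self, name: str) -> str:
        """The name a program actually opens when it asks for `name`."""
        return self.equations.get(name.upper(), name.upper())

    def logoff(self) -> None:
        """File equations die with the session."""
        self.equations.clear()


me, other = Session("ME"), Session("OTHER")
me.file("INFILE", "MYFILE")
print(me.open_name("INFILE"))     # MYFILE  (redirected for me)
print(other.open_name("INFILE"))  # INFILE  (other users are unaffected)
me.logoff()
print(me.open_name("INFILE"))     # INFILE  (back to normal)
```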
If all you could do with the :FILE command was to assign formal file
designators, it would still be one of the most valuable and widely used commands
on the system. But in a few moments we're going to see that assigning
formal file designators is just the tip of the iceberg. The :FILE
command gives MPE much of its flexibility, and we'll be returning to it
again and again, not only in this article, but in future parts of this
series as well.
How Big Is My File?
When you create a file for use in an application, one of the first things
you need to ask yourself is, "Exactly how much data can this file
hold?"
On many other operating systems (UNIX, NT) this question isn't relevant.
As you write data to a file, it will expand automatically, taking up as
much disk space as it needs. If you write enough data to a file,
it will continue to grow until your system runs out of disk space. This
may or may not be what you had in mind. Under some circumstances, running
out of disk space can have drastic consequences (including crashing the
system). To prevent this from happening, most operating systems
provide ways of capping the total amount of disk space that can be allocated.
Like UNIX and NT, MPE files grow automatically as you add data to them.
Also, MPE allows you to establish disk quotas that prevent specified
groups of files from growing until they soak up all the disk space on your
system. But unlike many other operating systems, MPE also lets you
place a limit on the size of each individual file. By default, MPE
files are quite small. They can hold no more than 1,023 records.
This is illustrated in Figure 2.
The :BUILD command shown in the figure creates a file named MYFILE.
We have specified the size of each record (80 bytes), as we learned
to do in last month's article. The :LISTFILE command shown at the
bottom of the figure confirms that this file contains 80-byte records (under
the heading "SIZE"), and that the file currently contains no
records (under the heading "EOF," which stands for "End
of File"). The maximum number of records that this file can
hold is displayed under the heading "LIMIT" and in the figure
this value is the default: 1,023.
Figure 2: Creating a File Using the BUILD Command
Writing Data to a File
Many applications need to be able to store considerably more than 1,023
records in their files. For this reason, MPE makes it easy to override
the default number of records that a file can hold. Figure 3 shows
another example of the :BUILD command. This time we have added the keyword
DISC=10000
This keyword specifies how many records the file will (at most) be able
to hold. The :LISTFILE command at the bottom of Figure 3
confirms that the file has a capacity of 10,000 records (under the heading LIMIT).
Figure 3: Using the DISC Keyword
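One way to picture the LIMIT column is as a field that :BUILD fills in. It uses 1,023 unless DISC= says otherwise. The MpeFile class and build function below are a toy model under that assumption. They mirror only the SIZE, EOF and LIMIT headings discussed in Figures 2 and 3, and they are not MPE data structures.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LIMIT = 1023  # default capacity shown under LIMIT in Figure 2


@dataclass
class MpeFile:
    """Toy model of three :LISTFILE columns (hypothetical)."""
    name: str
    size: int                   # SIZE: bytes per record
    limit: int = DEFAULT_LIMIT  # LIMIT: maximum number of records
    eof: int = 0                # EOF: records currently in the file


def build(name: str, rec_bytes: int, disc: Optional[int] = None) -> MpeFile:
    """Rough analogue of :BUILD with an optional DISC= capacity."""
    limit = DEFAULT_LIMIT if disc is None else disc
    return MpeFile(name.upper(), rec_bytes, limit)


print(build("myfile", 80))                     # MpeFile(name='MYFILE', size=80, limit=1023, eof=0)
print(build("myfile", 80, disc=10000).limit)   # 10000
```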
Many HP 3000 users have wondered why Hewlett-Packard chose to spell
the word DISC with the letter C, instead of the more common
spelling ending in K. This HP convention goes back to the 1970s.
The word DISC appeared in all of HP's manuals as well as in the
software itself until the company reversed itself in the 1980s. All HP
3000 documentation was changed to use the standard DISK spelling.
Unfortunately, MPE couldn't follow suit without forcing HP's customers
to modify virtually every piece of HP 3000 software. So today's HP 3000
software continues to use the incorrectly spelled DISC= keyword.
Now that we've seen how to specify the capacity of a file, let's actually
load a file with data. There are a variety of ways to write data to a file.
In most cases, you'll use application programs (probably written in
COBOL, although Java, C, and other languages are beginning to "catch
on" in the HP 3000 world, just as they have on UNIX and NT). For
the purposes of this article, however, we'll use the FCOPY utility that
was introduced in last month's article. Figure 4 shows FCOPY being
used to write records to a file, just as a COBOL application would do.
In this example, each record is made up of a line of text from the keyboard.
In Figure 4, we begin by creating a file,
once again called MYFILE. As before, we have specified that the
records will be 80 bytes long. This time, however, we've made the
file's capacity artificially small. MYFILE will now hold no more
than 5 records (of 80 bytes each).
Figure 4: Writing to a File Using FCOPY
The FCOPY command shown in Figure 4 copies data from the keyboard
and writes it to MYFILE. Notice that the parameter "FROM=;"
has no filename specified. This forces FCOPY to fall back upon its
default source file, $STDIN. When a user logs onto an HP 3000, MPE
associates the name $STDIN with the session's standard input device (typically
the terminal or PC keyboard). So the net effect of specifying "FROM=;"
(without a filename) is to tell FCOPY to read data directly from your keyboard
(just as if it were reading from a file).
If you actually try this at your HP 3000, FCOPY will display a banner
line like the one shown in Figure 4. (The banner begins with "HP31900A.")
After the banner is displayed, things will seem to simply stop. This
is because FCOPY has issued an operating system intrinsic called FREAD,
which tells MPE to read a record of data from the input file. Since
that input file is your terminal, FCOPY is sitting there waiting for you
to type something. Whatever you type next will be read by FCOPY
as if it were a record of data in a file.
In Figure 4, we have typed the character
string "one". FCOPY will dutifully accept this string and write
it to its output file (MYFILE). Remember that MYFILE is made up
of 80-byte fixed-length ASCII records. This means that the record
that's actually written to MYFILE will be made up of the three characters
you typed ("one") followed by 77 ASCII blanks. FCOPY will
then post another read against your terminal and wait for you to type a
second record.
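The padding described above takes only a few lines to show. This sketch assumes plain ASCII input. Its handling of a line longer than 80 characters (simply cutting it off) is an assumption of the sketch, not a description of FCOPY's behaviour.

```python
RECORD_SIZE = 80  # MYFILE holds 80-byte fixed-length ASCII records


def to_fixed_record(text: str, size: int = RECORD_SIZE) -> bytes:
    """Pad a typed line with ASCII blanks to exactly one fixed-length record."""
    data = text.encode("ascii")
    data = data[:size]              # assumption: over-long input is truncated
    return data.ljust(size, b" ")   # fill the rest of the record with blanks


rec = to_fixed_record("one")
print(len(rec), rec[:3], rec[3:].count(b" "))  # 80 b'one' 77
```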
If FCOPY were reading a "real" file, that is, a file stored
on a disk, it would continue this way until it reached the end of the input
file. But since FCOPY is reading from a terminal keyboard, the term
"end of file" doesn't really have any meaning (at least not as
long as you keep typing). We need some way of fooling FCOPY into
thinking that it has come to the end of the "file" that it's reading.
In Figure 4, we have typed the string ":eod". Notice
that the first character is a colon (":"). This string
tells the file system that we have reached the end of the "file" we've been
typing. (Note that this string is not treated as a line of
data: the string ":eod" is never passed to FCOPY,
and it will not be written to the output file. Instead, ":eod"
triggers the file system to tell FCOPY that it has reached the end of the
file it is reading.)
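The effect of typing ":eod" can be sketched as a reader that stops at the marker and never hands it back as data. In this hypothetical Python version, the check lives in the reader function for simplicity. On MPE, as the text explains, the file system recognizes the marker and FCOPY sees only an end-of-file condition. The case-insensitive comparison is an assumption based on the lowercase ":eod" used in Figure 4.

```python
import io


def read_until_eod(stream, marker=":eod"):
    """Yield typed lines as records until the end-of-data marker appears.

    Simplification: here the reader itself spots the marker; on MPE it is
    the file system that does so, and the program sees only end-of-file.
    """
    for line in stream:
        line = line.rstrip("\r\n")
        if line.lower() == marker:   # the marker is never returned as data
            return
        yield line


typed = io.StringIO("one\n:eod\n")   # stands in for the terminal keyboard
records = list(read_until_eod(typed))
print(records, len(records))          # ['one'] 1
```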
When FCOPY receives the end-of-file condition, it goes through the same
kind of processing that any application program would do. It closes
whatever files it has open and writes a message telling how many records
have been read. Figure 4 shows that one record has been processed,
and there were no errors. Let's turn our attention now to Figure 5
to see how MYFILE has changed.
Overflowing Files
At the top of Figure 5, we use a :LISTFILE
command to display the current state of MYFILE. By now you should
be getting pretty adept at reading these :LISTFILE displays. Use
your finger to cover the answers to the following questions, and see if
you can answer them.
Figure 5: Overflowing the File
How many records are currently in MYFILE? (The answer is displayed under
the heading "EOF," and in Figure 5, it's 1.)
How many records can MYFILE hold, at most? (The :LISTFILE command displays
the answer under the heading "LIMIT," and in Figure 5,
the file can hold no more than 5 records.)
How big is each record in MYFILE? (The :LISTFILE command displays the
record size under the heading "SIZE," and in Figure 5,
each record is made up of 80 bytes.)
Next we're going to see another use for the :FILE command that we learned
about at the beginning of this article. Earlier, we saw that file
equations can be used to assign a temporary alias called a "formal
file designator" to a file. The example shown in Figure 5
is a bit more complicated. The command is:
:file x=myfile;acc=append
This file equation does two things. First of all, it assigns
the formal file designator "x" to MYFILE, so that a program that
writes to a file named "X" will be automatically redirected to
write to MYFILE. But this file equation also contains another parameter.
The keyword "ACC=APPEND" (the "ACC" stands for
"access") tells MPE that any data that is written to the formal
file designator "X" should be appended to the file being written
to. Normally, when a program writes to a file that already contains some
data, MPE/iX simply writes over the data that is already there.
But the ACC=APPEND keyword preserves any data that's already in the
file and appends the new data to the end of the file.
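The difference between the normal behaviour and ;ACC=APPEND comes down to what happens to the records already in the file. In this hypothetical sketch, a file is just a Python list of records. The name "overwrite" is this sketch's own label for the default behaviour the text describes, not an MPE keyword.

```python
def write_records(existing, new, access="overwrite"):
    """Return a file's records after a program writes `new` to it.

    "overwrite" models the normal behaviour described in the text;
    "append" models ;ACC=APPEND. Records are plain list items.
    """
    if access == "append":
        return list(existing) + list(new)   # keep old records, add new ones at the end
    if access == "overwrite":
        return list(new)                    # previous contents are lost
    raise ValueError(f"unsupported access mode: {access!r}")


print(write_records(["one"], ["2", "3"]))            # ['2', '3']
print(write_records(["one"], ["2", "3"], "append"))  # ['one', '2', '3']
```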
In order for this file equation to work, we need a program to write
to an output file called "X." Of course, we could get out our
trusty COBOL compiler and write one, but there's an easier way. Our
old friend FCOPY can write to any formal file designator we like. All
you need to do is to put an asterisk ("*") right before the output
formal file designator. Returning once again to
Figure 5, we see the following FCOPY command:
:fcopy from=;to=*x
This command leaves off the input filename, just as we saw earlier.
FCOPY will therefore read its input from $STDIN (i.e., from the keyboard).
The output filename ("x") is preceded by an asterisk, or
"star" ("*"). This tells FCOPY, "You are
going to write your output to 'x'--but 'x' isn't a filename, it's a formal
file designator. Go look at the file equation that I've specified,
and you'll find which file I want you to write to, and how to write to it."
Putting the :FCOPY command above together with the file equation in
Figure 5, we are telling the HP 3000 to read data from our terminal
keyboard and append it to the end of MYFILE.
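To make the chain of lookups concrete, here is a much-simplified sketch of how the file equation and FCOPY's "*x" notation fit together. The parser handles only the formal=actual;KEY=VALUE shape used in Figure 5. The real :FILE command accepts far more than this, and both function names are hypothetical.

```python
def parse_file_equation(cmd):
    """Split a simplified ':FILE formal=actual;KEY=VALUE' command into parts.

    Hypothetical and deliberately minimal: only the shape used in Figure 5
    is handled. Returns (formal, actual, options), all upper-cased.
    """
    body = cmd.strip().lstrip(":")
    verb, _, rest = body.partition(" ")
    if verb.upper() != "FILE":
        raise ValueError("not a :FILE command")
    first, *params = rest.split(";")
    formal, _, actual = first.partition("=")
    options = {}
    for param in params:
        key, _, value = param.partition("=")
        options[key.strip().upper()] = value.strip().upper()
    return formal.strip().upper(), actual.strip().upper(), options


def resolve_fcopy_target(target, equations):
    """'*x' means "use the file equation for X"; a bare name is used as-is.

    `equations` maps formal designators to (filename, options). This sketch
    raises KeyError if a starred name has no equation.
    """
    if target.startswith("*"):
        return equations[target[1:].upper()]
    return target.upper(), {}


formal, actual, opts = parse_file_equation(":file x=myfile;acc=append")
equations = {formal: (actual, opts)}
print(resolve_fcopy_target("*x", equations))  # ('MYFILE', {'ACC': 'APPEND'})
```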
At the top of Figure 5, we saw that there
was one record already in MYFILE when we started. The figure shows
four additional records being added to the file. Record number 2
contains the number 2 (followed by 79 blanks). Record 3 contains
the number 3, and so on. After we've added record number 5, FCOPY
dutifully returns to our terminal keyboard for a sixth record. Watch
what happens.
I've typed the character string "This one won't work!" Ordinarily
FCOPY would have written a sixth record containing this string to our output
file. But as you can see, something has gone horribly wrong. The
error message "*134*FOUND EOF IN TOFILE" has been displayed instead,
and FCOPY has terminated.
The error message isn't as cryptic as it looks. FCOPY is simply
trying to tell us that MYFILE has a capacity of five records and we just
tried to append a sixth record to the file. When it tried to write
record six, it found the EOF (end of file), and it knows it can't write
past the EOF.
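The overflow in Figure 5 follows from a single rule: a write is refused once EOF has reached LIMIT. The sketch below models that rule in Python. FileFullError is a hypothetical stand-in for the condition FCOPY reports as "*134*FOUND EOF IN TOFILE", not an MPE error object.

```python
class FileFullError(Exception):
    """Hypothetical stand-in for FCOPY's *134*FOUND EOF IN TOFILE condition."""


class LimitedFile:
    """A file with a fixed capacity (LIMIT) and a current record count (EOF)."""

    def __init__(self, limit, records=()):
        self.limit = limit
        self.records = list(records)

    @property
    def eof(self):
        return len(self.records)

    def append(self, record):
        if self.eof >= self.limit:   # no room left between EOF and LIMIT
            raise FileFullError(f"EOF {self.eof} has reached LIMIT {self.limit}")
        self.records.append(record)


myfile = LimitedFile(limit=5, records=["one"])   # state at the top of Figure 5
for typed in ["2", "3", "4", "5", "This one won't work!"]:
    try:
        myfile.append(typed)
    except FileFullError as err:
        print("stopped:", err)    # stopped: EOF 5 has reached LIMIT 5
        break
print(myfile.eof)                 # 5
```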
How Much Disk Space Does My File Use?
Next we're going to turn our attention to the physical size of MYFILE.
Most operating systems report the physical size of files in bytes.
This is because most operating systems (UNIX, NT) are the direct descendants
of desktop operating systems, where small files and small disk capacities
have historically been the rule. So a file that contained 80 bytes
of data in fact occupied no more than 80 bytes of disk storage. (When
you have only 20 MB of storage on the whole system, which was the rule
not so very many years ago, it is important not to waste a single byte.)
MPE, on the other hand, was designed and built to be a server operating
system from the beginning. Large disk capacities are the rule, and
always have been. For this reason, MPE reports the sizes of files
not in bytes, but in larger units called "sectors." Each sector
is made up of 256 bytes (or 128 16-bit words).
Let's see how much disk space is being taken up by the files in the
figures. Referring back to Figure 4,
the :LISTFILE command at the top of the figure shows the characteristics
of MYFILE before any data has been written to it. Looking under
the heading EOF, we can see that no records have been written to this file.
A file that contains no data should occupy no disk space, and this
is indeed the case. The number of sectors occupied by a file is
reported under the heading SECTORS, and in Figure 4, we can see
that MYFILE currently takes up 0 sectors.
Of course, as soon as you write some data to a file, you can expect
the number of sectors it occupies to rise. Sure enough, the :LISTFILE
command at the top of Figure 5 shows the characteristics of MYFILE after
we wrote a single 80-byte record to it. What may be surprising is the amount
of space that MPE allocated to the file. On a byte-oriented operating system,
such a file would occupy 80 bytes of disk space. But MPE/iX is not
byte-oriented. Looking under the SECTORS heading, we can see that MYFILE is
currently taking up 16 sectors (or 4,096 bytes).
It may seem wasteful to allocate 4 KB to hold a mere 80 bytes of data.
But Figure 6 will help us illustrate that the system used by
MPE is actually very efficient. The :LISTFILE command shown at the
top of Figure 6 shows the characteristics of MYFILE after we've
written all five records to the file (filling it). Note that it
still occupies only 16 sectors.
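You can check the arithmetic behind Figures 4, 5 and 6 with a short calculation, assuming that disk space is handed out in whole 4,096-byte pages (16 sectors of 256 bytes). The function below is a simplification. It grows the file one page at a time and ignores any other overhead the real file system may add.

```python
SECTOR_BYTES = 256                              # one MPE sector
PAGE_BYTES = 4096                               # one MPE/iX page
SECTORS_PER_PAGE = PAGE_BYTES // SECTOR_BYTES   # 16


def sectors_allocated(records, rec_bytes):
    """Sectors used if space is allocated in whole pages, one page at a time."""
    data_bytes = records * rec_bytes
    pages = -(-data_bytes // PAGE_BYTES)        # ceiling division, no floats
    return pages * SECTORS_PER_PAGE


print(sectors_allocated(0, 80))   # 0   (empty MYFILE, Figure 4)
print(sectors_allocated(1, 80))   # 16  (one record, Figure 5)
print(sectors_allocated(5, 80))   # 16  (five records, Figure 6)
```

Under this simplified model, a file of 80-byte records (with a larger LIMIT than MYFILE's) would need a second page only at the 52nd record, since 51 x 80 = 4,080 bytes still fits in one page.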
Why MPE File Access Is So Fast
Large commercial applications (especially database applications) spend
most of their time reading and writing data from files. (As we'll
see in a future article in this series, even databases are constructed
from files on MPE/iX.) For this reason, MPE/iX was designed and built to
make access to files (and the databases they contain) as fast as possible.
One way that it does this is through the wedding of MPE/iX's memory
management system and its file system. Most operating systems manage
memory in units called pages. MPE/iX is no exception. Both
physical memory and virtual memory are managed in 4-KB units called "pages."
MPE/iX's file system is tightly integrated with its memory manager.
File access is also handled in 4-KB pages. We've seen some evidence
of this in Figure 5 and
Figure 6. When we wrote the first record to MYFILE in
Figure 4, a page in virtual memory was allocated for MYFILE. (One page
is 4,096 bytes, or 16 sectors.) That's why 16 sectors of disk space were allocated
for a file containing only 80 bytes of data. As we added additional 80-byte
records to MYFILE, the operating system did not allocate any additional
storage, because all five records still fit in the one-page piece of virtual
storage that represented MYFILE.
The MPE/iX operating system automatically maps all files into pages
of virtual storage. Every file begins on a page boundary. As
files grow larger, additional disk space will be allocated in extents made
up of one or more pages. Therefore the size of each file will always
be a multiple of 16 sectors. This has a disadvantage. If
you have a lot of very small files, you could potentially wind up wasting
a significant amount of disk space. But this system also has a very
important benefit.
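The disadvantage mentioned above is easy to quantify under the same assumption that every non-empty file is rounded up to whole 4,096-byte pages. The workload in this sketch (1,000 files of 80 bytes each) is hypothetical, chosen only to show the size of the effect.

```python
PAGE_BYTES = 4096   # allocation granularity assumed for every non-empty file


def wasted_bytes(file_sizes):
    """Total bytes allocated but unused when each file is rounded up to whole pages."""
    waste = 0
    for size in file_sizes:
        pages = -(-size // PAGE_BYTES)          # ceiling division
        waste += pages * PAGE_BYTES - size
    return waste


small_files = [80] * 1000                        # hypothetical: 1,000 files of 80 bytes
print(wasted_bytes(small_files))                 # 4016000
print(wasted_bytes([4096, 8192]))                # 0: page-sized files waste nothing
```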
Most operating systems have two entirely separate and distinct subsystems
that interact with the system's disk drives: the memory management subsystem
and the file management subsystem. Memory management is usually the more
efficient subsystem, in part because it can assume that it will always
be dealing in large contiguous blocks of data--always aligned on page boundaries.
Today's disk drives can handle these kinds of workloads very efficiently.
By contrast, the file management subsystem is often much less efficient,
in part because it cannot make such sweeping assumptions about the data
it will handle. Most operating systems allow files to be as small
as a few bytes, and they do not align files on page boundaries. An
application that's doing a lot of file input/output (I/O) can seriously
bog down the disk drives with large numbers of small operations. Overall
performance can suffer.
MPE/iX, by contrast, has only one subsystem that interacts with the
disk drives: the memory manager. The file system is built on top
of the memory manager, which allows it to map its files into virtual storage.
The result is that any request that the file system makes to read or
write a file is handled through the same very simple (and very fast) paging
operations that the memory manager uses. This has the effect of
streamlining disk I/O and simplifying the operating system. As a
result, many of the kinds of application programs that tend to get
bottlenecked around disk I/O on other operating systems can run blazingly
fast on MPE/iX.
There's a name for this concept of mapping files directly into virtual
storage--it is called "mapped access." Mapped access isn't unique
to MPE/iX. Most modern operating systems provide ways of mapping
files into virtual storage. But on most operating systems, mapped
access requires very complex, low-level programming, usually requiring
knowledge of OS internals and the use of a system programming language
such as C.
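If you are curious what mapped access looks like elsewhere, Python's standard mmap module (a wrapper over the operating system's own mapping calls, which hides most of the low-level detail) can illustrate the idea on UNIX-like systems or Windows. This is not MPE/iX code, and it says nothing about how MPE/iX implements mapping internally. It simply shows a file's records being read and written as ordinary memory.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 80

# Build a small file of five 80-byte records, like MYFILE in Figure 6.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for n in range(1, 6):
        f.write(str(n).encode("ascii").ljust(RECORD_SIZE, b" "))

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        # Record 3 (counting from 1) is just a slice of memory: no read() call.
        third = mm[2 * RECORD_SIZE:3 * RECORD_SIZE]
        print(third.rstrip())              # b'3'
        # Writing through the mapping updates the file itself.
        mm[0:1] = b"X"
        mm.flush()

with open(path, "rb") as f:
    first_byte = f.read(1)
print(first_byte)                          # b'X'
os.remove(path)
```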
By contrast, all files are mapped into virtual storage by MPE/iX, regardless
of whether they are being accessed through the file system or through a
low-level language such as C. Every request to read or write a file
is handled by the memory manager, where it can be handled most efficiently.
And for those of you with an interest in system programming in C, it
is possible to read or write files on the HP 3000 while bypassing the file
system altogether. This is, at least theoretically, an even more
efficient way to access files on an HP 3000. With the file system
out of the way, a file I/O can require as little as a single machine instruction
to complete. But it's also easy to shoot yourself in the foot.
The file system can do a lot to optimize file access (especially sequential
access) and if you bypass it, you lose this optimization. Bypassing
the file system requires some knowledge of MPE/iX internals, and a proficiency
in a system programming language such as C. A detailed explanation
of this kind of access is beyond the scope of these articles.
George Stachnik works in technical training in HP's
Network Server Division.