DBEFile Organization

Many performance issues depend on understanding the structure of ALLBASE/SQL DBEFiles. DBEFiles are disk files that store data and indexes for both user and system tables. The following paragraphs outline the structure of DBEFiles and the format of data they contain.

Page Organization

Data is read from and written to DBEFiles in 4096-byte blocks known as pages. Each page read from or written to disk constitutes one I/O operation. The number of I/Os for a given query is an important measure of performance. When you create a DBEFile for the DBEnvironment, you specify its size as a number of pages, and the size is recorded in the system catalog.

Page Table Pages

The first page in a DBEFile is known as a page table page, so called because it contains a table of the contents of the following 252 pages in the file. Before each group of 252 data or index pages, a new page table page is included. Page table pages are composed of entries that indicate whether the following pages are allocated, full, free, or in some other state. These pages are accessed during serial scans, and they are modified when data is added to or dropped from a table.

Page table page entries are 16 bytes long; each page table page contains one entry for each of the following 252 pages. The page table page also contains a 48-byte header and a 16-byte trailer at the end for a total of 4096 bytes, as the following diagram shows:

Rows of Data on Pages

Data from tables and indexes is stored on DBEFile pages as rows or tuples. A given page can store data for only one table or index, though the DBEFile as a whole can contain pages for many different objects. A DBEFile of type TABLE can only contain data for tables, and a DBEFile of type INDEX can only contain index data. A DBEFile of type MIXED may include pages for both tables and indexes. A page table page entry indicates which object has data stored on that particular page.

Structure of a Page

Understanding how data is arranged on these pages can assist you in deciding how to reduce the number of I/Os required for particular database operations. Since table data and index data are stored somewhat differently, the two types of storage are presented in separate sections that follow.

Storage of Table Data on DBEFile Pages

Tuples from user tables are stored on DBEFile pages as in the following diagram.

The tuples in a TABLE DBEFile are rows from user tables. The tuple header contains descriptive information about the columns in a tuple, and the tuple body contains the actual data.

The format of a tuple header is as follows:

The tuple header contains the following fields:

Length of the header (2 bytes)
A 2-byte column descriptor for each column in the tuple. Each column descriptor has bits that indicate the following:
- Whether or not the column contains a null value.
- Data type of the column.
- Byte offset to the end of the column within the tuple.

The total length of a tuple header is given by the formula:

        Length = 2*(NCols + 1)

where NCols is the number of columns in the tuple.

At the beginning of the data area on a page, following the page header and before the first tuple on the page, there is a tuple header which contains information about the first tuple. This header is also known as the shared header because it can be shared with other tuples on the page. For all tuples other than the first one, a header is only stored with the tuple body if it is different from the shared header.

To illustrate the use of the shared header, suppose that a page holds 100 rows of 4 columns each. In this case, header data requires 10 bytes per tuple. If each header were different from the shared header, the total space used on the page by all tuple headers would be 1000 bytes. But if all 100 rows used the same header, then header data would only occupy a total of 10 bytes, leaving more room for tuples.

A page with a shared tuple header looks like the following:

A flag is set in the slot table to indicate whether or not the shared tuple header is used by a particular row. The slot table is further described below. Each tuple can be uniquely identified by a tuple ID (TID), which has the following format:

For individual rows, the TID is usually expressed as three numbers separated by colons (example: 10:5:8), where the first number is the DBEFile number; the second is the page number; and the third is the slot number. Version numbers are not used for identifying rows of data, but they are used in identifying other objects in the DBEnvironment, such as tables. The DBEFile number and page number indicate the file page in which the tuple is found. The slot number, further described below, indicates an offset on the page where the row is located.

Slot Table

At the end of the page is a slot table, which contains an entry for each tuple on the page. When ALLBASE/SQL uses a TID, the slot number found in the last field in the TID is an index into this table. Each slot table entry has the following format:

Each entry is 16 bits long; the first four bits are flags (FL), and the remaining 12 bits are the byte offset to the tuple within the page, counting from the beginning of the page. There is a maximum of 256 entries in a slot table, which means that a maximum of 256 rows can be stored on a page.

Indirect Rows

An indirect row is created when a row's length increases during an update, and the page that currently holds the row does not have enough space to store the new information. In cases like this, the updated row is stored on another page, and the original page is updated with the TID of the new row on the new page. An indirect row can only be accessed by first fetching one page to find the address of the row and then fetching a second page to obtain the row itself.

An indirect row may be created in any of the following circumstances:

A variable length data field is updated with a longer value than the one previously stored.
A NULL column is updated to contain a non-null value.
An ALTER TABLE statement adds a column to a table and supplies a default value that must be added to each row.

To determine the percentage of indirect rows, run SQLMON and examine the TABLE INDIRECT ROW field of the Static Indirect screen.

Hash Storage

Hash structures in ALLBASE/SQL are tables that you define to be hashed according to a specific key at the time you create them. In addition to the key, you specify a number of primary pages, which become the hashed locations of the data in the DBEFile or DBEFiles holding the table. Rows are inserted based on the value in the hash key; a row is said to hash to a specific page, which means that the primary page location is calculated from the value of the key, which must be unique. A search array is maintained on each page. This structure contains slot numbers in key order. ALLBASE/SQL does a binary search using this array to arrive at a specific row quickly. When a row is inserted, the array is updated.

If there is not enough room for an inserted row on a hash page, the row is placed on an overflow page. Overflow pages are linked to the primary page and to one another using the NXT and PRV pointers on the page. A large number of overflow pages means slower access to data. To avoid this, choose a good hash key with a uniform distribution of values within the table. Evenly distributed key values result in hashing to an evenly distributed set of pages. A key with a skewed distribution of values would result in hashing to a skewed set of pages. If the hash key does not have a uniform distribution, then the time required to access some rows will be much slower than the time required to access other rows.

Except for the search array and the NXT and PRV pointers, the format of a hash data page is similar to that of an ordinary data DBEFile page, as shown in the following diagram:

To get information about the hash structures of a DBEnvironment, run SQLMON and go to the Static Hash screen.

Page Compression

When a row of data is inserted on a DBEFile page, it enters the region known as the free area on the page. The free area is all the space that is marked as available for data. When a row is deleted, its space does not immediately return to the free area. Thus, additional inserts following a deletion will first fill up all the space in the free area on the page. If the free area becomes full at insert or update time, the space left over from deletions is returned to the page's free area. This process is known as page compression.

Storage of Index Data on DBEFile Pages

Index entries are stored just as data entries are, with the following exceptions:

Index pages are either leaf pages or non-leaf pages.
Leaf pages actually point to rows on DBEFile pages. Each leaf page tuple contains a key value and the TID of a data row containing that key.
Non-leaf pages contain tuples with key values and pointers to other non-leaf pages or to leaf pages.

A leaf index page for a common type of index looks like the following:

The NXT and PRV fields point to the next and previous leaf pages in the index. These pointers make an index scan in key order extremely fast. The shared header describes the tuple's length and characteristics. The tuple body on an index leaf page has two parts: a key and a data TID. The key is the actual index key value as it appears in the table, and the data TID is the address of the row pointed to by this index entry.

To monitor the indexes of a DBEnvironment, run SQLMON and go to the SampleIO Indexes and Static Cluster screens.

How Indexes are Used

The basic index structure in ALLBASE/SQL is a B-tree or balanced tree. The B-tree consists of a root page and a number of non-leaf and leaf pages. Typically, the root page points to a non-leaf page that contains entries for particular ranges of key values, and each non-leaf page points to still other pages containing entries for progressively narrower ranges of values. The lowest level in the tree is called the leaf page, which contains pointers to (TIDs of) specific rows in DBEFiles.

The following diagram shows how a B-tree provides access to data. After deciding to use a particular index, ALLBASE/SQL accesses the root page (1). Then it reads non-leaf pages (2) until it obtains a leaf page (3) which contains the TID of a qualifying row. Finally, it accesses the data page containing the row (4). Assume that the values of C1 in the following are percentages between 1 and 100:

How PCRs are Stored

A PCR (parent-child relationship) is a special kind of B-tree that supports a referential relationship between two tables--the parent table and the child table. This kind of index has entries that point to the rows in the referring table and different entries that point to the rows in the table referred to. The table referred to must also have a unique index defined on it.

In the leaf pages of a PCR, an entry for the parent table (that is, the table referred to) precedes the entries for child tables (tables referring to the parent). Keys are distinguished by adding a 0 to the end of a parent key, and a 1 to the end of a child key.

The leaf index page in a PCR would have tuples like the following:

Page Splitting

As rows are inserted into a table, index entries are also inserted into the B-tree. As the pages of a B-tree fill up, new branches of the tree are created through a process known as page splitting. In page splitting, two new index pages are allocated, and the index entries on the old page are copied to the new pages -- half the entries to each new page; then the new entry is inserted, and the old page is freed for reuse. The process is shown for a typical case in the following figure. (1) shows a portion of an index before splitting; (2) shows the index after splitting.

When the key value being inserted is greater than the largest value already in the index or smaller than the lowest value, page splitting is one-way. This means that only one new page is allocated, and half the entries from the old page are moved to it, after which the new key value is added to the new page. In the previous example, an attempt to insert a value of 110 on a full leaf page where the highest value is 100 would result in one-way page splitting.