Understanding Data Access Paths

In creating a database, you must consider not only the arrangement of data, but also the ways in which the data will be accessed during data manipulation operations. The four following access methods are supported directly by ALLBASE/SQL:

Serial access
Indexed access
Hashed access
TID access

For indexed access, you must create a named index, or unique or referential constraint on a table. Unique and referential constraints are supported by constraint indexes, which are similar to B-tree indexes. For information on B-trees, refer to the section "Designing Indexes" in the chapter "Logical Design" of the ALLBASE/SQL Database Administration Guide

For hashed access, you must define a hash structure as you create the table.

By default, you do not explicitly choose an access method when you issue a query; ALLBASE/SQL does this for you in a process known as optimization. Optimization determines the best access path to the data for the query you have submitted. If a choice is available among the different access methods--for example, if serial, indexed, and hashed access are all possible for the same query--then the optimizer picks the best path. If no other choice is available, the optimizer chooses serial access, also known as a sequential or table scan. Serial access is always possible.

To override the access method chosen by the optimizer, use the SETOPT statement.

Serial Access

Serial access does not require the existence of any special object in addition to the table itself. If ALLBASE/SQL chooses serial access when you issue a query, it starts reading data from the first page in the table and continues to the end. Serial access is probably the best access method when you intend to read all the data in the table. For example, an application that updates every row in a table in exactly the same way would perform best using a serial scan.

Indexed Access

Indexed access requires the use of a named index defined on specific columns in the table to be accessed. Indexes can be plain, or they can be unique and/or clustering. Tables having a unique index cannot have duplicate data values in the key column(s). A clustering index causes rows with similar key values to be stored near to each other on disk when this is possible. A table that is to use a clustering index should be loaded in the key order specified by the clustering index. A clustering index can be defined on a unique or referential constraint.

Whenever you issue a query, the query processor checks to see if an index exists for one or more of the columns in the query. If an index is available and if the optimizer decides that using the index is the fastest way to access the data, ALLBASE/SQL looks up the key values in the index first, then goes directly to the pages containing table data.

For example, in the following query, assume that PurchDB.Parts contains a large number of rows and that a unique index exists on the PartNumber column:

   isql=> SELECT PartName, SalesPrice FROM PurchDB.Parts
   > WHERE PartNumber = '1323-D-01';

The optimizer would probably choose this unique index for access to the single row because the alternative choice--a serial scan--would require reading each page in the table until the qualifying row is reached.

You define an index with the CREATE INDEX statement. The components of an index definition are as follows:

Type of the index (optional)
Name of the index
Table on which the index operates
Key column(s)

The following example contains numbers that refer to the index components listed above:

             1
             |
   CREATE UNIQUE INDEX
          PartIndex       --2
       ON PurchDB.Parts   --3
          (Partno)        --4

ALLBASE/SQL can choose to use an index when processing the SELECT, UPDATE, or DELETE statements if the following criteria are satisfied:

The statement contains a WHERE clause, which consists of one or more predicates. A predicate is a comparison of expressions that evaluates to a value of True or False. Refer to the "Search Conditions" chapter for more information on predicates.
The statement contains explicit join syntax.
Predicates are optimizable, which means that the use of an index is considered in choosing an access path for the data. The following predicates are optimizable when all the data types within them are the same; in the case of DECIMAL data, the precisions and scales of the values must be the same:
- WHERE Column1 ComparisonOperator Column2, in which ComparisonOperator is one of the following: =, >, >=, <, or <=. An index may be used if Column1 and Column2 are in different tables and an index exists on either column. For example:
  WHERE PurchDB.Parts.PartNumber = PurchDB.SupplyPrice.PartNumber
- WHERE Column1 ComparisonOperator (Constant or HostVariable), in which ComparisonOperator is as defined above. An index may be used if one exists on Column1; however, an index may be used if a host variable appears in the predicate only if the comparison operator is =, >, >=, <, or <=. For example:
  WHERE SupplyPrice = :SupplyPrice
- WHERE Column1 BETWEEN (Column2 or Constant or HostVariable) AND (Column2 or Constant or HostVariable). For example:
  WHERE OrderNumber BETWEEN '1123-P-01' AND '1243-MU-01'
Some queries which use the MIN or MAX aggregate function on an indexed column as follows are optimizable:
- MIN/MAX column is the first column of a nonhashed index.
- MIN/MAX indexed column on a single table with or without predicates.
- MIN/MAX indexed column on the outermost table of a nested loop join query.
- Single MIN/MAX within one query.
ALLBASE/SQL does not use an index in the following types of queries:
- The query contains a WHERE clause using a not-equal (<>) arithmetic operator, such as, WHERE Column1 <> (Column2 or Constant or Host Variable). For example:
  WHERE VendorState <> :VendorState
- The query contains a predicate using an arithmetic expression. For example:
  WHERE Column1 > Column2*:HostVariable
- MIN or MAX is used with the GROUP BY, ORDER BY, or HAVING clause.
- A MIN or MAX indexed column exists in the inner table of a nested-loop, join query.
- A MIN or MAX indexed column exists on all tables of a sort-merge, join query.
- MIN or MAX is used with an expression.
- One query contains multiple MINs or MAXs.
- A LIKE predicate contains a host variable.
If other predicates are used, then an index is considered in choosing an access path.

For more information about indexes, refer to the "Designing Indexes" section in the "Logical Design" chapter of the ALLBASE/SQL Database Administration Guide.

Hashed Access

Hashed access requires you to specify hashing when you create the table, before loading data. Because a hash structure is specified as part of the table definition, you do not assign a name to it, as you do with an index. However, you must identify specific key columns and a number of primary pages for data storage. ALLBASE/SQL determines the placement of rows based on specific unique key values. You can define one hash structure per table at table creation time; and if a hash is defined, you cannot define a clustering index on the table. You can define a multiple-column key for a hash structure; up to 16 columns are permitted in the key.

A hash structure is a group of designated pages in a DBEFile that are set aside for the storage of tuples according to the values in a unique hash key. The key enforces uniqueness; duplicate values cannot exist in the hash key column(s). A well-chosen hash key, like a good index key, provides the optimizer with the choice of a potentially faster data access method than a serial scan.

Create a hash structure at the time you create a table. In addition to the components of a table definition, a hash structure definition includes:

Columns that define the hash key
Number of primary pages

The reference numbers in the following example refer to the table definition components listed above:

  CREATE PUBLIC TABLE PurchDB.Vendors
                       (VendorNumber  INTEGER  NOT NULL,
                       VendorName  CHAR(30)  NOT NULL,
                       ContactName  CHAR(30),
                       PhoneNumber  CHAR(15),
                       VendorStreet  CHAR(30)  NOT NULL,
                       VendorCity  CHAR(20)  NOT NULL,
                       VendorState  CHAR(2)   NOT NULL,
                       VendorZipCode  CHAR(10)  NOT NULL,
                       VendorRemarks  VARCHAR(60) )
 
        UNIQUE HASH ON (VendorNumber)        -- 1
                       PAGES = 101           -- 2
 
                    IN PurchFS

Use the UNIQUE HASH clause or the HASH ON CONSTRAINT clause to specify one or more columns for a hash key. Use the PAGES= clause to define a number of primary pages in which to store the data in the table. This is different from ordinary data storage, which does not require a number of primary pages.

Based on the key and the number of primary pages you specify, ALLBASE/SQL calculates a page number for each row before insertion into the table. The page number depends directly on the data in the key. Because a specific number of primary pages is specified, you must create the hash structure as you create the table; you cannot modify a table from normal to hash storage at a later time.

The optimizer can decide to use hashed access provided the statement contains a WHERE clause with an EQUAL factor for each column in the hash key. This makes hashing especially useful for tables on which you need quick random access to a specific row.

For example, assuming you have defined a hash key on VendorNumber, the optimizer might choose hashed access for the following:

   isql=> SELECT * FROM PurchDB.Vendors
   > WHERE VendorNumber = 9002;

However, it would not consider hash access for the following:

   isql=> SELECT * FROM PurchDB.Vendors
   > WHERE VendorNumber > 9002
   > ORDER BY VendorName;

Hash structures operate like unique indexes; that is, they enforce the uniqueness of each key in the table. If you attempt to insert a duplicate key, ALLBASE/SQL will return an error message.

Differences between Hashed and Indexed Access

Hashing may provide faster access than B-tree lookups for many types of common queries, and it does not require the overhead of additional file space required by B-tree indexes. In addition, hashing is not subject to the overhead of updating index pages when you insert or modify rows. However, updating key values in a hash table requires you to delete the row containing the key value and then insert a row containing the new value. This means that you should choose a non-volatile key for hashing whenever possible.

When to Use a Hash Structure

Hashing offers high performance when you need essentially random access to individual tuples. It is not appropriate for applications that require sorting of the query result. In cases where both random access and sorting are required at different times, you can define a B-tree index as well as a hashing structure. This allows the optimizer the choice of the most efficient method for the specific query.

The best candidates for the use of hash structures are applications in which the following occur:

Keys are not frequently updated. Remember that you cannot use the UPDATE statement on hash key columns. This means that you must delete and then insert rows that contain changes to key values.
Most queries contain EQUAL factors on hash key columns.
Tuples are of fixed size, with a minimum of VARCHARS and NULL values.

You should not use a hash structure if your queries need to scan large areas, for instance, with BETWEEN clauses or with predicates containing <> factors.

TID Access

Each row of a table has a unique address called the tuple identifier, or TID. TID functionality provides the fastest possible data access to a single row. You can obtain the TID of any row with the SELECT statement. For more information on TID access refer to the ALLBASE/SQL application programming manual for the language you are using.