HP 3000 Manuals

Understanding Data Access Paths [ ALLBASE/SQL Reference Manual ] MPE/iX 5.5 Documentation
ALLBASE/SQL Reference Manual

Understanding Data Access Paths 

In creating a database, you must consider not only the arrangement of
data, but also the ways in which the data will be accessed during data
manipulation operations.  The four following access methods are supported
directly by ALLBASE/SQL:

   *   Serial access
   *   Indexed access
   *   Hashed access
   *   TID access

For indexed access, you must create a named index, or unique or
referential constraint on a table.  Unique and referential constraints
are supported by constraint indexes, which are similar to B-tree indexes.
For information on B-trees, refer to the section "Designing Indexes" in
the chapter "Logical Design" of the ALLBASE/SQL Database Administration 
Guide.

For hashed access, you must define a hash structure as you create the
table.

By default, you do not explicitly choose an access method when you issue
a query; ALLBASE/SQL does this for you in a process known as
optimization.  Optimization determines the best access path to the data
for the query you have submitted.  If a choice is available among the
different access methods--for example, if serial, indexed, and hashed
access are all possible for the same query--then the optimizer picks the
best path.  If no other choice is available, the optimizer chooses serial
access, also known as a sequential or table scan.  Serial access is
always possible.

To override the access method chosen by the optimizer, use the SETOPT
statement.

Serial Access 

Serial access does not require the existence of any special object in
addition to the table itself.  If ALLBASE/SQL chooses serial access when
you issue a query, it starts reading data from the first page in the
table and continues to the end.  Serial access is probably the best
access method when you intend to read all the data in the table.  For
example, an application that updates every row in a table in exactly the
same way would perform best using a serial scan.

Indexed Access 

Indexed access requires the use of a named index defined on specific
columns in the table to be accessed.  Indexes can be plain, or they can
be unique and/or clustering.  Tables having a unique index cannot have
duplicate data values in the key column(s).  A clustering index causes
rows with similar key values to be stored near to each other on disk when
this is possible.  A table that is to use a clustering index should be
loaded in the key order specified by the clustering index.  A clustering
index can be defined on a unique or referential constraint.

Whenever you issue a query, the query processor checks to see if an index
exists for one or more of the columns in the query.  If an index is
available and if the optimizer decides that using the index is the
fastest way to access the data, ALLBASE/SQL looks up the key values in
the index first, then goes directly to the pages containing table data.

For example, in the following query, assume that PurchDB.Parts contains a
large number of rows and that a unique index exists on the PartNumber
column:

     isql=> SELECT PartName, SalesPrice FROM PurchDB.Parts 
     > WHERE PartNumber = '1323-D-01'; 

The optimizer would probably choose this unique index for access to the
single row because the alternative choice--a serial scan--would require
reading each page in the table until the qualifying row is reached.

You define an index with the CREATE INDEX statement.  The components of
an index definition are as follows:

   1.  Type of the index (optional)
   2.  Name of the index
   3.  Table on which the index operates
   4.  Key column(s)

The following example contains numbers that refer to the index components
listed above:

               1
               |
     CREATE UNIQUE INDEX
            PartIndex       --2
         ON PurchDB.Parts   --3
            (Partno)        --4

ALLBASE/SQL can choose to use an index when processing the SELECT,
UPDATE, or DELETE statements if the following criteria are satisfied:

   *   The statement contains a WHERE clause, which consists of one or
       more predicates.  A predicate is a comparison of expressions that
       evaluates to a value of True or False.  Refer to the "Search
       Conditions" chapter for more information on predicates.

   *   The statement contains explicit join syntax.

   *   Predicates are optimizable, which means that the use of an index
       is considered in choosing an access path for the data.  The
       following predicates are optimizable when all the data types
       within them are the same; in the case of DECIMAL data, the
       precisions and scales of the values must be the same:

          *   WHERE Column1 ComparisonOperator Column2, in which
              ComparisonOperator is one of the following:  =, >, >=, <,
              or <=.  An index may be used if Column1 and Column2 are in
              different tables and an index exists on either column.  For
              example:

                   WHERE PurchDB.Parts.PartNumber = PurchDB.SupplyPrice.PartNumber

          *   WHERE Column1 ComparisonOperator (Constant or 
              HostVariable), in which ComparisonOperator is as defined
              above.  An index may be used if one exists on Column1;
              however, an index may be used if a host variable appears in
              the predicate only if the comparison operator is =, >, >=,
              <, or <= .  For example:

                   WHERE SupplyPrice = :SupplyPrice

          *   WHERE Column1 BETWEEN (Column2 or Constant or HostVariable)
              AND (Column2 or Constant or HostVariable).  For example:

                   WHERE OrderNumber BETWEEN '1123-P-01' AND '1243-MU-01'

   *   Some queries which use the MIN or MAX aggregate function on an
       indexed column as follows are optimizable:

          *   MIN/MAX column is the first column of a nonhashed index.

          *   MIN/MAX indexed column on a single table with or without
              predicates.

          *   MIN/MAX indexed column on the outermost table of a nested
              loop join query.

          *   Single MIN/MAX within one query.

   *   ALLBASE/SQL does not use an index in the following types of
       queries:

          *   The query contains a WHERE clause using a not-equal (<>)
              arithmetic operator, such as, WHERE Column1 <> (Column2 or
              Constant or Host Variable).  For example:

                   WHERE VendorState <> :VendorState

          *   The query contains a predicate using an arithmetic
              expression.  For example:

                   WHERE Column1 > Column2*:HostVariable 

          *   MIN or MAX is used with the GROUP BY, ORDER BY, or HAVING
              clause.

          *   A MIN or MAX indexed column exists in the inner table of a
              nested-loop, join query.

          *   A MIN or MAX indexed column exists on all tables of a
              sort-merge, join query.

          *   MIN or MAX is used with an expression.

          *   One query contains multiple MINs or MAXs.

          *   A LIKE predicate contains a host variable.

       If other predicates are used, then an index is considered in
       choosing an access path.

For more information about indexes, refer to the "Designing Indexes"
section in the "Logical Design" chapter of the ALLBASE/SQL Database 
Administration Guide.

Hashed Access 

Hashed access requires you to specify hashing when you create the table,
before loading data.  Because a hash structure is specified as part of
the table definition, you do not assign a name to it, as you do with an
index.  However, you must identify specific key columns and a number of
primary pages for data storage.  ALLBASE/SQL determines the placement of
rows based on specific unique key values.  You can define one hash
structure per table at table creation time; and if a hash is defined, you
cannot define a clustering index on the table.  You can define a
multiple-column key for a hash structure; up to 16 columns are permitted
in the key.

A hash structure is a group of designated pages in a DBEFile that are set
aside for the storage of tuples according to the values in a unique hash 
key.  The key enforces uniqueness; duplicate values cannot exist in the
hash key column(s).  A well-chosen hash key, like a good index key,
provides the optimizer with the choice of a potentially faster data
access method than a serial scan.

Create a hash structure at the time you create a table.  In addition to
the components of a table definition, a hash structure definition
includes:

   1.  Columns that define the hash key
   2.  Number of primary pages

The reference numbers in the following example refer to the table
definition components listed above:

     CREATE PUBLIC TABLE PurchDB.Vendors
                         (VendorNumber  INTEGER  NOT NULL,
                         VendorName  CHAR(30)  NOT NULL,
                         ContactName  CHAR(30),
                         PhoneNumber  CHAR(15),
                         VendorStreet  CHAR(30)  NOT NULL,
                         VendorCity  CHAR(20)  NOT NULL,
                         VendorState  CHAR(2)   NOT NULL,
                         VendorZipCode  CHAR(10)  NOT NULL,
                         VendorRemarks  VARCHAR(60) )

          UNIQUE HASH ON (VendorNumber)        -- 1
                         PAGES = 101           -- 2

                      IN PurchFS

Use the UNIQUE HASH clause or the HASH ON CONSTRAINT clause to specify
one or more columns for a hash key.  Use the PAGES= clause to define a
number of primary pages in which to store the data in the table.  This is
different from ordinary data storage, which does not require a number of
primary pages.

Based on the key and the number of primary pages you specify, ALLBASE/SQL
calculates a page number for each row before insertion into the table.
The page number depends directly on the data in the key.  Because a
specific number of primary pages is specified, you must create the hash
structure as you create the table; you cannot modify a table from normal
to hash storage at a later time.

The optimizer can decide to use hashed access provided the statement
contains a WHERE clause with an EQUAL factor for each column in the hash
key.  This makes hashing especially useful for tables on which you need
quick random access to a specific row.

For example, assuming you have defined a hash key on VendorNumber, the
optimizer might choose hashed access for the following:

     isql=> SELECT * FROM PurchDB.Vendors 
     > WHERE VendorNumber = 9002; 

However, it would not consider hash access for the following:

     isql=> SELECT * FROM PurchDB.Vendors 
     > WHERE VendorNumber > 9002 
     > ORDER BY VendorName; 

Hash structures operate like unique indexes; that is, they enforce the
uniqueness of each key in the table.  If you attempt to insert a
duplicate key, ALLBASE/SQL will return an error message.

Differences between Hashed and Indexed Access 

Hashing may provide faster access than B-tree lookups for many types of
common queries, and it does not require the overhead of additional file
space required by B-tree indexes.  In addition, hashing is not subject to
the overhead of updating index pages when you insert or modify rows.
However, updating key values in a hash table requires you to delete the
row containing the key value and then insert a row containing the new
value.  This means that you should choose a non-volatile key for hashing
whenever possible.

When to Use a Hash Structure 

Hashing offers high performance when you need essentially random access
to individual tuples.  It is not appropriate for applications that
require sorting of the query result.  In cases where both random access
and sorting are required at different times, you can define a B-tree
index as well as a hashing structure.  This allows the optimizer the
choice of the most efficient method for the specific query.

The best candidates for the use of hash structures are applications in
which the following occur:

   *   Keys are not frequently updated.  Remember that you cannot use the
       UPDATE statement on hash key columns.  This means that you must
       delete and then insert rows that contain changes to key values.

   *   Most queries contain EQUAL factors on hash key columns.

   *   Tuples are of fixed size, with a minimum of VARCHARS and NULL
       values.

You should not use a hash structure if your queries need to scan large
areas, for instance, with BETWEEN clauses or with predicates containing
<> factors.

TID Access 

Each row of a table has a unique address called the tuple identifier, or
TID. TID functionality provides the fastest possible data access to a
single row.  You can obtain the TID of any row with the SELECT statement.
For more information on TID access refer to the ALLBASE/SQL application
programming manual for the language you are using.
MPE/iX 5.5 Documentation