|
|
ALLBASE/SQL Reference Manual: HP 9000 Computer Systems > Chapter 2 Using ALLBASE/SQLUnderstanding Data Access Paths |
|
In creating a database, you must consider not only the arrangement of data, but also the ways in which the data will be accessed during data manipulation operations. The four following access methods are supported directly by ALLBASE/SQL:
For indexed access, you must create a named index, or unique or referential constraint on a table. Unique and referential constraints are supported by constraint indexes, which are similar to B-tree indexes. For information on B-trees, refer to the section "Designing Indexes" in the chapter "Logical Design" of the ALLBASE/SQL Database Administration Guide For hashed access, you must define a hash structure as you create the table. By default, you do not explicitly choose an access method when you issue a query; ALLBASE/SQL does this for you in a process known as optimization. Optimization determines the best access path to the data for the query you have submitted. If a choice is available among the different access methods--for example, if serial, indexed, and hashed access are all possible for the same query--then the optimizer picks the best path. If no other choice is available, the optimizer chooses serial access, also known as a sequential or table scan. Serial access is always possible. To override the access method chosen by the optimizer, use the SETOPT statement. Serial access does not require the existence of any special object in addition to the table itself. If ALLBASE/SQL chooses serial access when you issue a query, it starts reading data from the first page in the table and continues to the end. Serial access is probably the best access method when you intend to read all the data in the table. For example, an application that updates every row in a table in exactly the same way would perform best using a serial scan. Indexed access requires the use of a named index defined on specific columns in the table to be accessed. Indexes can be plain, or they can be unique and/or clustering. Tables having a unique index cannot have duplicate data values in the key column(s). A clustering index causes rows with similar key values to be stored near to each other on disk when this is possible. A table that is to use a clustering index should be loaded in the key order specified by the clustering index. A clustering index can be defined on a unique or referential constraint. Whenever you issue a query, the query processor checks to see if an index exists for one or more of the columns in the query. If an index is available and if the optimizer decides that using the index is the fastest way to access the data, ALLBASE/SQL looks up the key values in the index first, then goes directly to the pages containing table data. For example, in the following query, assume that PurchDB.Parts contains a large number of rows and that a unique index exists on the PartNumber column:
The optimizer would probably choose this unique index for access to the single row because the alternative choice--a serial scan--would require reading each page in the table until the qualifying row is reached. You define an index with the CREATE INDEX statement. The components of an index definition are as follows:
The following example contains numbers that refer to the index components listed above:
ALLBASE/SQL can choose to use an index when processing the SELECT, UPDATE, or DELETE statements if the following criteria are satisfied:
For more information about indexes, refer to the "Designing Indexes" section in the "Logical Design" chapter of the ALLBASE/SQL Database Administration Guide. Hashed access requires you to specify hashing when you create the table, before loading data. Because a hash structure is specified as part of the table definition, you do not assign a name to it, as you do with an index. However, you must identify specific key columns and a number of primary pages for data storage. ALLBASE/SQL determines the placement of rows based on specific unique key values. You can define one hash structure per table at table creation time; and if a hash is defined, you cannot define a clustering index on the table. You can define a multiple-column key for a hash structure; up to 16 columns are permitted in the key. A hash structure is a group of designated pages in a DBEFile that are set aside for the storage of tuples according to the values in a unique hash key. The key enforces uniqueness; duplicate values cannot exist in the hash key column(s). A well-chosen hash key, like a good index key, provides the optimizer with the choice of a potentially faster data access method than a serial scan. Create a hash structure at the time you create a table. In addition to the components of a table definition, a hash structure definition includes:
The reference numbers in the following example refer to the table definition components listed above:
Use the UNIQUE HASH clause or the HASH ON CONSTRAINT clause to specify one or more columns for a hash key. Use the PAGES= clause to define a number of primary pages in which to store the data in the table. This is different from ordinary data storage, which does not require a number of primary pages. Based on the key and the number of primary pages you specify, ALLBASE/SQL calculates a page number for each row before insertion into the table. The page number depends directly on the data in the key. Because a specific number of primary pages is specified, you must create the hash structure as you create the table; you cannot modify a table from normal to hash storage at a later time. The optimizer can decide to use hashed access provided the statement contains a WHERE clause with an EQUAL factor for each column in the hash key. This makes hashing especially useful for tables on which you need quick random access to a specific row. For example, assuming you have defined a hash key on VendorNumber, the optimizer might choose hashed access for the following:
However, it would not consider hash access for the following:
Hash structures operate like unique indexes; that is, they enforce the uniqueness of each key in the table. If you attempt to insert a duplicate key, ALLBASE/SQL will return an error message. Hashing may provide faster access than B-tree lookups for many types of common queries, and it does not require the overhead of additional file space required by B-tree indexes. In addition, hashing is not subject to the overhead of updating index pages when you insert or modify rows. However, updating key values in a hash table requires you to delete the row containing the key value and then insert a row containing the new value. This means that you should choose a non-volatile key for hashing whenever possible. Hashing offers high performance when you need essentially random access to individual tuples. It is not appropriate for applications that require sorting of the query result. In cases where both random access and sorting are required at different times, you can define a B-tree index as well as a hashing structure. This allows the optimizer the choice of the most efficient method for the specific query. The best candidates for the use of hash structures are applications in which the following occur:
You should not use a hash structure if your queries need to scan large areas, for instance, with BETWEEN clauses or with predicates containing <> factors. Each row of a table has a unique address called the tuple identifier, or TID. TID functionality provides the fastest possible data access to a single row. You can obtain the TID of any row with the SELECT statement. For more information on TID access refer to the ALLBASE/SQL application programming manual for the language you are using. |
|