HPlogo ALLBASE/SQL Database Administration Guide: HP 3000 MPE/iX Computer Systems > Chapter 2 Logical Design

Designing Indexes

» 

Technical documentation

Complete book in PDF
» Feedback

 » Table of Contents

 » Index

An index is defined on one or more columns in a table to facilitate rapid retrieval of data. ALLBASE/SQL decides whether to use an index in the process of creating an optimal access path for the data during query optimization. By default, you cannot specify the use of an index in an SQL statement; the optimizer will decide to use an index if it will improve performance during query processing. However, if you use the SETOPT statement you can override the optimizer and specify the use of an index.

ALLBASE/SQL uses a B-tree (balanced tree) design for indexes, as shown in Figure 2-7 “B-Tree Index Design”.

Figure 2-7 B-Tree Index Design

[B-Tree Index Design]

The tree has different levels:

  • Root level— a single root page containing key values that subdivide the index search.

  • Intermediate levels— pages containing key values that further subdivide an index search.

  • Leaf level— pages which point to data values in the indexed column or columns.

When an index is used to access a row on the basis of a key value, ALLBASE/SQL first looks at the root page, then follows pointers down the tree until it finds an appropriate leaf entry, which points to a specific location in a data page. In Figure 2-7, the root page contains one subdividing value, Larry. In order to find student Glen, we would first examine the root page, which points to the leftmost page on the next level, since Glen comes before Larry in this collating sequence.

This page in turn points to the second leaf page from the left, since Glen comes after Dave and before Larry. Finally, the leaf page points to a specific data page where the desired row is found.

Note that ALLBASE/SQL B-tree indexes are doubly linked, in that the leaf pages point to other leaf pages, so that the next higher or lower value in key order can be quickly located without "backing out" to a higher level in the tree.

Keep in mind the following general guidelines when designing indexes:

  • Indexes can only be defined on tables.

  • An index can only be defined on one table.

  • The maximum number of columns allowed per index is 16.

  • There is no limit to the number of indexes you can define.

  • Too many indexes can degrade performance because when a row is inserted, deleted, or updated in a table, the indexes must also be updated.

  • A small table may not need an index.

  • Cyclic applications requiring multiple access paths (for example, accounting applications for quarterly and annual reports) can create indexes, then drop them when finished. This reduces index overhead and maintenance and improves application performance. DDL must be enabled for this approach to be possible.

  • An index may improve the performance of queries containing a WHERE clause involving a comparison on the first column of a multicolumn index key.

ALLBASE/SQL updates indexes automatically as rows in the table are updated, inserted, or deleted. This adds considerable overhead to the UPDATE, DELETE, and INSERT operations. ALLBASE/SQL maintains index information and statistics in the SYSTEM.INDEX view in the system catalog, which is presented fully in the chapter, "System Catalog."

Determining Index Keys

All of the columns in an index together compose what is called a key. An index on a single column has a simple key. An index on multiple columns has a compound key.

Any column that you specify as a search item in a query will cause ALLBASE/SQL to look for an index with that column specified as part of a key. If no appropriate index is found, or if the optimizer decides not to use the index, ALLBASE/SQL performs a sequential search on that table. Sequential searches may degrade performance when most of the performed queries have a WHERE clause.

The ideal index has the one or two most common columns used in a WHERE clause defined as keys. Keep the following guidelines in mind when determining which columns to use for a key:

  • A simple key can be as long as 1010 bytes.

  • An index should contain a column that is used in the WHERE clause of a frequently executed query.

  • An index should contain a column that is used to verify the existence of a value, especially if it is a unique index.

If you are considering a compound key, you should also keep the following in mind:

  • A compound key can have up to 16 columns.

  • The length of a compound key must be less than or equal to 1024 bytes. Use the following formula to determine whether or not a proposed compound key is acceptable:

2NC + KL + 18 <= 1024

  • Where NC is the number of columns, and KL is the sum of the column lengths in the key. A single column index may not exceed 1010 bytes.

  • The order in which you specify columns in the CREATE INDEX statement determines the sort ordering: the first column listed is the primary, or first, order of sort; the second column listed is second in order; and so on.

The most common index is created on the primary key of a table. For example, the order number is the primary key in the Orders table. Therefore an index with the OrderNumber column as a key should be created on the Orders table.

How Index Keys are Used

ALLBASE/SQL uses simple and compound key indexes differently. Suppose most of your queries have a WHERE clause for the VendorNumber and OrderNumber, but the two columns are never used in the same WHERE clause:

   WHERE VendorNumber='StringConstant'

   WHERE OrderNumber=Integer

In this case, you should create two separate indexes, each with a simple key.

Even if the two columns are used in the same WHERE clause, but they are compared against each other, only one of the columns can be used as a search item.

   WHERE VendorNumber=OrderNumber

In this case as well, you should create two indexes with simple keys.

On the other hand, suppose the two columns are always used together in a WHERE clause:

   WHERE VendorNumber='StringConstant' AND OrderNumber=Integer

In this case you should create one index with a compound key.

While it is not required to have an index on each table in a join, defining an index on one or both of the tables may improve performance. In general, you should define an index on the columns specified in the join. For example, if the sample database has an application that joins the SupplyPrice table with the OrderItems table, then an index with VendPartNumber (the common column) should be defined on one of the tables if not both of them.

Determining Index Type

You can design one of four different types of index:

  • Unique

  • Clustering

  • Clustering and unique

  • Neither clustering nor unique

The unique index will not permit duplicate values for the key columns specified. The clustering index attempts to place rows with similar key values physically close to each other on disk.

Defining Unique Indexes

If you specify the unique option when creating an index, the columns named in the index key are kept unique. Unique indexes prevent duplicate data in the columns used as keys of the index.

In the sample database, the Vendors table has a unique index created on the VendorNumber column to ensure that a vendor number refers to only one vendor. A table can have multiple unique indexes. However, ALLBASE/SQL will not allow you to create a unique index on a table that already contains rows with duplicate values in the key columns.

Defining Clustering Indexes

Clustering indexes are closely related to physical design of the database. To understand what a clustering index does, you must know that ALLBASE/SQL arranges data in sections of a DBEFile called pages. A clustering index attempts to place all rows with similar key values on the same or consecutive pages. Because the rows are physically close, I/O overhead is reduced and performance improved whenever the rows are retrieved in key order. This can be helpful with queries in which you make use of LIKE and BETWEEN predicates.

A clustering index should be defined on a table after the initial loading of the table but before any additional rows are inserted in the table. It is recommended that you sort data on the clustering key before you load it into the table. Rows inserted before a clustering index is created are not repositioned after the index is created. Therefore, if you want to create a clustering index on a table that already contains data, you should unload the table using a SELECT statement with an ORDER BY clause, reload the table, then create the clustering index.

Keep the following guidelines in mind when considering a clustering index:

  • Regardless of the number of indexes you define on a table, only one index can be a clustering index.

  • A clustering index can be either non-unique or unique.

  • A clustering index should be based on the most commonly used sort sequence.

  • Clustering benefits applications that must search large amounts of data to retrieve rows in sequential order based on a key value.

  • A clustering index can be defined on an existing table; however, existing rows will not be repositioned.

  • A clustering index may improve the performance of a query containing a DISTINCT, GROUP BY or ORDER BY clause if the columns in the sort list match the first columns in the index definition.

  • A clustering index may improve the performance of a query containing a WHERE clause involving a comparison on the first few columns listed in the index.

  • Any index makes INSERTs and UPDATEs more expensive.

  • Clustering of data is maintained when there is a relatively large volume of DELETE operations followed by a similar volume of INSERT operations on the table. Clustering of data is not maintained when there is a larger volume of INSERT operations than DELETE operations.