HP 3000 Manuals

Designing Hash Structures [ ALLBASE/SQL Database Administration Guide ] MPE/iX 5.5 Documentation


ALLBASE/SQL Database Administration Guide

Designing Hash Structures 

Like an index, a hash structure specifies up to 16 key columns on a table
to facilitate rapid access to data.  But unlike an index, a hash
structure is not a separate object; instead, it is a way of arranging the
actual table data in DBEFile pages.  To create a hash structure, you must
define a hash key and specify a number of primary pages in a DBEFileSet
containing one or more empty TABLE or MIXED DBEFiles.

Because of the nature of hashing, you can define only one hash structure
for a single table.  You can, however, define multiple B-tree indexes on
the same table.  You cannot define a clustering index, because clustering
affects the physical placement of data, and this would be incompatible
with the physical placement chosen by the hash algorithm.

Another consequence of hashing is the requirement that you specify a
number of primary pages.  When the table grows to include more data than
will fit on the primary pages, new tuples are placed on overflow pages,
which can be located in the same or in other DBEFiles within the same
DBEFileSet.  The larger the number of overflow pages, the slower the
access to the average tuple.

Understanding the Hash Function 

Each time a row is inserted into the table, a hash function maps the
specified key value to a logical page in the DBEFileSet associated with
the hash structure.  If two keys hash to the same page (in what is called
a collision), then space is allocated on the same page if possible for
both rows.  If space cannot be allocated on the same page, a new page is
allocated (an overflow page) and linked to the old page.

To access data through a hash structure, ALLBASE/SQL calculates the
correct data page location in the DBEFileSet from the key value by means
of the hash function.

The hash function used by ALLBASE/SQL derives the primary page number in
three steps:

   1.  Creates a concatenated form of the key.
   2.  Folds the concatenated key into a 4-byte integer.
   3.  Computes the primary page number, as follows:

            Primary Page Number = (Folded Key) mod (Number of Primary Pages + 1)

The important thing to note is that hashing works best with a fairly even
distribution of key values, spreading the corresponding rows evenly over
the number of pages available.  A key with a skewed distribution will
attempt to place all rows on a correspondingly skewed set of pages.  For
good results, use a prime number of primary pages when you are hashing on
a non-integer key.

Choosing Hash Keys 

In choosing a hash key, one important consideration is your query design.
There must be an EQUAL factor for each key column in the predicates of
queries that will use the hash structure.

Another important issue is the distribution of key values.  The best key
results in a set of hash values that are evenly distributed among the
primary pages available.  The worst key results in hash values that
cluster tightly in a narrow range of primary pages, leaving others empty.

Another consideration is that the key must be unique.  It can either be a
unique single column value or a unique combination.  In addition to being
unique, a hash key should be non-volatile, that is, not subject to
frequent update.  Since you cannot use the UPDATE statement with a hash
key column, you must do a DELETE followed by an INSERT when a key
modification is necessary.

An integer key such as a social security number is ideal.  You can also
use multiple columns to create a unique key, as in the following example:

     CREATE PUBLIC TABLE PurchDB.OrderItems
       (OrderNumber       INTEGER           NOT NULL,
        ItemNumber        INTEGER           NOT NULL,
        VendPartNumber    CHAR(16),
        PurchasePrice     DECIMAL(10,2)     NOT NULL,
        OrderQty          SMALLINT,
        ItemDueDate       CHAR(8),
        ReceivedQty       SMALLINT )
     UNIQUE HASH ON (OrderNumber, VendPartNumber) PAGES=101
     IN OrderFS

Note that use of a multicolumn key does not, in itself, ensure
uniqueness.

Choosing the Number of Primary Pages 

The number of primary pages in a hash structure is the number of data
pages allocated as hash buckets.  The optimal number depends partially on
how much data you need to store, and partially on how sparse you wish the
DBEFile pages to be.  The larger the number of primary pages, the more
sparse the pages.

In general, choose a number that is large enough for the data you need to
store plus space for a reasonable amount of growth.  In practical terms,
the smallest number of primary pages would be equal to the size of the
data divided by the size of a page (about 3900 bytes, allowing for
overhead).  The largest number of primary pages you might choose would be
equal to the total number of rows; for a unique hash structure with a
completely even distribution of key values, this would mean one row per
page.

For small tables subject to frequent access, create a hash structure with
the number of primary pages equal to the number of rows and with an even
distribution of key values.  An example is a currency exchange table
containing entries for 50 currencies using a currency code key with
values ranging from 1001 to 1050 and 50 primary pages.  This structure
will enable you to lock only a single row at a time while accessing data,
thereby improving concurrency.



MPE/iX 5.5 Documentation