ALLBASE/SQL Performance Guidelines
Physical Data Design
Physical data design means the arrangement of data inside tables, which
reside in DBEFiles, and the arrangement of DBEFiles in DBEFileSets. This
section includes the following topics:
* Creating DBEFileSets.
* Creating DBEFiles.
* Creating Tables.
* Initial Table Loads.
* Unloading Data.
* Unloading and Reloading to Remove Indirect Rows.
* Unloading and Reloading to Remove Overflow Pages.
* Tips on Deletions from Tables.
Additional information relating to physical design is found in the
section "Load Balancing" in the chapter "Guidelines on System
Administration." You may also find it useful to review the "Physical
Design" chapter in the ALLBASE/SQL Database Administration Guide.
Creating DBEFileSets
DBEFileSets are logical groupings of DBEFiles. DBEFileSets are also the
logical home of tables, which are always created in a particular
DBEFileSet. The following guidelines offer suggestions on how to
distribute tables among DBEFileSets.
Avoiding the SYSTEM DBEFileSet for User Data.
Although you can create user-defined tables in the SYSTEM DBEFileSet, it
is recommended that you create new DBEFileSets to keep user data separate
from system catalog data to facilitate maintenance and performance
tuning. Here are the reasons why:
* System catalog data is accessed frequently. If user data is kept
in the SYSTEM DBEFileSet, then the pages used for system data
become intermingled with the pages used for user data. If
ALLBASE/SQL needs to access a system page that physically resides
near the end of the DBEFileSet, then it must first look through
all the page table pages that come before it. The average time to
access a particular system page increases because of the extra
data stored within the DBEFileSet.
* Sometimes ALLBASE/SQL uses pages in the SYSTEM DBEFileSet for
temporary space. If this space is not available, processing
fails. Mixing user and system data
makes it more likely that temporary space will not be available
when needed. See the section "Temporary Space in SYSTEM" in the
chapter "Guidelines on System Administration."
* Grouping tables in separate DBEFileSets allows you to identify the
space requirements for particular tables. If you store all tables
in the SYSTEM DBEFileSet and the DBEFileSet runs out of space, it
will be difficult to tell how fast each table is growing. It will
also be harder to determine which table's update or insert
operation caused the system to run out of space.
In particular, you should avoid creating hash structures in the SYSTEM
DBEFileSet.
Placing Large Tables in Separate DBEFileSets.
You should place tables larger than 1000 pages in their own DBEFileSets.
When more than one table is stored in the DBEFileSet, the pages of
DBEFiles in the DBEFileSet become a mixture of pages from the different
tables. In doing a sequential scan, if ALLBASE/SQL needs to access a
data page that physically resides near the end of the DBEFileSet, it must
first look through all the page table pages that come before it. The
average time to access a data page thus increases because of the extra
data stored within the DBEFileSet. The time increases severely when
several large tables are stored in the DBEFileSet.
Maintenance functions such as UPDATE STATISTICS also will take longer if
many large tables are stored in a single DBEFileSet. Therefore, create a
separate DBEFileSet for each table or group of tables that you want to
maintain separately.
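For example, a large table can be given its own DBEFileSet with
statements along the following lines. The DBEFileSet, DBEFile, and
system file names are illustrative only; see the ALLBASE/SQL Reference
Manual for the complete syntax.

     isql=> CREATE DBEFILESET OrdersFS;
     isql=> CREATE DBEFILE OrdersF1 WITH PAGES = 5060,
            NAME = 'OrdersF1', TYPE = TABLE;
     isql=> ADD DBEFILE OrdersF1 TO DBEFILESET OrdersFS;

The table itself is then created IN OrdersFS, as shown under "Creating
Tables" later in this section.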
Gathering Small Tables into DBEFileSets.
For smaller tables, there may be a performance advantage in grouping them
together into DBEFiles of 1000 pages or less. When each small table
occupies its own DBEFileSet, you can expect some of the following
performance problems:
* I/O might increase, for several reasons:
* Each page table page has little information on it. More
pages would have to be read in from disk because the data
physically resides on more separate pages.
* Because each page table page has little information on it,
it will not stay in the data buffer pool. Pages with more
information have more locality of reference and are more
likely to remain in the buffer pool (thus reducing real
I/O).
If the data buffer pool were large enough, the actual increased
I/O might be negligible.
* Each DBEFileSet requires at least one DBEFile, which is an actual
operating system file. Opening a file to read from it is a slow
operation. Putting each table into its own DBEFileSet increases
the number of files, and therefore increases the average time
needed to access the data in these tables.
* Operating systems have limits on the number of files that can be
open simultaneously. To avoid exceeding the limit, successive
close and open file operations may be required, which will degrade
performance. ALLBASE/SQL imposes its own limit of 50 files that
may be open simultaneously.
* The operating system must search for the correct file descriptor,
which may take longer with more files open.
* Disk space usage would increase, since many virtually empty page
table pages would exist.
When small tables are placed in a single DBEFileSet, disk space is used
efficiently and I/O is minimized. For example, if you have fifty tables
that average 5 pages, they will all fit comfortably into a single DBEFile
and use only one page table page.
Creating DBEFiles
DBEFiles are the physical repositories of ALLBASE/SQL data on the
operating system. Since you have great flexibility in defining DBEFile
types, sizes, and assignment to DBEFileSets, your choices often affect
the overall performance of your system. Here are some suggestions for
creating DBEFiles:
* Create DBEFiles in sizes that are multiples of 253 pages to
minimize the space used by page table pages. Each 253-page
segment contains one page table page followed by 252 data pages.
* Create separate TABLE and INDEX DBEFiles, especially for large
tables, so that they can be placed on separate disks to improve
I/O performance.
* Create a MIXED DBEFile in its own DBEFileSet for a small table
(less than 1000 pages) with a single index. (In this case, data
and index rows will be on alternating pages, which will result in
faster processing and better space utilization).
* Use dynamic DBEFile expansion to avoid running out of DBEFile
space at run time. In the CREATE DBEFILE statement, specify the
initial size, the maximum size, and the increment size (see the
sketch following this list).
* Use pseudo-mapped I/O on MPE/iX or raw files on HP-UX to shorten
the I/O path length and eliminate operating system buffering of
database pages. This feature is appropriate only for large
DBEFiles containing large tables accessed randomly. Standard I/O
is more appropriate for small DBEFiles or DBEFiles that are
frequently accessed sequentially. When appropriate on MPE/iX, use
pseudo-mapped I/O with memory-resident data buffers. File system
prefetching is not available on MPE/iX when using pseudo-mapped
I/O.
Restrictions on pseudo-mapped files in MPE/iX:
* They cannot be dynamically expanded.
* You cannot use concurrent backup (that is, SQLUtil
STOREONLINE does not work with pseudo-mapped files).
For additional information on pseudo-mapped files in MPE/iX, refer to the
appendix "Using Pseudo-Mapped DBEFiles" in the MPE/iX version of the
ALLBASE/SQL Database Administration Guide. For information on raw files
in HP-UX, refer to the appendix "Using HP-UX Raw Files for DBEFiles and
Logs" in the HP-UX version of the ALLBASE/SQL Database Administration
Guide.
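As an illustration of the sizing and dynamic expansion guidelines
above, the following sketch creates a TABLE DBEFile and an INDEX
DBEFile whose sizes are multiples of 253 pages. The MAXPAGES and
INCREMENT keywords shown for dynamic expansion are representative
only; verify the exact CREATE DBEFILE syntax for your release in the
ALLBASE/SQL Reference Manual.

     isql=> CREATE DBEFILE PartsF1 WITH PAGES = 1012,
            NAME = 'PartsF1', TYPE = TABLE,
            MAXPAGES = 5060, INCREMENT = 506;
     isql=> CREATE DBEFILE PartsX1 WITH PAGES = 506,
            NAME = 'PartsX1', TYPE = INDEX;
     isql=> ADD DBEFILE PartsF1 TO DBEFILESET PartsFS;
     isql=> ADD DBEFILE PartsX1 TO DBEFILESET PartsFS;

Keeping the TABLE and INDEX DBEFiles separate lets you move them to
different disks later with the SQLUtil MOVEFILE command.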
Avoiding Extra DBEFile Space.
When sequential scans are made frequently for queries involving tables in
a particular DBEFileSet, extra space should be minimized. When
sequential scans are made, every page table page in the DBEFileSet is
read to determine if pages might qualify for the query. When a large
amount of extra space exists, then extra I/O is required to fetch the
page table pages for empty DBEFile space. One way to minimize the extra
space in your DBEFiles is by specifying dynamic space expansion in the
CREATE DBEFILE statement.
Creating Tables
Here are some general suggestions on table creation:
* Create PUBLIC and PUBLICROW tables for maximum concurrency.
PUBLICREAD can improve performance by reducing internal page
locking. Use PRIVATE for special purposes. Note that PRIVATE is
the default; usually, you want to specify something else. (A
table creation sketch follows this list.)
* Choose data types which suit the programming language so as to
avoid data conversions.
* Native language (NLS) users should be aware that indexes and
predicates that use USASCII (n-computer or C on HP-UX; NATIVE-3000
on MPE/iX) columns perform faster than indexes and predicates that
use native language columns. If there is a choice, use USASCII
for index columns in your tables, create indexes on USASCII
columns, and use USASCII columns in your predicates.
* Integer values can be 5 - 15% more efficient than packed decimal
values.
* When you use ALTER TABLE to add a column with a default value,
every row in the table is immediately updated with the new
value, which causes all rows to become indirect rows. It may
be better to unload the data, drop and recreate the table with
the new column and default, and then reload the data.
* If a column is to be used consistently in the WHERE clause, it
should have an index.
* Wherever possible, load data for non-hash tables in sorted order.
This is essential for tables with clustering indexes. If a
multi-column index exists, the order of its columns should be the
same as the order of the columns in the load file. Hash tables
should not be loaded in sorted order.
* Consider combining tables that are constantly joined together. It
is a good idea to revisit normalization issues when actually
creating tables.
* For large tables, place data and indexes on separate disks using
the SQLUtil MOVEFILE command. In general, spread the load of
database and log files over your disks, and keep data and log
files on separate devices. Refer to the section "Load Balancing"
in the "Guidelines on System Administration" chapter for more
information.
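The first guideline in the list above might be applied as in the
following sketch, which creates a PUBLIC table in its own DBEFileSet.
Table, column, and DBEFileSet names are illustrative only.

     isql=> CREATE PUBLIC TABLE PurchDB.Orders
            (OrderNumber   INTEGER   NOT NULL,
             VendorNumber  INTEGER   NOT NULL,
             OrderDate     DATE      NOT NULL)
            IN OrdersFS;
     isql=> CREATE CLUSTERING INDEX OrderIndex
            ON PurchDB.Orders (OrderNumber);

If the table will receive a large initial load, create the clustering
index after loading the sorted data, as described under "Initial
Table Loads" later in this section.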
Avoiding NULL and Variable Length Data.
* Avoid NULLs and variable length data types (VARCHAR and VARBINARY)
for the following reasons:
* Nullable columns always require additional CPU time to check
for null values. Declaring columns NOT NULL can save as much
as 5% in CPU by eliminating this overhead.
* Use of NULL and VARCHAR or VARBINARY may cause wasted space
due to the inability to use a shared tuple header. (A
separate header must be stored for each tuple that has a
different size.) However, the use of nulls or variable
length columns may actually save space if there are only a
few such rows per page. If the difference between the
maximum size of the row and average size of data stored in
the row is greater than the size of the header, then use of
NULL or variable size columns may be more efficient.
* Updating a NULL, VARCHAR, or VARBINARY column may cause the
data to move to a different page, leaving an indirect tuple
behind and increasing the number of I/Os required to access
the data tuple.
* You can create a table with default values instead of nulls
so that when columns are updated they will not have to move
to another page because of a tuple size increase (see the
sketch following this list).
* If VARCHARs or NULL columns are necessary to save space, create a
table with VARCHARs or NULLs as the trailing columns in the table
schema. This is because when row lengths change, the amount of
shifting required at the storage level is minimized.
* Use an appropriate (not an unreasonably long) length for variable
length columns. The maximum length can affect the performance of
BULK SELECT and cursor operations.
* Indexes whose keys will be updated should never include NULL
columns as keys, and rarely should they include VARCHAR or
VARBINARY columns. Variable length keys can cause long index
chains and rearranging when index values are updated.
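The following sketch, with illustrative names, shows a table that
uses NOT NULL columns with default values and places its one variable
length column last. The exact position of the DEFAULT clause in a
column definition may vary by release, so check the CREATE TABLE
syntax in the ALLBASE/SQL Reference Manual.

     isql=> CREATE PUBLIC TABLE PurchDB.Vendors
            (VendorNumber  INTEGER    NOT NULL,
             VendorCity    CHAR(20)   DEFAULT 'UNKNOWN' NOT NULL,
             OrderCount    INTEGER    DEFAULT 0 NOT NULL,
             Remarks       VARCHAR(60))
            IN VendorFS;

Here only Remarks can be null or change length, and because it is the
trailing column, row-length changes require the least shifting at the
storage level.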
Using INTEGER Rather than SMALLINT Data.
Since SQLCore performs only 4-byte arithmetic, all SMALLINT values are
automatically converted to INTEGERs before SQLCore processes them. These
conversions do not affect the optimizer's choice of a scan plan, but they
may be costly in CPU for very large databases. To avoid the conversion,
consider defining your SMALLINT columns as INTEGER columns. The
drawback, of course, is that INTEGERs require twice as much storage space
as SMALLINTs.
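For example, a column that might otherwise be declared SMALLINT can
simply be declared INTEGER; the table and column names below are
illustrative.

     isql=> CREATE PUBLIC TABLE PurchDB.OrderItems
            (OrderNumber  INTEGER  NOT NULL,
             ItemCount    INTEGER  NOT NULL)
            IN OrdersFS;

Declaring ItemCount as INTEGER rather than SMALLINT avoids the
conversion at the cost of two extra bytes of storage per row.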
Initial Table Loads
In order to achieve the highest possible performance in loading data into
your tables for the first time, create a utility program that uses the
BULK INSERT command to add rows of data to the table. For very large
tables, such a program can improve performance over ISQL's LOAD command
by 300% to 500%. C would not be the best language choice for a load
program if you are using BULK INSERT with character data, because of
differences in character handling between C and ALLBASE/SQL.
For subsequent UNLOAD and LOAD operations, use the INTERNAL option of the
ISQL UNLOAD and LOAD commands. Wherever it is appropriate, create
indexes after loading.
Be sure to turn off archive logging if you need to conserve log file
space, and use BULK INSERT statements. Additional tips:
* Using archive logging during loading requires additional log file
space, but it actually requires fewer I/Os than nonarchive
logging. However, the saving of I/Os must be weighed against the
time required to back up log files so they can be reused.
* By setting constraints DEFERRED, you avoid concern over the order
in which dependent values are inserted into a table when foreign
or primary key constraints exist. At COMMIT WORK time,
constraints are applied to the loaded tables.
* Setting the DML atomicity to row level guarantees that savepoints
will not be generated during a load, which can improve performance
while non-archive logging is in effect. Be sure to turn
AUTOCOMMIT OFF in this case: row level atomicity and constraint
deferral last only for the current transaction, and at each
automatic COMMIT WORK the settings revert to STATEMENT level
atomicity and IMMEDIATE constraint application.
* For optimal performance, use the following statements before
starting the LOAD operation (a fuller end-to-end sequence is
sketched after this list):
isql=> SET AUTOCOMMIT OFF;
isql=> SET UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED;
isql=> SET DML ATOMICITY AT ROW LEVEL;
isql=> LOCK TABLE TableName IN EXCLUSIVE MODE;
Note the following:
* If you do not turn AUTOCOMMIT off, the exclusive lock will
be released at the time of the first automatic COMMIT WORK,
and the atomicity will return to the statement level.
* Turning AUTOCOMMIT off, deferring constraints, and setting
atomicity to the row level together have the effect of
reducing logging overhead greatly in non-archive mode
because of the use of no-log pages. However, logging is
still performed for the allocation of new pages, and the
non-archive log file must be large enough to include a
record for each page.
* For PUBLIC and PUBLICROW tables, locking the table in
exclusive mode avoids the overhead of obtaining locks as
pages of data are added to the table. If you are loading a
table that has an index defined on it, locking the table in
exclusive mode also provides optimal logging.
* You may wish to load a table first, then add constraints to it
using the ALTER TABLE statement. When the constraint is added,
the data in the table is checked for consistency, and if
inconsistent data is found, an error is generated. In this case,
it is your responsibility to remove the inconsistent tuples before
attempting to add the constraint.
* If you wish to use rules tied to INSERT operations, create the
rules after loading the table. Otherwise, the operation of the
rule will degrade the performance of the LOAD. Note, however, that
the operation of the rule is not retroactive, so you must ensure
the consistency of your data at the time the rule is created.
* Set the number of log buffers to at least 120 while loading.
* Set data buffer pages to 200 to reduce the size of the linked list
and the expense of maintaining it.
* Load in single user mode to avoid lock contention.
* Add indexes after data is loaded. For clustering indexes, sort
the data before the load.
* LOAD INTERNAL is 3-5 times faster than LOAD EXTERNAL.
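Putting these tips together, a complete initial load from an
internal-format file might look like the following sketch. The file
and table names are illustrative, and the LOAD command form shown is
approximate; see the ISQL Reference Manual for the exact LOAD syntax.

     isql=> SET AUTOCOMMIT OFF;
     isql=> SET UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED;
     isql=> SET DML ATOMICITY AT ROW LEVEL;
     isql=> LOCK TABLE PurchDB.Parts IN EXCLUSIVE MODE;
     isql=> LOAD FROM INTERNAL PartsFile TO PurchDB.Parts;
     isql=> COMMIT WORK;

After the COMMIT WORK, create indexes with CREATE INDEX, add
constraints back with ALTER TABLE, and run UPDATE STATISTICS for the
table.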
Unloading Data
UNLOAD operations are subject to the following performance
considerations:
* To recluster data to improve the cluster count, unload the table
with an ORDER BY clause, delete all rows from the table, then
reload the data in the new sorted order. This process is
described more thoroughly earlier in this chapter under
"Reclustering a Table." Also, see the next item and the sketch
following this list.
* For large tables, you can UNLOAD INTERNAL to tape. The tape must
be labeled. It is important to use a blocking factor that is as
close to 16K bytes as possible, since ISQL uses 16K byte chunks.
If your block size is less than 16K, the unload and load will take
longer and use more tape.
* For UNLOAD operations on large amounts of data that you intend to
sort, consider unloading the data first and then sorting with an
external utility rather than using the ORDER BY as part of the
UNLOAD. In this case, you need to use UNLOAD EXTERNAL.
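For example, reclustering a table as described in the first item
might be entered as follows. The UNLOAD and LOAD command forms are
approximate and the names are illustrative; if your version of ISQL
does not accept a SELECT with UNLOAD TO INTERNAL, use UNLOAD EXTERNAL
and sort the file with an external utility instead.

     isql=> UNLOAD TO INTERNAL OrdersFile FROM
            SELECT * FROM PurchDB.Orders ORDER BY OrderNumber;
     isql=> DELETE FROM PurchDB.Orders;
     isql=> COMMIT WORK;
     isql=> LOAD FROM INTERNAL OrdersFile TO PurchDB.Orders;
     isql=> COMMIT WORK;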
Unloading and Reloading to Remove Indirect Rows
The existence of indirect rows increases the amount of I/O that must be
performed to obtain data. Indirect rows also consume more disk space,
since they frequently are not able to use the shared tuple header.
Therefore, indirect rows should be avoided. Use the following procedure
to remove indirect rows from a table:
1. Unload the data from the table using the INTERNAL option.
2. Drop any referential constraints on other tables that refer to the
unloaded table.
3. Drop the table. This will also drop any B-tree indexes that have
been defined on the table.
4. Issue the CREATE TABLE statement to recreate the table. Do not
include constraints.
5. Issue the COMMIT WORK statement to reclaim space.
6. Load the data back into the table using the INTERNAL option.
7. Use the CREATE INDEX statement to recreate any B-tree indexes that
may have been dropped, and use the ALTER TABLE statement to add
back constraints (including referential constraints) that may have
been dropped.
8. Do an UPDATE STATISTICS for the table to update system catalog
information about the table and its indexes and constraints.
9. Issue the COMMIT WORK statement.
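An abbreviated ISQL sketch of this procedure for a hypothetical
two-column table follows. Constraint handling is omitted for brevity,
and the UNLOAD, LOAD, and UPDATE STATISTICS forms should be verified
against the reference manuals.

     isql=> UNLOAD TO INTERNAL PartsFile FROM PurchDB.Parts;
     isql=> DROP TABLE PurchDB.Parts;
     isql=> CREATE PUBLIC TABLE PurchDB.Parts
            (PartNumber  CHAR(16)  NOT NULL,
             PartName    CHAR(30)  NOT NULL)
            IN PartsFS;
     isql=> COMMIT WORK;
     isql=> LOAD FROM INTERNAL PartsFile TO PurchDB.Parts;
     isql=> CREATE UNIQUE INDEX PartNumIndex
            ON PurchDB.Parts (PartNumber);
     isql=> UPDATE STATISTICS FOR TABLE PurchDB.Parts;
     isql=> COMMIT WORK;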
Unloading and Reloading to Remove Overflow Pages
Since many key values can hash to the same page address, it is possible
for a page in a hash structure to become full. When this happens, a new
row must be inserted on an overflow page. Overflow pages increase the
amount of I/O that must be performed to obtain table data. The larger
the number of overflow pages, the slower the average access to any tuple.
Observe the AVGLEN column in SYSTEM.HASH to see the average number of
page accesses required to retrieve a particular row. As the number of
overflow pages increases, so will this number. Increasing the capacity
of the table should reduce AVGLEN, and thus improve performance. Use the
following procedure:
1. Unload the data from the table using the INTERNAL option.
2. Drop any referential constraints on other tables that refer to the
unloaded table.
3. Drop the table. This will also drop any B-tree indexes that have
been defined on the table.
4. Issue a COMMIT WORK statement. This makes the space occupied by
the table and its related indexes and constraints available for
reuse.
5. If necessary, create additional TABLE or MIXED DBEFiles and add
them to the DBEFileSet that will contain the redefined table.
6. Issue the CREATE TABLE statement, specifying a larger number of
primary pages than when you previously created the table. The
CREATE TABLE statement should not include any constraints.
7. Load the data back into the table using the INTERNAL option.
8. Use the CREATE INDEX statement to recreate any B-tree indexes that
may have been dropped, and use the ALTER TABLE statement to add
back constraints (including referential constraints) that may have
been dropped.
9. Issue an UPDATE STATISTICS statement for the table to update
system catalog information about the table and its indexes and
constraints.
10. Issue the COMMIT WORK statement.
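The key difference from the previous procedure is step 6, where the
table is recreated as a hash structure with a larger number of
primary pages. The following sketch uses illustrative names and a
PAGES value chosen only for the example; verify the hash clause of
CREATE TABLE in the ALLBASE/SQL Reference Manual. Afterwards, you can
check AVGLEN again by querying SYSTEM.HASH.

     isql=> CREATE PUBLIC TABLE PurchDB.Invoices
            (InvoiceNumber  INTEGER  NOT NULL,
             VendorNumber   INTEGER  NOT NULL)
            UNIQUE HASH ON (InvoiceNumber) PAGES = 4000
            IN InvoiceFS;
     isql=> LOAD FROM INTERNAL InvoiceFile TO PurchDB.Invoices;
     isql=> UPDATE STATISTICS FOR TABLE PurchDB.Invoices;
     isql=> COMMIT WORK;
     isql=> SELECT * FROM SYSTEM.HASH;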
Tips on Deletions from Tables
* When deleting all rows from a table, it is more efficient to drop
and later recreate the table than to use the DELETE statement
without a predicate. The DELETE statement on a large table may
also cause the log to run out of space.
* When deleting the majority of rows from a large table, it may be
more efficient to unload the rows to be retained, drop the table,
recreate it, and reload the data, rather than selecting and
deleting the larger number of rows. A sketch of this approach
follows.
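For the second case, a sketch of the keep-and-reload approach
follows; the table definition, predicate, and command forms are
illustrative only.

     isql=> UNLOAD TO INTERNAL KeepFile FROM
            SELECT * FROM PurchDB.WorkOrders WHERE Status = 'OPEN';
     isql=> DROP TABLE PurchDB.WorkOrders;
     isql=> COMMIT WORK;
     isql=> CREATE PUBLIC TABLE PurchDB.WorkOrders
            (WorkOrderNumber  INTEGER   NOT NULL,
             Status           CHAR(10)  NOT NULL)
            IN WorkFS;
     isql=> LOAD FROM INTERNAL KeepFile TO PurchDB.WorkOrders;
     isql=> COMMIT WORK;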