ALLBASE/SQL Performance and Monitoring Guidelines

Physical Data Design 

Physical data design refers to the arrangement of data inside tables, which
reside in DBEFiles, and the arrangement of DBEFiles in DBEFileSets.  This
section includes the following topics:

   *   Creating DBEFileSets.
   *   Creating DBEFiles.
   *   Creating Tables.
   *   Initial Table Loads.
   *   Unloading Data.
   *   Unloading and Reloading to Remove Indirect Rows.
   *   Unloading and Reloading to Remove Overflow Pages.
   *   Tips on Deletions from Tables.

Additional information relating to physical design is found in the
section "Load Balancing" in the chapter "Guidelines on System
Administration." You may also find it useful to review the "Physical
Design" chapter in the ALLBASE/SQL Database Administration Guide.

Creating DBEFileSets 

DBEFileSets are logical groupings of DBEFiles.  DBEFileSets are also the
logical home of tables, which are always created in a particular
DBEFileSet.  The following guidelines offer suggestions on how to
distribute tables among DBEFileSets.

Avoiding the SYSTEM DBEFileSet for User Data.     

Although you can create user-defined tables in the SYSTEM DBEFileSet, it
is recommended that you create new DBEFileSets to keep user data separate
from system catalog data to facilitate maintenance and performance
tuning.  Here are the reasons why:

   *   System catalog data is accessed frequently.  If user data is kept
       in the SYSTEM DBEFileSet, then the pages used for system data
       become intermingled with the pages used for user data.  If
       ALLBASE/SQL needs to access a system page that physically resides
       near the end of the DBEFileSet, then it must first look through
       all the page table pages that come before it.  The average time to
       access a particular system page increases because of the extra
       data stored within the DBEFileSet.

   *   Sometimes ALLBASE/SQL uses pages in the SYSTEM DBEFileSet for
       temporary space.  If this space is not available, processing will
       terminate in an unsuccessful state.  Mixing user and system data
       makes it more likely that temporary space will not be available
       when needed.  See the section "Temporary Space in SYSTEM" in the
       chapter "Guidelines on System Administration."

   *   Grouping tables in separate DBEFileSets allows you to identify the
       space requirements for particular tables.  If you store all tables
       in the SYSTEM DBEFileSet and the DBEFileSet runs out of space, it
       will be difficult to tell how fast each table is growing.  It will
       also be harder to determine which table's update or insert
       operation caused the system to run out of space.

In particular, you should avoid creating hash structures in the SYSTEM
DBEFileSet.
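
As an illustration, the following sketch creates a user DBEFileSet and
places a new table in it rather than in the SYSTEM DBEFileSet.  The
object names and sizes are placeholders, and the statements are
abbreviated rather than complete syntax:

     CREATE DBEFILESET SalesFS;

     CREATE DBEFILE SalesF1 WITH PAGES = 253,
        NAME = 'SALESF1', TYPE = TABLE;

     ADD DBEFILE SalesF1 TO DBEFILESET SalesFS;

     CREATE PUBLIC TABLE PurchDB.Orders
        (OrderNumber   INTEGER    NOT NULL,
         OrderDate     CHAR(8)    NOT NULL)
     IN SalesFS;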

Placing Large Tables in Separate DBEFileSets.     

You should place tables larger than 1000 pages in their own DBEFileSets.
When more than one table is stored in the DBEFileSet, the pages of
DBEFiles in the DBEFileSet become a mixture of pages from the different
tables.  In doing a sequential scan, if ALLBASE/SQL needs to access a
data page that physically resides near the end of the DBEFileSet, it must
first look through all the page table pages that come before it.  The
average time to access a data page thus increases because of the extra
data stored within the DBEFileSet.  The time increases severely when
several large tables are stored in the DBEFileSet.  To determine the
size of a DBEFileSet, run SQLMON and go to the Static DBEFile screen.

Maintenance functions such as UPDATE STATISTICS also will take longer if
many large tables are stored in a single DBEFileSet.  Therefore, create a
separate DBEFileSet for each table or group of tables that you want to
maintain separately.

Gathering Small Tables into DBEFileSets.   

For smaller tables, there may be a performance advantage in grouping them
together into DBEFiles of 1000 pages or less.  When each small table
occupies its own DBEFileSet, you can expect some of the following
performance problems:

   *   I/O might increase, for several reasons:

          *   Each page table page has little information on it.  More
              pages would have to be read in from disk because the data
              physically resides on more separate pages.

          *   Because each page table page has little information on it,
              it will not stay in the data buffer pool.  Pages with more
              information have more locality of reference and are more
              likely to remain in the buffer pool (thus reducing real
              I/O).

       If the data buffer pool were large enough, the actual increased
       I/O might be negligible.

   *   Each DBEFileSet requires at least one DBEFile, which is an actual 
       operating system file.  Opening a file to read from it is a slow
       operation.  Putting each table into its own DBEFileSet increases
       the number of files, and therefore increases the average time
       needed to access the data in these tables.

   *   Operating systems have limits on the number of files that can be
       open simultaneously.  To avoid exceeding the limit, successive
       close and open file operations may be required, which will degrade
       performance.  ALLBASE/SQL imposes its own limit of 50 files that
       may be open simultaneously.

   *   The operating system must search for the correct file descriptor,
       which may take longer with more files open. 

   *   Disk space usage would increase, since many virtually empty page
       table pages would exist.

When small tables are placed in a single DBEFileSet, disk space is used
efficiently and I/O is minimized.  For example, if you have fifty tables
that average 5 pages, they will all fit comfortably into a single DBEFile
and use only one page table page.

Creating DBEFiles 

DBEFiles are the physical repositories of ALLBASE/SQL data on the
operating system.  Since you have great flexibility in defining DBEFile
types, sizes, and assignment to DBEFileSets, your choices often affect
the overall performance of your system.  Here are some suggestions for
creating DBEFiles (an example follows the list):

   *   Create DBEFiles in sizes that are multiples of 253 pages to
       minimize the space used by page table pages.  Each 253-page
       segment contains one page table page followed by 252 pages of
       data.

   *   Create separate TABLE and INDEX DBEFiles, especially for large
       tables, so that they can be placed on separate disks to 
       improve I/O performance.

   *   Create a MIXED DBEFile in its own DBEFileSet for a small table
       (less than 1000 pages) with a single index.  (In this case, data
       and index rows will be on alternating pages, which will result in
       faster processing and better space utilization). 

   *   Use dynamic DBEFile expansion to avoid running out of DBEFile
       space at run time.  In the CREATE DBEFILE statement, specify the
       initial size, the maximum size, and the increment size.

   *   Use pseudo-mapped I/O to shorten the 
       I/O path length and eliminate operating system buffering of
       database pages.  This feature is appropriate only for large
       DBEFiles containing large tables accessed randomly.  Standard I/O
       is more appropriate for small DBEFiles or DBEFiles that are
       frequently accessed sequentially. 
       When appropriate, use pseudo-mapped I/O with memory-resident data
       buffers.  File system prefetching is not available when using
       pseudo-mapped I/O.

       Restrictions on pseudo-mapped files: 

          *   They cannot be dynamically expanded.

          *   You cannot use concurrent backup (that is, SQLUtil
              STOREONLINE does not work with pseudo-mapped files).

For additional information on pseudo-mapped files, refer to the appendix
"Using Pseudo-Mapped DBEFiles" in the ALLBASE/SQL Database
Administration Guide.
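
The following sketch illustrates several of the points above: separate
TABLE and INDEX DBEFiles, each sized in a multiple of 253 pages, added
to the DBEFileSet that holds a large table.  The names and sizes are
placeholders, and the dynamic expansion clauses (initial, maximum, and
increment sizes) are omitted:

     CREATE DBEFILE PartsData1 WITH PAGES = 2530,
        NAME = 'PARTSD1', TYPE = TABLE;

     CREATE DBEFILE PartsIndex1 WITH PAGES = 1012,
        NAME = 'PARTSI1', TYPE = INDEX;

     ADD DBEFILE PartsData1 TO DBEFILESET PartsFS;
     ADD DBEFILE PartsIndex1 TO DBEFILESET PartsFS;

Once the data and index DBEFiles are placed on separate disks (for
example, with the SQLUtil MOVEFILE command), table and index I/O can
proceed in parallel.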

Avoiding Extra DBEFile Space.     

When sequential scans are made frequently for queries involving tables in
a particular DBEFileSet, extra space should be minimized.  When
sequential scans are made, every page table page in the DBEFileSet is
read to determine if pages might qualify for the query. 
When a large amount of extra space exists, then extra I/O is required to
fetch the page table pages for empty DBEFile space.  One way to minimize
the extra space in your DBEFiles is by specifying dynamic space expansion
in the CREATE DBEFILE statement.

Creating Tables 

Here are some general suggestions on table creation (an example follows
the list):

   *   Create PUBLIC and PUBLICROW tables for maximum concurrency.
       PUBLICREAD can improve performance by reducing internal page
       locking.  Use PRIVATE for special purposes.  Note that PRIVATE is
       the default; usually, you want to specify something else.

   *   Choose data types which suit the programming language so as 
       to avoid data conversions. 

   *   Native language (NLS) users should 
       be aware that indexes and predicates that use USASCII
       (NATIVE-3000) columns perform faster than indexes and predicates
       that use native language columns.  If there is a choice, use
       USASCII for index columns in your tables, create indexes on
       USASCII columns, and use USASCII columns in your predicates.

   *   Integer values can be 5 - 15% more efficient than packed decimal
       values. 

   *   When adding a column with ALTER TABLE and adding a default value,
       every row in the table is immediately updated to add the new
       value.  This causes all rows to become indirect rows.  It may be
       better to unload data, drop and recreate the table with the new
       column and default, then reload the data. 

   *   If a column is to be used consistently in the WHERE clause, it
       should have an index.

   *   Wherever possible, load data for non-hash tables in sorted order.
       This is essential for tables with clustering indexes.  If a
       multi-column index exists, the order of its columns should be the
       same as the order of the columns in the load file.  Hash tables
       should not be loaded in sorted order.

   *   Consider combining tables that are constantly joined together.  It
       is a good idea to revisit normalization issues when actually
       creating tables.

   *   For large tables, place data and indexes on separate disks using
       the SQLUtil MOVEFILE command.  In general, spread the load of
       database and log files over your disks, and keep data and log
       files on separate devices.  Refer to the section "Load Balancing"
       in the "Guidelines on System Administration" chapter for more
       information.
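
As an example, the following sketch creates a PUBLIC table in its own
DBEFileSet and indexes the column used consistently in WHERE clauses.
The names are placeholders; for a large initial load, create the index
after the data is loaded, as described under "Initial Table Loads"
below:

     CREATE PUBLIC TABLE PurchDB.Parts
        (PartNumber    INTEGER      NOT NULL,
         PartName      CHAR(30)     NOT NULL,
         ListPrice     DECIMAL(10,2))
     IN PartsFS;

     CREATE INDEX PartNumIndex
        ON PurchDB.Parts (PartNumber);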

Avoiding NULL and Variable Length Data.     

   *   Avoid NULLs and variable length data types (VARCHAR and VARBINARY)
       for the following reasons:

          *   Columns that allow nulls always require additional CPU
              time to check for null values.  Declaring columns NOT
              NULL can save as much as 5% in CPU by eliminating this
              overhead.

          *   Use of NULL and VARCHAR or VARBINARY may cause wasted space
              due to the inability to use a shared tuple header.  (A
              separate header must be stored for each tuple that has a
              different size.)  However, the use of nulls or variable
              length columns may actually save space if there are only a
              few such rows per page.  If the difference between the
              maximum size of the row and average size of data stored in
              the row is greater than the size of the header, then use of
              NULL or variable size columns may be more efficient.

          *   Updating a NULL, VARCHAR, or VARBINARY column may cause the
              data to move to a different page, leaving an indirect tuple
              behind and increasing the number of I/Os required to access
              the data tuple.

          *   You can create a table with default values instead of nulls
              so that when columns are updated they will not have to move
              to another page because of tuple size increase.

   *   Use an appropriate (not an unreasonably long) length for variable
       length columns.  The maximum length can affect the performance of
       BULK SELECT and cursor operations. 

   *   Indexes whose keys will be updated should never include NULL
       columns as keys, and rarely should they include VARCHAR or
       VARBINARY columns.  Variable length keys can cause long index
       chains and rearranging when index values are updated.

Using INTEGER Rather than SMALLINT Data.     

Since SQLCore only performs 4-byte arithmetic, all SMALLINT values are
automatically converted to INTEGERs before processing by SQLCore.  These
conversions do not affect the optimizer's choice of a scan plan, but they
may be costly in CPU for very large databases.  To avoid the conversion,
consider defining your SMALLINT columns as INTEGER columns.  The
drawback, of course, is that INTEGERs require twice as much storage space
as SMALLINTs.
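
The following sketch applies these suggestions: fixed-length columns
declared NOT NULL with default values in place of nulls, and INTEGER
rather than SMALLINT for the numeric columns.  The names and default
values are placeholders:

     CREATE PUBLIC TABLE PurchDB.OrderItems
        (OrderNumber   INTEGER      NOT NULL,
         ItemCount     INTEGER      DEFAULT 0   NOT NULL,
         Remarks       CHAR(40)     DEFAULT ' ' NOT NULL)
     IN SalesFS;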

Initial Table Loads 

For optimal performance, use the following ISQL commands before starting
the load operation (a sample session follows the list of commands):

   *   SET LOAD_BUFFER

       Use this statement to enlarge the load buffer beyond the default
       size of 16,384 bytes.

   *   SET AUTOLOCK ON

       Avoid lock contention by locking the table in exclusive mode when
       the load is performed.  For PUBLIC and PUBLICROW tables, locking
       the table in exclusive mode avoids the overhead of obtaining locks
       as pages of data are added to the table.  If you are loading a
       table that has an index defined on it, locking the table in
       exclusive mode also provides optimal logging.

   *   SET AUTOCOMMIT ON and SET AUTOSAVE

       Together these commands cause the number of rows specified with
       AUTOSAVE to be automatically committed when the load buffer is
       full.  Should the load operation subsequently fail, you can insert
       the remaining rows with the LOAD PARTIAL command.

   *   SET SESSION DML ATOMICITY AT ROW LEVEL

       Setting the DML atomicity to row level guarantees that savepoints
       will not be generated during a load, reducing logging overhead
       when running in non-archive mode.

   *   SET SESSION UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED

       By deferring constraints, you avoid problems caused by the order
       in which dependent values are inserted into a table when foreign
       or primary key constraints exist.  Constraint checking is deferred
       until the end of the load operation.

       You may wish to load a table first, then add constraints to it
       using the ALTER TABLE statement.  When the constraint is added,
       the data in the table is checked for consistency, and if
       inconsistent data is found, an error is generated.  In this case,
       it is your responsibility to remove the inconsistent tuples before
       attempting to add the constraint.

       Deferring constraints and setting atomicity to the row level
       together have the effect of reducing logging overhead greatly in
       non-archive mode because of the use of no-log pages.  However,
       logging is still performed for the allocation of new pages, and
       the non-archive log file must be large enough to include a record
       for each page.
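
The following sketch shows such an ISQL session.  The file name, table
name, and numeric values are placeholders, and the command lines are
abbreviated rather than complete syntax:

     isql=> SET LOAD_BUFFER 65536;
     isql=> SET AUTOLOCK ON;
     isql=> SET AUTOCOMMIT ON;
     isql=> SET AUTOSAVE 1000;
     isql=> SET SESSION DML ATOMICITY AT ROW LEVEL;
     isql=> SET SESSION UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED;
     isql=> LOAD FROM INTERNAL PartsFile TO PurchDB.Parts;

If the load fails partway through, the remaining rows can be inserted
with the LOAD PARTIAL command, as noted above.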

Additional tips for improving load performance:

   *   For subsequent unload and load operations, use the INTERNAL option
       of the ISQL UNLOAD and LOAD commands.  LOAD INTERNAL is 3-5 times
       faster than LOAD EXTERNAL.

   *   Using archive logging during loading requires additional log file
       space, but it actually requires fewer I/Os than nonarchive
       logging.  However, the saving of I/Os must be weighed against the
       time required to back up log files so they can be reused.

   *   Set the number of log buffers to at least 120 while loading.

   *   Set data buffer pages to 200 to reduce the size of the linked list
       and the expense of maintaining it.

   *   Add indexes after data is loaded.  For clustering indexes, sort
       the data before the load.

   *   If you wish to use rules tied to INSERT operations, create the
       rules after loading the table.  Otherwise, the operation of the
       rule will degrade the performance of the load.  Note, however,
       that the operation of the rule is not retroactive, so you must
       ensure the consistency of your data at the time the rule is
       created.
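
For example, once a sorted load has completed, create the clustering
index and refresh the optimizer statistics.  The table and index names
are placeholders:

     CREATE CLUSTERING INDEX PartNumIndex
        ON PurchDB.Parts (PartNumber);

     UPDATE STATISTICS FOR TABLE PurchDB.Parts;

     COMMIT WORK;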

Unloading Data 

UNLOAD operations are subject to the following performance
considerations:

   *   To recluster data to improve the cluster count, unload the table
       with an ORDER BY clause, delete all rows from the table, then
       reload the data in the new sorted order.  This process is
       described more thoroughly earlier in this chapter under
       "Reclustering a Table." Also, see the next item.

   *   For large tables, you can UNLOAD INTERNAL to tape.  The tape must
       be labeled.  It is important to use a blocking factor that is as
       close to 16K bytes as possible, since ISQL uses 16K byte chunks.
       If your block size is less than 16K, the unload and load will take
       longer and use more tape.

   *   For UNLOAD operations on large amounts of data that you intend to
       sort, consider unloading the data first and then sorting with an
       external utility rather than using the ORDER BY as part of the
       UNLOAD. In this case, you need to use UNLOAD EXTERNAL.
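
A minimal reclustering sketch follows.  The file, table, and column
names are placeholders; on a large table, the TRUNCATE TABLE statement
(discussed at the end of this section) may be preferable to the DELETE
statement shown here:

     isql=> UNLOAD TO INTERNAL PartsFile FROM
            SELECT * FROM PurchDB.Parts ORDER BY PartNumber;
     isql=> DELETE FROM PurchDB.Parts;
     isql=> COMMIT WORK;
     isql=> LOAD FROM INTERNAL PartsFile TO PurchDB.Parts;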

Unloading and Reloading to Remove Indirect Rows 

The existence of indirect rows increases the amount of I/O that must be
performed to obtain data.  Indirect rows also consume more disk space,
since they frequently are not able to use the shared tuple header.
Therefore, indirect rows should be avoided.  Use the following procedure
to remove indirect rows from a table:

   1.  Unload the data from the table using the INTERNAL option.
   2.  Drop any referential constraints on other tables that refer to the
       unloaded table.
   3.  Drop the table.  This will also drop any B-tree indexes that have
       been defined on the table.
   4.  Issue the CREATE TABLE statement to recreate the table.  Do not
       include constraints.
   5.  Issue the COMMIT WORK statement to reclaim space.
   6.  Load the data back into the table using the INTERNAL option.
   7.  Use the CREATE INDEX statement to recreate any B-tree indexes that
       may have been dropped, and use the ALTER TABLE statement to add
       back constraints (including referential constraints) that may have
       been dropped.
   8.  Do an UPDATE STATISTICS for the table to update system catalog
       information about the table and its indexes and constraints.
   9.  Issue the COMMIT WORK statement.
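
A sketch of the sequence follows, with the constraint handling in steps
2 and 7 omitted.  The file, table, and index names are placeholders,
and the column list is abbreviated:

     isql=> UNLOAD TO INTERNAL PartsFile FROM PurchDB.Parts;
     isql=> DROP TABLE PurchDB.Parts;
     isql=> CREATE PUBLIC TABLE PurchDB.Parts
            (PartNumber   INTEGER    NOT NULL,
             PartName     CHAR(30)   NOT NULL)
            IN PartsFS;
     isql=> COMMIT WORK;
     isql=> LOAD FROM INTERNAL PartsFile TO PurchDB.Parts;
     isql=> CREATE INDEX PartNumIndex ON PurchDB.Parts (PartNumber);
     isql=> UPDATE STATISTICS FOR TABLE PurchDB.Parts;
     isql=> COMMIT WORK;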

Unloading and Reloading to Remove Overflow Pages 

Since many key values can hash to the same page address, it is possible
for a page in a hash structure to become full.  When this happens, a new
row must be inserted on an overflow page.  Overflow pages increase the
amount of I/O that must be performed to obtain table data.  The larger
the number of overflow pages, the slower the average access to any tuple.

Observe the AVGLEN column in SYSTEM.HASH or the AVGOVERFLOW field of
the SQLMON Static Hash screen to see the average number
of page accesses required to retrieve a particular row.  As the number of
overflow pages increases, so will this number.  Increasing the capacity
of the table should reduce AVGLEN, and thus improve performance.  Use the
following procedure:

   1.  Unload the data from the table using the INTERNAL option.
   2.  Drop any referential constraints on other tables that refer to the
       unloaded table.
   3.  Drop the table.  This will also drop any B-tree indexes that have
       been defined on the table.
   4.  Issue a COMMIT WORK statement.  This makes the space occupied by
       the table and its related indexes and constraints available for
       reuse.
   5.  If necessary, create additional TABLE or MIXED DBEFiles and add
       them to the DBEFileSet that will contain the redefined table.
   6.  Issue the CREATE TABLE statement, specifying a larger number of
       primary pages than when you previously created the table.  The
       CREATE TABLE statement should not include any constraints.
   7.  Load the data back into the table using the INTERNAL option.
   8.  Use the CREATE INDEX statement to recreate any B-tree indexes that
       may have been dropped, and use the ALTER TABLE statement to add
       back constraints (including referential constraints) that may have
       been dropped.
   9.  Issue an UPDATE STATISTICS statement for the table to update
       system catalog information about the table and its indexes and
       constraints.
  10.  Issue the COMMIT WORK statement.
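
For example, after observing AVGLEN you might recreate the hash
structure with more primary pages.  The names and page count below are
placeholders, and the hash clause is an abbreviated sketch:

     SELECT * FROM SYSTEM.HASH;

     CREATE PUBLIC TABLE PurchDB.Parts
        (PartNumber    INTEGER     NOT NULL,
         PartName      CHAR(30)    NOT NULL)
        UNIQUE HASH ON (PartNumber) PAGES = 1013
     IN PartsFS;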

Tips on Deletions from Tables 


   *   When deleting all rows from a table, it is more efficient to use
       the TRUNCATE TABLE statement instead of the DELETE statement.  On
       a large table, the DELETE statement may cause the log to run out
       of space, whereas the TRUNCATE TABLE statement incurs minimal
       logging.

   *   The performance of the TRUNCATE TABLE statement degrades when the
       table is hashed.  Dropping the table and recreating it may be
       faster than using the TRUNCATE TABLE statement.

   *   If the table specified by the TRUNCATE TABLE statement is included
       in a referential constraint, it may be more efficient to drop the
       constraint, issue the TRUNCATE TABLE statement, and re-add the
       constraint.
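
For example, to empty a work table with minimal logging (the table name
is a placeholder):

     TRUNCATE TABLE PurchDB.OldOrders;
     COMMIT WORK;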


