HPlogo ALLBASE/SQL Database Administration Guide: HP 3000 MPE/iX Computer Systems > Chapter 2 Logical Design

Arranging Data in Tables

» 

Technical documentation

Complete book in PDF
» Feedback

 » Table of Contents

 » Index

The basic data structure in an ALLBASE/SQL database is a table. Data is stored within a table as rows divided into columns. Indexes and views are created using the columns of a table.

To design the tables in your database from normalized data, you must:

  • define tables from the normalized data according to the uses of the data in your applications

  • define the columns in the tables

Defining Tables

To design tables you need to know how the applications use data in SELECT, DELETE, INSERT, and UPDATE operations. Data that is deleted, inserted, or updated at the same time should be put in the same table. Use these criteria to start grouping your data into tables.

To determine the composition of your tables, keep the following guidelines in mind:

  • The maximum number of tables you can define in a DBEnvironment is 231 -1.

  • The maximum number of columns in a table is 1023.

  • The maximum length of a column is 3996 bytes.

  • The maximum length of a row in bytes is (NC+1) * 2 + SCL <= 4000 where NC is the number of columns in the table and SCL is the sum of the column lengths.

  • If you add a column to a table using the ALTER TABLE statement, the added column is placed on the right-hand side of the table.

The following paragraphs are a few more guidelines to help you determine table design.

A row is the smallest unit you can delete or insert at a time. This means two columns should be placed in different tables if they can be inserted or deleted at different times. Assume a part name is never inserted or deleted unless a corresponding part number is inserted or deleted. Part name and part number should be in the same table. On the other hand, a vendor name can be inserted or deleted independent of, say, the order number. Therefore, the vendor name and order number should be placed in separate tables.

Interactively you can only use the UPDATE statement on one table at a time. This means if you use a single statement to update one column based on the value of another column, both columns must be in the same table. The unit price of a part is updated as it pertains to a particular part number column; therefore, the unit price and part number columns should be in the same table.

Planning Joins

A join is a query that selects columns from at least two tables. If many applications request information that can logically reside in two tables, you may want to place the information in a single table to improve performance. A table with a large number of columns can impair performance as will several smaller tables that are joined frequently.

The WHERE clause of the SELECT statement is used to specify the condition(s) under which rows are joined. ALLBASE/SQL allows joins on compatible data types, but for maximum efficiency, joins should be performed on identical columns.

NOTE: An application consistently containing queries that join more than six tables indicates that the tables are over-normalized and the database design should be re-examined.

Tables can be joined as Cartesian products where each row of one table is joined with every row in another table. However, in order for tables to be meaningfully joined, they must share a common column. If two tables do not share a common column, then a third table containing common columns for both tables must be introduced into the join, as shown in Figure 2-6 “Common Columns for Joins”.

Figure 2-6 Common Columns for Joins

[Common Columns for Joins]

Several of the sample database tables must be joined to provide the required data to the users and applications. However, the normalized data does not allow some tables to be joined directly. Refer to Figure 2-3. The OrderItems table cannot be meaningfully joined with the Vendors table because there is no common column in the tables. To facilitate retrieval for those applications that use the Vendors and OrderItems tables, the VendorNumber column was added to the Orders table, as shown in Table 2-1. This allows a join to be made between the Vendors and OrderItems tables using the Orders table.

Adding the VendorNumber column, of course, violates the third normal form. However, normalization is just a tool that helps you design efficient tables. It is up to the DBA to alter the table design to meet the needs of the users.

For additional information on joins, including Cartesian products, refer to the "SQL Queries" chapter of the ALLBASE/SQL Reference Manual.

Final Form of Sample Database Tables

The normalized data from the purchase order form produced the Vendors, Orders, OrderItems, and SupplyPrice tables. Integrated Peripherals, Inc. also designed a Parts table and an Inventory table to keep track of the internal parts. As a result, a total of six tables were designed from the defining and normalization phase. Columns are added to some of the tables to fit the user's needs. A VendorRemarks column is added to Vendors to keep comments for that vendor. A ReceivedQty column is added to the OrderItems table to denote when a shipment arrives. The resulting six tables are shown in Table 2-1.

Table 2-1 Sample Database Tables

TABLE COLUMNS
Parts PartNumber, PartName, SalesPrice
SupplyPrice PartNumber, VendorNumber, VendPartNumber, UnitPrice, DeliveryDays, Quantity
Vendors VendorNumber, VendorName, ContactName, PhoneNumber, VendorStreet, VendorCity, VendorState, VendorZipCode, VendorRemarks
Orders OrderNumber, VendorNumber, OrderDate
OrderItems OrderNumber, ItemNumber, VendPartNumber, PurchasePrice, OrderQty, ItemDueDate, ReceivedQty
Inventory PartNumber, BinNumber, QtyOnHand, LastCountDate, CountCycle, AdjustmentQty, ReorderQty, ReorderPoint

 

Defining Columns

Each attribute of each entity is defined in ALLBASE/SQL as a column in a table. The "Data Types" chapter in the ALLBASE/SQL Reference Manual contains basic information on column names and data types. Use the information presented there and the following guidelines to define columns. A column is defined by specifying:

  • Name

  • Data type

  • Size

  • Whether or not it can contain null values

Defining Column Names

Be precise in your names and descriptions so no user can misunderstand a data element or its meaning. Each column name can be as long as 20 characters. Try to choose meaningful, unambiguous names. For example, a name of "Qty" in the OrderItems table could refer to order quantity or received quantity while "OrderQty" can refer only to order quantity. Long names, however, may become unwieldy in a report. In the sample database, the column and table names have been made as unambiguous as possible.

Defining Column Data Type

A data type tells ALLBASE/SQL what type of data can be stored in the column and what can be done with the column. Thus, a column with an INTEGER data type can only have integers. A column with a numeric data type (FLOAT, DECIMAL, INTEGER, or SMALLINT) can be used in arithmetic operations. A column with an alphanumeric data type (CHAR or VARCHAR) can appear in a string operation such as a comparison using the LIKE predicate.

Some guidelines for data types are:

  • Match the data type to the data, for example:

    • Columns used for quantity or count should be defined as INTEGER or SMALLINT.

    • Columns containing real numbers or engineering data should be defined as REAL or FLOAT.

    • Columns used for money should be defined as DECIMAL so there is no loss of precision when rounding.

    • Columns containing alphanumeric characters should be defined as CHAR or VARCHAR. VARCHAR is used for particularly large alphanumeric columns like descriptions or comments. The VARCHAR data type is recommended when there are a few potentially large character strings, but most of the time the entire column will not be filled. The VARCHAR data type stores character data more efficiently because the column is not padded with blanks. The VendorRemarks column in the Vendors table is defined as VARCHAR.

    • Columns containing binary data or data of an unspecified kind should be defined as BINARY or VARBINARY. The VARBINARY data type is recommended when there are a few potentially large binary strings, but most of the time the entire column will not be filled. The VARBINARY data type stores binary data more efficiently because the column is not padded with zeroes.

    • For very large column values (greater than 3996 bytes), use the LONG BINARY or LONG VARBINARY data type.

    • Columns used for dates should be defined as DATE.

    • Columns used for times should be defined as TIME.

    • Columns used for timestamps should be defined as DATETIME.

    • Columns used for intervals should be defined as INTERVAL.

  • Avoid data conversions by using the same data type for columns that are frequently compared or used in a WHERE clause. For example, if you compare two columns frequently, do not define one as INTEGER and the other as FLOAT. ALLBASE/SQL will have to convert one number to the data type of the other which could result in performance degradation and possible loss of precision.

  • CHAR and VARCHAR columns are defined with a length in bytes. Therefore, if your log-on language is a 16-bit language such as Chinese, the number of two-byte characters you can store will be equal to half the column width.

Columns of type VARCHAR or VARBINARY may cause an extra tuple header to be stored. A tuple header is a description of the rows on a DBEFile page. If all rows on the page are the same, the header can be shared. A VARCHAR or VARBINARY column may be a different length in each row thus requiring each row to have its own tuple header. The calculations in chapter 3, "Physical Design", assume that each row has its own tuple header. Refer to the ALLBASE/SQL Reference Manual chapter on "Data Types" for further explanation.

The use of VARCHAR or VARBINARY data types can result in page shifting, that is, movement of data from one page to another when the size of the data changes. This can be a drain on performance.

Defining Column Size

You should always define your columns to be large enough to hold the largest piece of information you expect it will ever hold. This helps to avoid restructuring of tables at a later time. Column size affects both physical design and logging.

Rows are stored in 4096-byte DBEFile pages. After the space for a page header is used, 3996 bytes are left for data storage. During physical design row length is used in calculating the number of DBEFile pages needed to store data for a table. The sum of all columns plus a two byte overhead for each column is equal to the total row length:

RL = SC +(2 * NC)

where RL is row length, SC is the sum of the length of all columns in the table, and NC is the number of columns in the table.

Since a page has 3996 bytes for storage, a row of 2000 to 3000 bytes is going to waste space by taking one half to three quarters of a page and leaving the rest of the page empty. Column size can be adjusted to use pages more efficiently, or a large table can be broken into two smaller tables to improve page use. This is also discussed under "Calculating Row Length" in the "Physical Design" chapter in this guide.

Large columns that are modified frequently create proportionately large log records and consequently use more log file space. When you determine log file size in the "Physical Design" chapter, keep in mind the size of the columns that are being modified.

Defining Null Values For Columns

Columns that might not always have data available should be allowed to contain null values. A column containing a null value does not store any data. Null values are distinguished from zeros and blank character data. If the UnitPrice column contains a null value, the price is considered unavailable. However, if the UnitPrice column contains a zero, the price is $0.00 and the item is free. The default when defining columns is to allow null values. Columns not permitted to contain null values must be created with the NOT NULL option. Some guidelines for null values are:

  • If you create a column with the NOT NULL option, you must specify a value for that column whenever you insert a new row or update the column.

  • If a column is to be a key value in an index, you should define it as NOT NULL.

  • If an update application will not necessarily have all the information available, you should allow null values in the non-key columns (that is, omit the NOT NULL option for those columns).

There are application and physical design implications with the use of null values. Application programs that use null values must declare special variables called indicator variables to handle the null values. Refer to the ALLBASE/SQL Application Programming Guide for the language you are using for details on using indicator variables.

The use of null values can result in page shifting when the size of the row changes. This can be a drain on performance. For performance improvement, use the NOT NULL option on a column definition whenever possible. If a column is potentially null, SQLCore uses a two-byte overhead per column during query processing to check the null status of every selected column. Although SQLCore also needs to check the status for every inserted and updated value if the column is defined as NOT NULL, performance is better than if the column allowed null values.

In addition, a two-byte tuple header is stored on disk for each tuple which has a NULL value if the inserted tuple's header does not match the first shared header on the DBEFile page. The data types and sizes of the columns for the OrderItems and Vendors tables are shown in Table 2-2.

Table 2-2 Column Attributes for Two Tables

TABLECOLUMN ATTRIBUTES
OrderItemsOrderNumberINTEGERNOT NULL,
 ItemNumberINTEGERNOT NULL,
 VendPartNumberCHAR(16), 
 PurchasePriceDECIMAL(10,2)NOT NULL,
 OrderQtySMALLINT, 
 ItemDueDateCHAR(8), 
 ReceivedQtySMALLINT  
VendorsVendorNumberINTEGERNOT NULL,
 VendorNameCHAR(30)NOT NULL,
 ContactNameCHAR(30), 
 PhoneNumberCHAR(15), 
 VendorStreetVARCHAR(30)NOT NULL,
 VendorCityVARCHAR(20)NOT NULL,
 VendorStateCHAR(2)NOT NULL,
 VendorZipCodeCHAR(10), 
 VendorRemarksVARCHAR(60)