Arranging Data in Tables [ ALLBASE/SQL Database Administration Guide ] MPE/iX 5.5 Documentation
ALLBASE/SQL Database Administration Guide
Arranging Data in Tables
The basic data structure in an ALLBASE/SQL database is a table. Data is
stored within a table as rows divided into columns. Indexes and views
are created using the columns of a table.
To design the tables in your database from normalized data, you must:
* define tables from the normalized data according to the uses of
the data in your applications
* define the columns in the tables
Defining Tables
To design tables you need to know how the applications use data in
SELECT, DELETE, INSERT, and UPDATE operations. Data that is deleted,
inserted, or updated at the same time should be put in the same table.
Use these criteria to start grouping your data into tables.
To determine the composition of your tables, keep the following
guidelines in mind:
* The maximum number of tables you can define in a DBEnvironment is
231 -1.
* The maximum number of columns in a table is 1023.
* The maximum length of a column is 3996 bytes.
* The maximum length of a row in bytes is
where NC is the number of columns in the table and SCL is the sum
of the column lengths.
* If you add a column to a table using the ALTER TABLE statement,
the added column is placed on the right-hand side of the table.
The following paragraphs are a few more guidelines to help you determine
table design.
A row is the smallest unit you can delete or insert at a time. This
means two columns should be placed in different tables if they can be
inserted or deleted at different times. Assume a part name is never
inserted or deleted unless a corresponding part number is inserted or
deleted. Part name and part number should be in the same table. On the
other hand, a vendor name can be inserted or deleted independent of, say,
the order number. Therefore, the vendor name and order number should be
placed in separate tables.
Interactively you can only use the UPDATE statement on one table at a
time. This means if you use a single statement to update one column
based on the value of another column, both columns must be in the same
table. The unit price of a part is updated as it pertains to a
particular part number column; therefore, the unit price and part number
columns should be in the same table.
Planning Joins.
A join is a query that selects columns from at least two tables. If many
applications request information that can logically reside in two tables,
you may want to place the information in a single table to improve
performance. A table with a large number of columns can impair
performance as will several smaller tables that are joined frequently.
The WHERE clause of the SELECT statement is used to specify the
condition(s) under which rows are joined. ALLBASE/SQL allows joins on
compatible data types, but for maximum efficiency, joins should be
performed on identical columns.
NOTE An application consistently containing queries that join more than
six tables indicates that the tables are over-normalized and the
database design should be re-examined.
Tables can be joined as Cartesian products where each row of one table is
joined with every row in another table. However, in order for tables to
be meaningfully joined, they must share a common column. If two tables
do not share a common column, then a third table containing common
columns for both tables must be introduced into the join, as shown in
Figure 2-6 .
Figure 2-6. Common Columns for Joins
Several of the sample database tables must be joined to provide the
required data to the users and applications. However, the normalized
data does not allow some tables to be joined directly. Refer to Figure
2-3. The OrderItems table cannot be meaningfully joined with the Vendors
table because there is no common column in the tables. To facilitate
retrieval for those applications that use the Vendors and OrderItems
tables, the VendorNumber column was added to the Orders table, as shown
in Table 2-1. This allows a join to be made between the Vendors and
OrderItems tables using the Orders table.
Adding the VendorNumber column, of course, violates the third normal
form. However, normalization is just a tool that helps you design
efficient tables. It is up to the DBA to alter the table design to meet
the needs of the users.
For additional information on joins, including Cartesian products, refer
to the "SQL Queries" chapter of the ALLBASE/SQL Reference Manual.
Final Form of Sample Database Tables.
The normalized data from the purchase order form produced the Vendors,
Orders, OrderItems, and SupplyPrice tables. Integrated Peripherals, Inc.
also designed a Parts table and an Inventory table to keep track of the
internal parts. As a result, a total of six tables were designed from
the defining and normalization phase.
Columms are added to some of the
tables to fit the user's needs. A VendorRemarks column is added to
Vendors to keep comments for that vendor. A ReceivedQty column is added
to the OrderItems table to denote when a shipment arrives. The resulting
six tables are shown in Table 2-1.
Table 2-1. Sample Database Tables
------------------------------------------------------------------
| | |
| TABLE | COLUMNS |
| | |
------------------------------------------------------------------
| | |
| Parts | PartNumber, PartName, SalesPrice |
| | |
------------------------------------------------------------------
| | |
| SupplyPrice | PartNumber, VendorNumber, VendPartNumber, |
| | UnitPrice, DeliveryDays, Quantity |
| | |
------------------------------------------------------------------
| | |
| Vendors | VendorNumber, VendorName, ContactName, |
| | PhoneNumber, VendorStreet, VendorCity, |
| | VendorState, VendorZipCode, VendorRemarks |
| | |
------------------------------------------------------------------
| | |
| Orders | OrderNumber, VendorNumber, OrderDate |
| | |
------------------------------------------------------------------
| | |
| OrderItems | OrderNumber, ItemNumber, VendPartNumber, |
| | PurchasePrice, OrderQty, ItemDueDate, |
| | ReceivedQty |
| | |
------------------------------------------------------------------
| | |
| Inventory | PartNumber, BinNumber, QtyOnHand, |
| | LastCountDate, CountCycle, AdjustmentQty, |
| | ReorderQty, ReorderPoint |
| | |
------------------------------------------------------------------
Defining Columns
Each attribute of each entity is defined in ALLBASE/SQL as a column in a
table. The "Data Types" chapter in the ALLBASE/SQL Reference Manual
contains basic information on column names and data types. Use the
information presented there and the following guidelines to define
columns. A column is defined by specifying:
* Name
* Data type
* Size
* Whether or not it can contain null values
Defining Column Names.
Be precise in your names and descriptions so no user can misunderstand a
data element or its meaning. Each column name can be as long as 20
characters. Try to choose meaningful, unambiguous names. For example, a
name of "Qty" in the OrderItems table could refer to order quantity or
received quantity while "OrderQty" can refer only to order quantity.
Long names, however, may become unwieldy in a report. In the sample
database, the column and table names have been made as unambiguous as
possible.
Defining Column Data Type.
A data type tells ALLBASE/SQL what type of data can be stored in the
column and what can be done with the column. Thus, a column with an
INTEGER data type can only have integers. A column with a numeric data
type (FLOAT, DECIMAL, INTEGER, or SMALLINT) can be used in arithmetic
operations. A column with an alphanumeric data type (CHAR or VARCHAR)
can appear in a string operation such as a comparison using the LIKE
predicate.
Some guidelines for data types are:
* Match the data type to the data, for example:
* Columns used for quantity or count should be defined as
INTEGER or SMALLINT.
* Columns containing real numbers or engineering data should
be defined as REAL or FLOAT.
* Columns used for money should be defined as DECIMAL so
there is no loss of precision when rounding.
* Columns containing alphanumeric characters should be
defined as CHAR or VARCHAR. VARCHAR is used for
particularly large alphanumeric columns like descriptions
or comments. The VARCHAR data type is recommended when
there are a few potentially large character strings, but
most of the time the entire column will not be filled. The
VARCHAR data type stores character data more efficiently
because the column is not padded with blanks. The
VendorRemarks column in the Vendors table is defined as
VARCHAR.
* Columns containing binary data or data of an unspecified
kind should be defined as BINARY or VARBINARY. The
VARBINARY data type is recommended when there are a few
potentially large binary strings, but most of the time the
entire column will not be filled. The VARBINARY data type
stores binary data more efficiently because the column is
not padded with zeroes.
* For very large column values (greater than 3996 bytes), use
the LONG BINARY or LONG VARBINARY data type.
* Columns used for dates should be defined as DATE.
* Columns used for times should be defined as TIME.
* Columns used for timestamps should be defined as DATETIME.
* Columns used for intervals should be defined as INTERVAL.
* Avoid data conversions by using the same data type for columns
that are frequently compared or used in a WHERE clause. For
example, if you compare two columns frequently, do not define one
as INTEGER and the other as FLOAT. ALLBASE/SQL will have to
convert one number to the data type of the other which could
result in performance degradation and possible loss of precision.
* CHAR and VARCHAR columns are defined with a length in bytes.
Therefore, if your log-on language is a 16-bit language such as
Chinese, the number of two-byte characters you can store will be
equal to half the column width.
Columns of type VARCHAR or VARBINARY may cause an extra tuple header to
be stored. A tuple header is a description of the rows on a DBEFile
page. If all rows on the page are the same, the header can be shared. A
VARCHAR or VARBINARY column may be a different length in each row thus
requiring each row to have its own tuple header. The calculations in
Chapter 3, "Physical Design", assume that each row has its own tuple
header. Refer to the ALLBASE/SQL Reference Manual chapter on "Data
Types" for further explanation.
The use of VARCHAR or VARBINARY data types can result in page shifting,
that is, movement of data from one page to another when the size of the
data changes. This can be a drain on performance.
Defining Column Size.
You should always define your columns to be large enough to hold the
largest piece of information you expect it will ever hold. This helps to
avoid restructuring of tables at a later time. Column size affects both
physical design and logging.
Rows are stored in 4096-byte DBEFile pages. After the space for a page
header is used, 3996 bytes are left for data storage. During physical
design row length is used in calculating the number of DBEFile pages
needed to store data for a table. The sum of all columns plus a two byte
overhead for each column is equal to the total row length:
where RL is row length, SC is the sum of the length of all columns in the
table, and NC is the number of columns in the table.
Since a page has 3996 bytes for storage, a row of 2000 to 3000 bytes is
going to waste space by taking one half to three quarters of a page and
leaving the rest of the page empty. Column size can be adjusted to use
pages more efficiently, or a large table can be broken into two smaller
tables to improve page use. This is also discussed under "Calculating
Row Length" in the "Physical Design" chapter in this guide.
Large columns that are modified frequently create proportionately large
log records and consequently use more log file space. When you determine
log file size in the "Physical Design" chapter, keep in mind the size of
the columns that are being modified.
Defining Null Values For Columns.
Columns that might not always have data available should be allowed to
contain null values. A column containing a null value does not store any
data. Null values are distinguished from zeros and blank character data.
If the UnitPrice column contains a null value, the price is considered
unavailable. However, if the UnitPrice column contains a zero, the price
is $0.00 and the item is free. The default when defining columns is to
allow null values. Columns not permitted to contain null values must be
created with the NOT NULL option. Some guidelines for null values are:
* If you create a column with the NOT NULL option, you must specify
a value for that column whenever you insert a new row or update
the column.
* If a column is to be a key value in an index, you should define it
as NOT NULL.
* If an update application will not necessarily have all the
information available, you should allow null values in the non-key
columns (that is, omit the NOT NULL option for those columns).
There are application and physical design implications with the use of
null values. Application programs that use null values must declare
special variables called indicator variables to handle the null values.
Refer to the ALLBASE/SQL Application Programming Guide for the language
you are using for details on using indicator variables.
The use of null values can result in page shifting when the size of the
row changes. This can be a drain on performance. For performance
improvement, use the NOT NULL option on a column definition whenever
possible. If a column is potentially null, SQLCore uses a two-byte
overhead per column during query processing to check the null status of
every selected column. Although SQLCore also needs to check the status
for every inserted and updated value if the column is defined as NOT
NULL, performance is better than if the column allowed null values.
In addition, a two-byte tuple header is stored on disk for each tuple
which has a NULL value if the inserted tuple's header does not match the
first shared header on the DBEFile page. The data types and sizes of the
columns for the OrderItems and Vendors tables are shown in Table 2-2.
Table 2-2. Column Attributes for Two Tables
-------------------------------------------------------------------------------
- TABLE - COLUMN ATTRIBUTES -
-------------------------------------------------------------------------------
- OrderItems - OrderNumber INTEGER NOT NULL, -
- - ItemNumber INTEGER NOT NULL, -
- - VendPartNumber CHAR(16), -
- - PurchasePrice DECIMAL(10,2) NOT NULL, -
- - OrderQty SMALLINT, -
- - ItemDueDate CHAR(8), -
- - ReceivedQty SMALLINT -
-------------------------------------------------------------------------------
- Vendors - VendorNumber INTEGER NOT NULL, -
- - VendorName CHAR(30) NOT NULL, -
- - ContactName CHAR(30), -
- - PhoneNumber CHAR (15), -
- - VendorStreet VARCHAR(30) NOT NULL, -
- - VendorCity VARCHAR(20) NOT NULL, -
- - VendorState CHAR (2) NOT NULL, -
- - VendorZipCode CHAR (10), -
- - VendorRemarks VARCHAR(60) -
-------------------------------------------------------------------------------
MPE/iX 5.5 Documentation