Creating an Ideal Reporting Instance

By
Melanie Kacerek
HPWorld '99 Paper #272

Quest Software, Inc.
610 Newport Center Drive, Suite 1400
Newport Beach, CA 92660

949-720-1434 telephone
949-720-0426 fax
mkacerek@quest.com email
www.quest.com

Do you dread the response from the user community when someone starts running a report against their data during the day? Do you try to postpone running all reports just to avoid those performance complaints? Is the first week of the month awful because month-end reports are running during the day?

As organizations grow, the number of users updating databases grows, and so does the overhead associated with those users. To accommodate the increased memory, disk space, and CPU requirements, companies continue to purchase bigger and bigger systems.

Unfortunately, larger systems cannot always provide improved performance, because the company has conflicting processing needs. Users updating a database require quick inserts into selected tables. Reports use large queries, spanning numerous tables, usually with some complex joins. These different work profiles do not mix well on the same instance, since they work against each other's disk I/O and caching patterns. When online transactions and report processing are accessing the same database, the end result is slow throughput for both.

This problem is easily identified on a system whose resources are already stressed. To maximize online productivity, most shops try to defer reports, running them during a shrinking batch window at night or on weekends to avoid this contention. But delaying the reports impacts the productivity of the decision support staff when the information is needed now.

Deferring reports reduces their accuracy and functionality. For example, if an inventory report can only be run at night, it always reflects yesterday's totals, so purchasing decisions have built-in latency. If a sale comes in today that zeros out an item's inventory, it will not be noticed until the report is run tonight, and it will not be addressed until the report is reviewed tomorrow.

If that same report were run today against today's data, the re-stocking order could be placed today, and the next walk-in sale might occur one day sooner and represent one extra sale. However, if the performance for online users is dramatically reduced by running that report during the day, and if those online users are handling catalog sales, for example, then their reduced productivity may represent a greater loss of revenue.

How do you get reports run against today's data without affecting online productivity?

The challenge is to separate Online Transaction Processing (OLTP) and report generation onto different systems. Reports and queries need to run against a separate Oracle instance, which must be maintained to reflect the activity on the production system. With different instances for OLTP and queries, performance for both types of access can be optimized. Performance improves by removing the conflicting I/O and caching patterns, and each instance can be tuned to perform its best, either for online inserts, updates, and deletes, or for queries.

Many businesses have already begun to implement this strategy of separating production data entry from reports and queries, but so far, the drawbacks of many solutions often outweigh the benefits.

What is a Viable Reporting Instance?

The first step in defining an optimal solution is defining the criteria for the desired result. In this case, the best scenario would provide a reporting instance with current data. Each market is different. For some businesses, "current" data can be last weekend's backup. For others, week-old data is completely useless, as are any reports based on it. Many businesses expect a daily source of data with which to generate reports. Highly competitive companies in every market want reports based on the latest data: minutes old, not hours old or days old.

In addition to containing current data, this ideal reporting instance would be inexpensive in all ways. The ideal solution would not require extensive staffing or consulting; it would be easy to install and simple to maintain. An ideal solution would not require extra hardware, extra overhead, or extra network bandwidth.

A simple solution to creating a reporting instance is to shut down the production instance, extract the data, walk the results to the reporting system, and import the data. This solution does not require a massive network, unless the copy of the extract is shipped across the network. The extract can be stored on tape, and the tape carried to the reporting system and restored. Again, this simple solution ("sneaker-net") doesn't require extra hardware, extra software, or extra network bandwidth. However, it does require time and a person to perform the repetitive task.
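
As a minimal sketch of this approach, using Oracle's standard export and import utilities (the SCOTT schema, file names, and passwords are illustrative assumptions), the extract and import might look like this in SQL*Plus, with the HOST command invoking the operating-system utilities:

    -- On the production system: export the schema to a dump file.
    -- (exp is Oracle's bundled export utility, invoked here via HOST.)
    HOST exp userid=system/manager owner=scott file=prod_extract.dmp log=exp.log

    -- Carry the tape (or copy the file) to the reporting system, then:
    HOST imp userid=system/manager fromuser=scott touser=scott file=prod_extract.dmp log=imp.log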

In the time consumed by the extract and the import, the reporting instance has aged, even before the first query is run against it. And if the extract and import are manual processes, the reporting instance has staffing costs associated with it, in addition to providing stale data.

Back to the initial, unacceptable cost: overhead. The ideal reporting instance should be available without simply trading the overhead of running reports on an OLTP instance for the overhead required to provide the "substitute" reporting instance. The primary motivation for creating a separate reporting instance was to offload query processing. If generating the reporting instance requires similar overhead or an interruption to production activity, what has been gained?

Another key component of an ideal reporting instance is that the data is accessible. Many replication products that provide a copy of the production instance do not also allow access to that copy, except in emergencies. In an "emergency," the copied data may be accessed, but the moment the copy is accessed, it is no longer being updated and immediately begins to go stale. So, an ideal reporting instance is accessible while remaining up to date.

If a reporting instance could be located anywhere and still reflect current production data, it would enable remote users to run their own ad-hoc queries to generate reports containing the data important to them. Another criterion for an ideal reporting instance, then, is that it not be limited to a geographic radius around the production system.

No Free Lunch

An ideal reporting solution would have no costs. Realistically, few worthwhile things are free. Certainly, a good reporting instance is not free. Even the "sneaker-net" solution, in which an extract is made and walked over to another system for importing, is not free. Sneaker-netting requires time and staffing, and it results in aged data.

The question is, "Which costs are acceptable?" System overhead? No; the idea was to reduce it, not replace it. Network bandwidth? Maybe, within reason. Software costs? Hardware costs? Programming and testing costs?

The goal is a viable reporting instance. The method to achieve the goal is replication. The remaining question is: how?

Through replication, a duplicate copy of the production database is created and maintained for the purpose of offloading reporting. Replication can also bring functionality inherent in having a replica, such as data distribution, high availability, and disaster recovery. Data distribution is the process of delivering data to remote systems and, ultimately, to remote users on those systems.

High availability refers to the ability of a second system to assume the responsibilities of the production system, whether so that the production system can be taken offline for maintenance, or to "pinch hit" if the production system fails. A local secondary system with a copy of the data and the environment from the production system provides this insurance.

Disaster recovery describes the ability to survive a dire situation that results in the long-term disabling or destruction of the primary IS center, including the production system. With a robust disaster recovery plan, business can continue, because a redundant copy of the production environment and current data exists outside the area affected by the disaster, such as a flood, fire, earthquake, tornado, hurricane, or power outage.

In truth, an ideal reporting instance would offer more than current data at little or no cost. An ideal reporting instance would afford businesses several additional options: data distribution, high availability, and possibly disaster recovery.

Hardware-Based Replication

Hardware replication involves large-scale replication at the disk level. Hardware replication can create an initial replica and then copy data blocks as they change. In the case of an Oracle insert, the blocks containing the rows that changed, the blocks containing the redo log, the blocks containing the rollback segment, and the blocks containing any indices affected by the insert would all be replicated. A large transaction could modify many more blocks.

The price of hardware replication is high, frequently requiring specific disk drives and a large network pipeline to handle the large volume of data being replicated.

However, the resulting copy of hardware replication is a mirror image of the production system. The secondary copy contains the database plus any programs and external files stored on the disks replicated. Hardware replication is all-inclusive of the variety of datatypes contained in an Oracle instance. From a hardware solution perspective, if the data block changed, replicate it regardless of the contents.

Because hardware replication is so thorough, it provides an excellent form of disaster recovery insurance with its secondary copy.

One significant drawback of hardware replication is that the replica is normally inaccessible. The secondary copy cannot be accessed by the hardware replication solution and by reports at the same time. This limitation forces a staged implementation. Many hardware replication environments use periodic replication, allowing the hardware solution to replicate for a period of time, then stop replication in order to access the data, and then at another interval, cease reports and queries in order to resume replication.

With a hardware solution, the periodic replication process can be automatic, but the process to suspend replication and eventually to re-establish it is time-consuming. During the intervals when reports are running, the data against which the reports are running is aging. The longer the interval between replication activities, the older and less useful the reports are.

The replica created and maintained by a hardware solution is an exact replica. If you wanted to optimize the replica for reporting by adding additional indices, you would have to add those indices again before each reporting window, because resuming replication restores the exact production image.

A common limitation of hardware replication solutions is the distance between the production and reporting systems. Frequently, this distance cannot exceed 40 kilometers. So, if the anticipated disaster could span that distance, the replica would still be at risk.

Replication with Messaging

In contrast, messaging replication solutions can distribute data to remote locations over LAN, WAN, or the Internet. Messaging solutions are generally less expensive to purchase than hardware replication solutions. However, the few days required to implement a hardware replication tool are dwarfed by the coding requirements of a messaging solution. Every program that updates the data must be modified to call messaging routines that push the data to the reporting instance. If desired, reports can be modified to call other messaging routines to pull the data from the production system. If any of these applications is updated, the coding and testing must be repeated.
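
To illustrate the coding burden, here is a minimal sketch of how an updating procedure might be modified; the messaging_pkg package and its push_change procedure are hypothetical stand-ins for whatever routines a particular messaging product supplies:

    -- Every program that updates the data must also push the change to
    -- the reporting instance through the messaging product's routines.
    CREATE OR REPLACE PROCEDURE adjust_inventory (
        p_item_id IN NUMBER,
        p_delta   IN NUMBER
    ) AS
    BEGIN
        UPDATE inventory
           SET qty_on_hand = qty_on_hand + p_delta
         WHERE item_id = p_item_id;

        -- Hypothetical call: messaging_pkg.push_change stands in for the
        -- vendor-supplied routine that queues the change for delivery.
        messaging_pkg.push_change(
            table_name => 'INVENTORY',
            key_value  => p_item_id,
            delta      => p_delta);
    END;
    /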

Once implemented, the messaging replication solution provides current data without geographic limitations to a multitude of destinations. It does not create the initial replica, nor does it replicate the entire environment, but it does replicate the database changes. And, by maintaining the replica database, a messaging solution provides a viable reporting instance.

Both hardware and messaging replication solutions require some overhead from the production system, but ordinarily, the CPU price is not massively detrimental to the performance on the OLTP system.


Software Replication

Software replication tools for Oracle come in two main flavors: trigger-based and log-based. "Trigger-based" replication refers to replication procedures coded within the database, which are activated when specific data changes. "Log-based" replication uses the Oracle redo logs as its source of change information. Both of these methods have their strengths and their failings. With Oracle8i, a third option has been introduced: the standby/read-only instance. It too has its strengths and its shortcomings.

Triggers enable a business to define exactly what data within the database is replicated. Trigger-based replication solutions are frequently completely contained within the Oracle database, which can facilitate continued management of the process. However, trigger-based replication tools heavily tax the production instance, making users question whether the cure is any better than the initial problem.
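
The mechanics can be sketched with ordinary Oracle objects; the table, sequence, and trigger names here are illustrative assumptions, and a separate process (not shown) would drain the queue and apply the changes to the replica:

    -- Queue table and sequence held inside the production instance
    -- (a common design among trigger-based products).
    CREATE SEQUENCE repl_seq;

    CREATE TABLE repl_queue (
        change_id   NUMBER,
        table_name  VARCHAR2(30),
        item_id     NUMBER,
        qty_on_hand NUMBER,
        change_time DATE
    );

    -- Row-level trigger: every insert or update on the production table
    -- also writes a row to the queue, adding CPU time and I/O to the
    -- very transaction the online users are waiting on.
    CREATE OR REPLACE TRIGGER inventory_repl_trg
    AFTER INSERT OR UPDATE ON inventory
    FOR EACH ROW
    BEGIN
        INSERT INTO repl_queue
        VALUES (repl_seq.NEXTVAL, 'INVENTORY',
                :NEW.item_id, :NEW.qty_on_hand, SYSDATE);
    END;
    /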

Many trigger-based solutions have built-in latency, often starting the replication process only once a transaction completes and commits on the production instance. If a transaction takes 20 minutes to run, the reporting instance starts out 20 minutes behind. Add the time required to transfer the data to the reporting system, and add the time to update the reporting instance. If the transaction takes the same amount of time on the target system as it did on the source, the reporting system could easily be 40 minutes behind production, before even counting network travel time.

Although trigger-based replication methods do not require the network bandwidth of a hardware replication solution, they frequently use SQL*Net to transfer the data, which can become a bottleneck to the replication process, adding to the latency of the target data.

However, some trigger replication solutions can be very fast; in fact, the secondary copy can be maintained synchronously with the production system. This technique eliminates the latency, but it creates another issue. If a transaction cannot be committed on both systems, it is prevented from committing on the production system. So when the target is unreachable and replication cannot continue, business on the production system is affected.

The primary drawback to most trigger-based replication solutions is the overhead required from the primary system and the production instance in order for them to work. In addition to the I/O involved in the Oracle transaction, the triggers require CPU time, and when the triggers write the data for replication to a queue, that I/O hits the production instance too.

Most trigger-based solutions use queue files within the Oracle database. In these cases, maintaining the queues means more I/O against the database. Additionally, queuing within the instance poses an additional risk. If the network fails, the replication data begins queuing within the instance. If the queuing tablespace fills, the production instance will fail. Consequently, when one disaster strikes, such as the network or target system being unavailable, MIS departments must divide their attention between addressing the primary problem and preventing a monumental secondary one.
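
Because of this risk, sites running such tools watch the queuing tablespace closely; a simple free-space check (REPL_QUEUE_TS is an assumed tablespace name) might look like:

    -- Free space left in the queuing tablespace; if this hits zero while
    -- the network or target is down, the production instance fails.
    SELECT tablespace_name,
           SUM(bytes) / 1024 / 1024 AS free_mb
      FROM dba_free_space
     WHERE tablespace_name = 'REPL_QUEUE_TS'
     GROUP BY tablespace_name;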

Because triggers cost so much in terms of overhead, they are frequently implemented at intervals, similar to the hardware replication solutions. As such, they have the same additional latency, if replication is interrupted and delayed.

A limitation of many trigger-based solutions is the types of data they can replicate. Tables with referential integrity enabled, tables with long or long raw columns, tables without a primary key, and sequences are often seen on the "not supported" list for trigger-based replication products.

The nature of asynchronous trigger-based solutions violates the Oracle Read Consistency Model. With triggers, the resulting target instance receives its updates without maintaining both transaction order and session context all the way to the destination. Hence, the target instance can eventually go out of sync spontaneously, if the replication groups are not carefully designed and maintained, and if long and short transactions are combined to affect the same replication group.

Currently, most trigger-based replication tools are incapable of replicating the volume of changes that occur on a production database. Many trigger-based replication implementations are limited to a small number of tables, and a small number of transactions replicated per second. Hardware and messaging replication solutions are better equipped to handle the volume requirements of replicating production activity to target instances. So, although trigger-based solutions are flexible in terms of which data is replicated, they are limited in terms of how much data can be replicated.

On the bright side, trigger-based replicas are continually accessible, unlike the hardware replicas.

Redo Log-Based Replication

Log-based replication begins after an initial replica is created. It replicates by retrieving changes to the selected objects from the Oracle redo log files.

Using the redo logs as the source of change information rather than triggers dramatically reduces the overhead requirements on the production system and, more importantly, on the production instance.

Log-based replication minimizes the latency between the source and replica by capturing the modifications as soon as they are written to the Oracle log. When a COMMIT hits the redo logs, it is replicated to the target, where the transaction is completed. If the transaction is cancelled, Oracle records the ROLLBACK in its redo logs. The log-based replication tool simply replicates the rollback too, keeping the target instance an accurate representation of the source database.
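
The principle can be illustrated with Oracle8i's bundled LogMiner packages, which read changes back out of a redo log; the file names below are assumptions, and commercial log-based tools use their own, faster log readers rather than LogMiner itself:

    -- Register a redo log file and start a LogMiner session against it,
    -- using a dictionary file previously built with DBMS_LOGMNR_D.BUILD.
    BEGIN
        DBMS_LOGMNR.ADD_LOGFILE(
            LogFileName => '/u01/oradata/prod/redo01.log',
            Options     => DBMS_LOGMNR.NEW);
        DBMS_LOGMNR.START_LOGMNR(
            DictFileName => '/u01/oradata/prod/dict.ora');
    END;
    /

    -- Each row is a change recovered from the log, including the SQL
    -- needed to reproduce it on another instance.
    SELECT scn, operation, sql_redo
      FROM v$logmnr_contents;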

With log-based replication, the reporting instance is maintained continuously, without the interruptions or delays found in hardware or trigger-based solutions. So, reports run against an instance maintained by a log-based replication tool reflect more current data than found in other types of replication implementations.

Unlike trigger-based replication that can select which data to replicate at the row level, most log-based replication tools provide selectivity to the table or sequence level only. However, although they do not provide the same granularity of replication, they can provide a larger variety of supported data types, including those which are typically not supported by trigger methods (sequences, tables without a primary key, tables with long or long raw columns, and tables with referential integrity enabled on both the source and target systems).

With log-based replication, the captured changes can be transferred in memory, or if needed, stored in queues outside the database. By maintaining the queues outside the database, log-based replication further reduces the I/O against the production instance.

Log-based replication is asynchronous, which allows this type of solution to ride out network outages that could paralyze other replication strategies. With the queues maintained outside the database, if the target database fails, queuing can continue, isolated from production, until the target is recovered.

Replicating the changes from the redo logs requires far less network bandwidth than hardware replication solutions. In addition to requiring less network bandwidth, some log-based solutions use a dependable, high-speed network protocol to transfer the data to remote systems via TCP/IP over a LAN or a WAN.

In contrast to trigger-based replication solutions, log-based solutions can support thousands of tables and thousands of transactions per second: the volume of tables, sequences, and transactions required by common ERP applications such as Oracle Applications, PeopleSoft, Baan, and SAP.

By replicating the variety of data types, log-based replication can provide not only a viable reporting instance, but an instance that can be used for high availability and disaster recovery, since it supports both local area and wide area networks. However, log-based replication does not replicate the business environment (files outside the instance), so additional effort is required to use the target instance for disaster recovery.

Log-based replication can adhere to the Oracle Read Consistency model, maintaining both operation order and session context from the source instance to the target. The target instance receives its updates fast, keeping its data current so reports run against it are valuable and timely.

Another feature of log-based replication is that the replica can differ from the production instance and can be optimized for reports. The replica's schema can have additional indices to facilitate query and report processing. The replica instance can be on a different hardware platform, a different O/S version, and a different Oracle version as well. This reduces the challenge of maintaining multiple identical systems, as required by trigger-based tools.
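
For example, an index that only the reports need can be created on the replica alone (the names are illustrative), leaving the production instance lean for inserts, updates, and deletes:

    -- On the reporting instance only: an extra index to speed queries.
    CREATE INDEX inv_qty_idx
        ON inventory (qty_on_hand, item_id);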

However, log-based replication does require that the tables being replicated match structurally from source to target. Replicated tables must have the same number of columns, and the columns must hold the same type of data on each system. This leads to another issue: DDL.

Most log-based replication products do not support complete DDL, so each instance must be structurally maintained on its own. Some log-based solutions offer partial DDL support for TRUNCATE and ALTER TABLE, but those solutions still require all other DDL desired on the replica to be administered separately. Hardware replication doesn't offer the option of differing schemas, but it does replicate every change made to the selected disk drives, including both DML and DDL.
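
In practice this means applying the same DDL by hand on each side; a sketch, with an illustrative column addition:

    -- Run on the production instance...
    ALTER TABLE inventory ADD (reorder_point NUMBER);

    -- ...and run again, separately, on the reporting instance, since
    -- most log-based tools will not carry the DDL across.
    ALTER TABLE inventory ADD (reorder_point NUMBER);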

Since log-based replication supports WAN environments, it can also be used for data distribution.

With this type of implementation, rather than receiving pre-run reports, remote users can run their own reports on their own reporting instance. Potentially, with log-based replication, one production instance could be replicated to many reporting instances, geographically removed from the production system.

Rather than remote users logging onto the already taxed production system to perform their queries, the data can be replicated to the remote users who can then perform local queries to obtain the data they desire.

Log-based replication is fast, with low overhead and minimal latency. However, it is limited: if a change is not in the redo logs, this style of replication will not see the change and will not propagate it. Direct-path loads performed with the NOLOGGING option, for example, generate little or no redo, so changes made that way are invisible to the log reader.

Standby/Read-Only Reporting Instances

With Oracle8i, a new option is available. Prior to version 8i, Oracle's standby database had to run in perpetual recovery mode, without the option to open it unless all production activity was switching to it (as in disaster recovery). Once the standby database had been opened, it had to be recreated in order to resume its recovery mode. So, in normal mode, the disaster recovery copy of the instance was dormant, and once you did use it, you had to start over and set it up again.

With Oracle8i, the standby database gained a new feature. Now, a standby database can be opened in read-only mode, and after limited queries and reports are run, the database can be returned to recovery mode to allow additional archive logs to be applied.
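
The resulting cycle on the standby looks roughly like this, sketched in SQL*Plus under Oracle8i managed recovery (one of several valid command sequences):

    -- Stop applying archived logs...
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

    -- ...open the standby for queries only...
    ALTER DATABASE OPEN READ ONLY;

    -- ...and when the reporting window closes, remount as a standby and
    -- resume recovery so the accumulated archive logs can be applied.
    SHUTDOWN IMMEDIATE
    STARTUP NOMOUNT
    ALTER DATABASE MOUNT STANDBY DATABASE;
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;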

This option has two drawbacks. First, read-only access is just that: read only. Absolutely no modifications can be made to the replica if you want to be able to return it to standby/recovery mode. This rule eliminates the option of using a temporary table to sort data for complex reports, or of updating a sequence when a certain query is performed. This is in sharp contrast to the log-based replica instance, which is completely open and accessible, allowing updates to temp tables or even updates to replicated tables in a master-to-master mode.

The other drawback to the read-only/standby database is the same as the limitation of a hardware replica: while the data is accessible, it is getting stale, and while it is being updated by recovery, it is inaccessible. This forces a choice between replication and access, requiring alternating update and report windows and limiting when and for how long reports can run before being cancelled in order to maintain the replication schedule.

Conclusion

What is the ideal replication solution for creating a viable reporting instance with a maximum of functionality and a minimum impact on the production system? Probably a combination of a hardware solution, which replicates the entire environment for disaster recovery, combined with a log-based replication tool that continuously maintains an accessible reporting instance. Or is there yet a better solution?

About the Presenter

Melanie Kacerek is the product manager for the multi-million dollar SharePlex® suite of products at Quest Software, Inc. She has worked with replication products for seven years. Prior to that, Ms. Kacerek was an account manager for an ERP vendor, assisting customers with complex implementations and conversions. She may be reached at mkacerek@quest.com or 800-306-9329 ext. 2500.
