What is your biggest concern when you need to make a change in the database—be it some minor change, such as altering initialization parameters and database attributes, or major but inevitable ones such as applying patchsets? What about for your upgrade to Oracle Database 11g itself?
For me, the biggest concern is the risk of the change "breaking" something. Even the most minor changes can have a domino effect, eventually producing a visible impact.
To minimize this risk, most shops make the change in a control environment similar to the production one, apply a workload similar to your production system's, and observe the impact. It's rather trivial, at least technologically speaking, to replicate your production system but reproducing the workload is a different story. That's easier said than done.
Most organizations attempt to do that using some third-party load generation tool that can run automatically to simulate real user activity. Although this approach may be adequate in most cases, it's never a truly faithful reproduction of your production database workload. These third-party tools merely execute a pre-written query several times with different parameters; you have to supply the query to the tool and give it a range of parameters that it can use randomly. This is not a representative workload of your production system but rather merely the running of a small part of your production workload executed several times—resulting in as little as 1 percent of your application code being tested. Worst of all, these tools require you to supply all the queries from the production workload yourself, which can take weeks or months for small applications or even up to a year for complex ones.
If Oracle DBA could, wouldn't it be a better approach to record all database operations—DML-related and otherwise—inside the database itself, and then replay them in the very sequence they occurred?
Enter Database Replay
In Oracle Database 11g, your wish is granted, and then some. The new Database Replay tool works like a DVR inside the database. Using a unique approach, it faithfully captures all database activity beneath the level of SQL in a binary format and then replays it either in the same database or in a different one (which is exactly what Oracle DBA would want to do prior to making a database change). Oracle DBA can also customize the capture process to include certain types of activity, or perhaps exclude some.
Database Replay delivers half of what Oracle calls Oracle Database 11g's "Real Application Testing" option; the other half is provided by another tool, SQL Performance Analyzer. The main difference between these two tools is the scope involved: whereas Database Replay applies to the capture and replay of all (subject to some filtering) activities in a database, SQL Performance Analyzer allows Oracle DBA to capture specific SQL statements and replay them. (Oracle DBA can't see or access specific SQLs captured in Database Replay, while in SQL Performance Analyzer Oracle DBA can.) The latter offers a significant advantage for SQL tuning because Oracle DBA can tweak the SQL statement issued by an application and assess its impact. (SQL Performance Analyzer is covered in a forthcoming installment in this series.)
Conceptually, Database Replay works in the sequence shown in the figure below.
So, what does Database Replay provide that third-party tools don't? Well, other tools merely replay several synthetic statements, which Oracle DBA provide. In contrast, Database Replay does not need Oracle DBA to provide SQL statements. Since it captures all activity underneath the SQL, you don't risk missing out on some key operations that may be the root of any performance issue. In addition, since you can capture selectively—for specific users, programs, and so on—and you can specify a time period when the workload is captured, you can replay specific workloads that cause you problems, not the entire database.
- Oracle DBA start a capture process that records activities against the database.
- The process writes the activitiesing to special files called "capture files" in a directory called /capture directory/.
- After a while Oracle DBA stop the capture process and move these capture files to a test system in a directory called /replay directory/.
- Oracle DBA start a replay process and several replay clients to replay all these capture files.
- The capture files are applied against the test database.
For instance, you notice that the month-end interest calculation program is causing issues and you suspect that changing a parameter will ease the process. All you have to do is capture the workload for the duration the month-end program runs, make the change in parameter on a test system, and then replay the capture files on that test system. If the performance improves, you have your solution. If not, well, it's only a test system. You didn't impede the operation of the production database.
In my opinion, this tool alone makes the upgrade to Oracle Database 11g worthwhile. Now, I'll show you how it works.
The first task is to capture the workload from your database. All the tasks are done either via command line or Oracle Enterprise Manager Database Control, but you'll use the latter here.
You have just captured the workload in the files in the directory /home/oracle/dbcapture!
- The workload captured is stored in the system on files—the veritable "tape" inside your "camcorder." This directory should be empty. So, the first task will be to create the directory if you don't have one. For this example, create a directory called /home/oracle/dbcapture.
$ cd /home/oracle
$ mkdir dbcapture
- Create a directory object in the database for this directory:
SQL> create directory dbcapture as '/home/oracle/dbcapture';
- Now Oracle DBA are ready to initiate capture. To demonstrate a real-life example, Oracle DBA will create a simple test harness that will generate a lot of INSERT statements and insert into a table called TRANS.
create table trans (
Here is the little PL/SQL code snippet that does the trick. It generates 1,000 insert statements and executes them. (Note that it generates 1,000 distinct insert statements, not inserts 1,000 times in the same statement or program.)
for ctr in 1..1000 loop
l_stmt := 'insert into trans values ('||
'sysdate - '||
execute immediate l_stmt;
Just create the file with the contents as above; do not run it. Call this file add_trans.sql.
(All the above steps were necessary for this lesson only. With the exception of the directory object, they are not needed when Oracle DBA perform the operation in production.)
- In the real world, Oracle DBA would probably run the replay on a different database. For our purposes here, however, Oracle DBA will merely flashback the same database and replay the activities there. Oracle DBA can mark this spot by creating a Restore Point called GOLD.
SQL> create restore point gold;
Now, Oracle DBA are ready to capture. Navigate to the main Database Replay page in Oracle Enterprise Manager Database Control. From the home page, choose Software and Support (shown in the figure below, marked by "1").
- Click Database Replay (under Real Application Testing) to launch the Database Replay page (see below).
- In the left-hand pane, you will see a series of activities. Choose the first activity (Step 1: Capture Workload) by clicking on the Go to Task icon next to it.
- The next screen brings up three assumptions you should carefully examine and confirm before starting the capture process:
- That the current database can be restored on the replay system to the SCN when workload capture begins
- That there is enough disk space to hold the captured workload
- That you are ready to restart the database before workload capture begins, if you choose to
- Tick all check boxes to acknowledge.
- Click Next.
- The next screen has two different action items. On the top half of the screen you will see two radio buttons for you to choose if you want to restart the database before the capture process.
When you start the capture process, there could be in-flight transactions, some of which may be captured and some not. Restarting the database will void these in-flight transactions. Restarting the database clears this "noise". Furthermore, restarting the database gives you a clean backup to be restored on a test system, ensuring that you are replaying the activities on a system that has the same SCN number as the production system.
For these reasons, especially the first one, Oracle recommends that you restart the database prior to capture (and this selection is default). But you don't have to. If you don't want to restart, choose the other radio button.
- The bottom part of the screen now shows something similar to that shown below.
Now you will record the filters that the capture process will take into account while capturing activities. Two filters are there by default: to exclude all activities coming from Oracle Management Server and those coming from Oracle Management Agent.
You can add additional filters too. For example, to add a filter to exclude all perl programs, click Add Another Row and enter "perl" and "%perl%" in the fields "Filter Name" and "Value" respectively. Similarly, correct a small mistake in the default parameter—the value of the Oracle Management Agent filter should be "%emagent%", not "emagent%".
Or, suppose you want to exclude all SYS user actions. Then you will need to choose USER from the Session Attribute drop down box and enter SYS in the "Value" column.
- Click Next. This brings up a screen similar to the one shown below:
- In this screen, choose the directory name from the drop-down box where capture files will be stored. In this case you have used the directory DBCAPTURE. If you have not created this directory as shown in earlier steps, Oracle DBA can still create it by clicking Create Directory Object. Then click Next.
- In the next screen Oracle DBA will see the Job Details such as when it needs to be executed and so on. Choose the radio button Immediate to execute this immediately.
- Fill in the other details in the page such as the OS username, SYS password, and so on, and click Next.
- The next screen, labeled "Step 5 of 5", shows Oracle DBA all the information Oracle DBA entered such as the job name and the exclusion filters. If everything looks as Oracle DBA desired, Click Submit. Otherwise you can go back to make changes.
- Once you hit Submit, the workload capture will start. You will see a confirmation screen as shown below.
Note the Status, which shows "In Progress".
- Now that the workload is being captured, run your simulation workload from a SQL*Plus prompt. Of course, in a real-life system, you will not need to run any simulation; you will merely let the capture run for a while to capture all your workload.
SQL> connect arup/arup
This will execute 1,000 insert statements into the table TRANS.
- After the workload is completed, click the Stop Capture button as shown in the screen above. You will be asked to confirm.
- Oracle takes Automated Workload Repository (AWR) snapshots automatically before and after the workload capture. In the next screen you will be asked if you want to export the AWR data. This is important if you replay on a different system and you would want to export the AWR data from this database to the target database, as shown in the screen below. Click Yes.
- This will create a Scheduler Job to export the AWR. Click on the job name and refresh the status screen until you see the jobs disappears from the Running tab.
Now that you have captured the workload, you can replay it. Usually, you will want to replay in a separate, test system so you will need to copy the files in the directory /home/oracle/dbcapture to a new host. Make sure that the directory is empty before you copy files to it. For learning purposes, here you will use the same database for replay.
Replaying in the same database is an uncommon but conceivable requirement. For example, you may want to replay the transactions in your main system and after testing is complete flashback to the starting point. You may have an outage window within which Oracle DBA want to test the effect of a parameter change, which Oracle DBA would do in the same database.
Oracle DBA will need to pre-process the workload captured before you can play it. Pre-processing makes these captured files ready for replay.
The workload has now been pre-processed and is ready for replay.
- Go the main Database Replay page.
- Select Step 2: Preprocess Workload.
- Choose the directory object from the drop down list box. It will show the captured workload. In your case, it's DBCAPTURE. If you have not created the directory object, you can easily create the directory by clicking the appropriate button.
- Click Preprocess Workload.
- In the next page you will be asked to provide a job name and the associated details like host username and password. Accept the defaults unless you want a specific job name. Choose to run this job immediately. The host userid and passwords should be already populated. If they are not, enter the appropriate values; click Submit.
- In the next page, you will see a conformation and a link to see the job status. Click on it.
- Refresh this screen until you see the status as "Succeeded."
After the workload is captured and pre-processed, Oracle DBA can replay it in the test database. Again, for learning purposes, you have pre-processed the workload in the same database and will use the same database to replay the activities. To do so, you have to reset the database back to the starting point. You can easily do that by flashing it back to the restore point GOLD you created during capture process.
SQL> shutdown immediate;
... database shuts down ...
SQL> startup mount
... instance starts and mounts the database ...
SQL> flashback database to restore point gold;
... database will be flashed back ...
SQL> alter database open resetlogs;
... database is opened ...
Now you are at a point before the workload started and you can replay the workload you captured earlier. Follow the steps below to replay it.
Click Continue, which bring up a screen such as the following:
- Go to the main Database Replay screen from the Database homepage as shown in the "Capturing" section.
- From the menu, select Step 3: Replay Workload. This will take you to the main Replay screen.
You will see a drop down box for choosing the directory. Choose the directory where you placed the replay files. This is the directory object; not the actual UNIX directory. In the earlier example, you used the directory object DBCAPTURE, so choose that one. If you have not created the directory yet, you can click Create Directory and create a directory object.
- Click Setup Replay in the top right-hand corner.
- The next screen brings up a list if information about what is about to happen. Here is the lowdown on each of these informational items.
You can change all the unreferenced parameters by clicking the links shown on the page. Please note that you will navigate away from the Database Replay page when you click on any one of them. So, it's preferable to change them separately, in SQL*Plus. Click Continue.
Enter a Replay Name or accept the default.
The next screen shows you some potential issues due to unresolved references to DB links, directories, and so on.
If you like Oracle DBA can make changes to the replay system on the right-hand side of the screen. In this example, as you are running on the same database, this step will not be necessary.
Click Next. This will bring up a screen as shown in the figure below:
This screen shows that the replay process is now waiting for replay clients. The replay clients are executed from outside the Database Control screen. These are client programs that read the captured workload and replay them. The program is named wrc (both on UNIX and Windows systems). To start a replay client, you need to go to the UNIX prompt and execute the following line:
$ wrc userid=system password=* replaydir=/home/oracle/dbcapture
Of course, you need to supply the correct password for SYSTEM. Change the directory name if you stored the captured files in a different place. It should return with the following message:
Workload Replay Client: Release 220.127.116.11.0 - Production on Tue Sep 4 19:50:44 2007
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Wait for the replay to start (19:50:44)
At this time the replay client is merely waiting for the replay governor (Database Control) to tell it to start. You may decide to start more clients to process the workload in parallel.
Immediately go to the Database Control screen. You should see the screen has changed to display the fact that the replay clients are connected. It displays the host name they connected from, the OS process id, and so on:
Click Next and then Submit to start the replay process. If you go to the UNIX session now, you will see an additional message: "Replay started (01:49:56)". The screen will display the progress bar that displays how much of the data has been processed so far.
After some time the UNIX session will show "Replay finished (01:50:35)". At that time, if you check the Database Control screen, you will see a screen similar to the following.
This shows the detailed status of the replay job. The key field is "Status" at the upper left-hand corner that shows "Completed", indicating that the job is complete.
Now you can analyze the run. The screen shows the metrics on the lower half of the screen, under heading "Comparison". In this example it indicates that the replay completed in 39.08% of the capture time. So, is this good news? Were the changes you implemented effective?
Not necessarily. Look at the next metric—Database Time—which is 180% of the capture. To dig more, click on the tab Report, which brings up the screen shown below:
This screen shows various options for reports. Start with the simplest, the Workload Report. This report does not compare performance but shows you "divergence"—how much data was different in the replay. For instance, if you had a record with ID 3, it was updated and later deleted. During replay, suppose it was first deleted and then updated; that would qualify as divergence. The less the divergence, the more accurate the replay.
But don't stop there. For a definitive analysis, examine the AWR Compare Period Report, shown just below, for periods during the capture and the replay and see the difference for many other metrics such as latch contention, locks, redo generation, consistent gets, and so on, which give you a much better and clearer picture of the impact of your changes.
This report shows the differences between the capture and replay loads. During replay, physical writes and reads went up to 367% and 111% of that during capture respectively. Other parameters, such as sorts and logical reads, also went up, albeit not so dramatically. So you could conclude that whatever changes were made hurt performance rather than helped it.
Database Parameter Change—Consider this scenario: You are wondering if you should change the default value of the parameter db_file_multiblock_read_count from 16 to 256. Well, is 256 good, or should you set it to 128? Or, should it be 64 or 32? The choices are finite but the impact may be limitless; changing the value has a profound impact on the optimizer so what might help a query can break 100 others. How can you determine the optimal value of the parameter?
Database Replay comes in very handy in this situation. You can capture the workload from the production system, move the captured load into a different test system, set the db_file_multiblock_read_count to 32, and then replay the workload. Then you can flashback the database to the original state, set the value to 64, and replay the same workload. You can re-execute this cycle of flashing back, setting the value, and replaying the captured load for all possible values of the parameter. In each of the replays, you will run the AWR reports before and after each replay and compare them. You can then choose the value of the parameter that yields the best overall result. Without Database Replay, it would have been impossible to determine the best value.
OS Upgrades—You are planning to upgrade the OS or even apply a small patch to fix an I/O problem, but how can Oracle DBA ensure that it will not break anything or introduce some other problem? Simple: Just capture the load and replay it in a test system where the patch is applied. This technique applies to kernel parameter changes as well.
Applying Patches—Say you hit a bug and there is a patch available for it. But you are not sure what impact it will have on existing operations and, of course, you and 1,000 others in your organization would give anything to find out. Database Replay is your answer.
Debugging—There's always a pesky program (or programs?) that throw out some results that you don't expect. Fortunately, debugging has never been easier with the Database Replay. Just capture the workload during which the programs run, move to a new system, change the program logic to put in some debugging information, replay the workload, analyze the output, and look like a hero. If it doesn't work the first time, don't lose heart. Repeat the process (from the replay onward; you don't need to capture again) until you find the solution.
Object Changes—You want to add an index or convert an index from b-tree to bitmap. What impact will it have on the INSERT statements? And where? Don't speculate; just get the captured workload and replay it in the test system.
Database Upgrades—This is the holy grail of change assurance. The time has come for upgrading to Oracle Database 11g. The zillion-dollar question is: Will all your applications work just as well, or even better? Instead of speculating, just capture the workload from Oracle Database 10g and replay it in Oracle Database 11g. You are not testing some synthetic transactions on the new version—rather, you are testing the very same SQL your applications use every day. If something doesn't go as planned, tune it in the new system until you are absolutely satisfied with the outcome.
(Note: As of this writing, the support of workload capture from Oracle Database 10g is not possible. That functionality will be available in a future production version of Oracle Database 11g, however.)
Platform Changes—Suppose you want to migrate your database platform from Solaris to HP-UX, where asynch I/O is not available for filesystems. Will the performance be same? Why speculate? Just capture the workload in Solaris and replay in HP-UX.
Conversion to Oracle Real Application Clusters (RAC)—This is a common question: you are planning to convert the database from a single instance to a RAC one. Will the applications behave the same way? The only way to find out is to run an actual workload, capture it, and replay in the RAC database.
Change is never painless but it doesn't have to be unbearable either. You can mitigate many risks by capturing the exact activities your end users put into the system using the new Database Replay tool and then replay them on a test system to accurately gauge the impact of change, all through a few mouse clicks and key strokes. Remember, you can test the functionality of applications as well, not just performance.