By: Frank C. Wright, Database Administrator, Ross Group Inc
Situation
You’ve been requested to take an ad-hoc backup of a large table before applying a manual update on a subset of rows to fix a logical data corruption problem. The request is made in this manner (vs. just taking a backup of the database) to allow for online access to the original version of the data. The request could come from anyone of the following: User community, IT Management, or your own personal conscience. If you’re a DBA (part/full time) or of a measure twice - cut once mentality, the source is probably the last of the three.
Problem
The table is huge, relatively speaking. You could be working for a company with a large SAN infrastructure with the rows in the table number in the billions or you could be working on a standalone server and the company has no money in the budget for buying more disk(s).
Making an entire copy of the table is a quick and dirty task, but presents problems especially when disk space is at a premium. It makes it easy to look at a snapshot of what the data looked like prior to the update for comparison purposes. As long as the system is locked down it provides an easy method for restoring if a post operation review doesn’t meet with approval. However, if any considerable length of time has passed since the backup and user activity has resumed, a complete table restore is unacceptable as you will lose your most recent activity. It is possible to perform comparisons with only a partial backup as described below. Due to these reasons it doesn’t make sense to make a backup of the entire table if you are only updating a small subset of the records.
Solution & Tips
- Obtain the WHERE clause for the update statement you are about to perform and append it to a SELECT statement from the table you want to modify.
- Append that to a Create Table As Select (CTAS) statement. For the new table name, take the name of the table you are going to be updating and append a datestamp to the name of the table in the script. (e.g. _20090615). You can alternatively append a ticket number if you have change control processes in place and want to relate the table to a specific ticket. You can always query the metadata to get the creation date of this table.
-
-
- Apply Change.
- Run the above statement.
- Verify the row count in this table against the expected row count to be updated.
- Run the update statement against the originaltable.
- Perform Data Comparisons. You will want to examine your updates compared to the original data for validation purposes.
- Side By Side Comparison: If this table has a PK then this is a straight forward join. You don’t need any other criteria since the backup version of the table is already restricted to the set of rows that you updated. If only a few columns were updated, this is a fairly quick query to write. Otherwise look to the next example for comparing data
-
Stacked Comparison: This comparison is easier when you want to compare the entire row regardless of how many columns were updated. The label identifies the source as wellas consistently sorts the sources within all instances of matching keys.
-
Full Table Comparison: You may need to validate the data against a full copy of the table. For example there may be a report you need to run and use that to compare the before and after (and you forgot to run the “before” report before you updated the data :0). If you have the query handy that the report uses, you can run that query with a modification to retrieve a before image.
Replace in the FROM clause with the following (including parentheses) leaving the alias “as A” intact. This will merge the records from the backup into the results from the live table (minus the records you updated)
- Capture After Image (optional for recovery): You may need to restore the original values long after the system has resumed normal operation. If your table doesn’t have an audit timestamp column (i.e., LastUpdate) you might consider this step as it will be used to see if the rows you want to (re) update with original values have been updated since you applied this manual update. This implementation is borrowed from the concept of Optimistic Locking.
-
Rerun an alternate version of the CTAS statement from step 2. You might not be able to use the same criteria if any of the columns in your criteria were part of the SET clause (i.e., they were updated). Use the following criteria to match on the Keys
-
Later, if requested, run the following update to only revert rows that haven’t been modified since the manual update
If you need more information on how the Ross Group can help you solve your IT problems, contact Gary Codeluppi at 937-912-3273 or visit the Ross Group Inc web site at www.rossgroupinc.com.
|