Updating the primary key of master and child tables for large tables
I have a fairly large database with a main table that has a single GUID column (generated by a custom GUID-like algorithm) as its primary key, and 8 child tables that have a foreign key relationship with that GUID column. All tables have approximately 3-8 million records each. None of these tables have BLOB/CLOB/TEXT or any other fancy data types, just regular numbers, varchars, dates and timestamps (about 15-45 columns per table). There are no partitions or indexes other than the primary and foreign keys.
Now the custom GUID algorithm has changed, and although there were no collisions, I would like to migrate all the old data to use GUIDs generated with the new algorithm. No other columns need to be changed. The number one priority is data integrity; performance is secondary.
Some of the possible solutions I could think of were (as you will probably notice, they all revolve around just one idea):
- add a new column ngu_id and fill it with the new gu_id; disable constraints; update the child tables using ngu_id as gu_id; rename ngu_id -> gu_id; re-enable constraints (see the SQL sketch after this list)
- read one master record and its dependent child records from the child tables; insert them into the same tables with a new gu_id; delete all records with the old gu_id
- drop constraints; add a trigger to the main table so that all child tables are updated; start updating the old gu_ids with new gu_ids; re-enable constraints
- add a trigger to the main table so that all child tables are updated; start updating the old gu_ids with new gu_ids
- create a new column ngu_id in the main and all child tables; create foreign key constraints on the ngu_id columns; add an update trigger to the master table to cascade values to the child tables; insert new gu_id values into the ngu_id column; remove the old foreign key constraints based on gu_id; remove the gu_id column and rename ngu_id to gu_id; update constraints if necessary
- use ON UPDATE CASCADE if available?
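For the first option, a minimal SQL sketch follows. All object names (parent, child1, fk_child1_parent, ngu_id) and the new_guid_for() function are placeholders of my own, the constraint syntax shown is Oracle's, and it swaps the rename step for an in-place update of gu_id so the primary key definition stays untouched; the equivalent SQL Server and MySQL statements differ.

    -- Hypothetical names throughout; DISABLE/ENABLE CONSTRAINT is Oracle syntax
    -- (SQL Server uses NOCHECK/CHECK CONSTRAINT, MySQL uses SET FOREIGN_KEY_CHECKS).
    ALTER TABLE parent ADD ngu_id VARCHAR(40);               -- new column on the master table
    UPDATE parent SET ngu_id = new_guid_for(gu_id);          -- placeholder for the new-GUID generation

    ALTER TABLE child1 DISABLE CONSTRAINT fk_child1_parent;  -- repeat for the other 7 child tables

    UPDATE child1 c                                          -- re-point each child row at the new value
       SET gu_id = (SELECT p.ngu_id FROM parent p WHERE p.gu_id = c.gu_id);

    UPDATE parent SET gu_id = ngu_id;                        -- move the new value into the real PK column
    ALTER TABLE parent DROP COLUMN ngu_id;

    ALTER TABLE child1 ENABLE CONSTRAINT fk_child1_parent;   -- re-enable and validate the FK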
My questions:
- Is there a better way? (I can't bury my head in the sand, I gotta do it)
- What is the most appropriate way to do this? (I have to do this in Oracle, SQL Server and MySQL 4, so vendor-specific hacks are welcome)
- What are the typical failures for such an exercise and how can they be minimized?
If you are still with me, thanks and hope you can help :)
Your ideas should work; the first one is probably the way I would go. Some caveats and things to think about when doing this:

Don't do this if you don't have a current backup. I would keep both values in the main table; that way, if you ever need to trace an old record that still carries the old GUID, you can.

Take the database down for maintenance while you do this, and put it in single-user mode. The very last thing you need when doing something like this is a user trying to make changes while you are in the middle of the process. Of course, the first action in single-user mode is the aforementioned backup. You will probably want to schedule the downtime for whenever usage is lightest.

Test on dev first! That should also tell you how long you will need to take production down. You can also try several of the approaches to find out which one is fastest.

Be sure to inform users in advance that the database will go down at the scheduled time for maintenance, and when they can expect it to be available again. Make sure the timing is right: it really makes people go crazy when they planned to stay late to run the quarterly reports and the database is unavailable and they didn't know it.

With this many records, you may want to run the updates of the child tables in batches (one of the reasons not to use cascading updates); a sketch follows below. This may be faster than trying to update 5 million records with a single update. However, do not try to update one record at a time, or you will still be here next year doing this task.

Consider dropping the indexes on the GUID field on all tables and re-creating them when you are done. This should improve the performance of the change.
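A minimal batching sketch, in T-SQL since the loop syntax is SQL Server's. The helper table old_new_guid (old_gu_id -> new_gu_id) and the batch size of 50,000 are assumptions, and the loop only terminates because a new GUID never equals any old one, so an already-updated row stops matching the join:

    -- Hypothetical helper table old_new_guid(old_gu_id, new_gu_id); repeat per child table.
    DECLARE @rows INT;
    SET @rows = 1;
    WHILE @rows > 0
    BEGIN
        UPDATE TOP (50000) c
           SET c.gu_id = m.new_gu_id
          FROM child1 c
          JOIN old_new_guid m ON m.old_gu_id = c.gu_id;
        SET @rows = @@ROWCOUNT;  -- stop once no more rows still carry an old GUID
    END;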
It is difficult to say which approach is the "best" or "most appropriate", since you have not described what you are looking for in a solution. For example, do the tables need to be queryable while you migrate to the new IDs? Do they need to be available for concurrent modification? Is it important to complete the migration as fast as possible? Is it important to minimize the space used for the migration?
Having said that, I would prefer #1 over your other ideas, assuming they all fit your requirements.

Anything with a trigger to update the child tables seems error-prone and more complicated, and most likely won't perform as well as #1.

Is it safe to assume that the new IDs will never collide with the old IDs? If not, solutions based on updating the IDs one at a time will have to worry about collisions, and that will get messy in a hurry.
Have you considered using CREATE TABLE AS SELECT (CTAS) to populate new tables with the new IDs? You will be creating a copy of the existing tables and it will require additional space, but it will most likely be faster than updating the existing tables in place. The idea is: (i) use CTAS to create new tables with the new IDs in place of the old ones, (ii) create the appropriate indexes and constraints on the new tables, (iii) drop the old tables, (iv) rename the new tables to the old names.
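A rough sketch of what that could look like, in Oracle syntax since CTAS is an Oracle idiom; guid_map is a hypothetical old-to-new mapping table and all other names and columns are placeholders:

    -- guid_map(old_gu_id, new_gu_id) is a hypothetical mapping of old GUIDs to new ones.
    CREATE TABLE parent_new AS
    SELECT m.new_gu_id AS gu_id,
           p.col_a, p.col_b          -- placeholder for the remaining columns
      FROM parent p
      JOIN guid_map m ON m.old_gu_id = p.gu_id;

    -- Repeat a similar CTAS for each of the 8 child tables, then:
    ALTER TABLE parent_new ADD CONSTRAINT pk_parent PRIMARY KEY (gu_id);
    -- ...add the remaining PK/FK constraints and indexes on the new tables...
    DROP TABLE child1;               -- drop old tables (children before the parent,
    DROP TABLE parent;               -- or use DROP TABLE parent CASCADE CONSTRAINTS)
    RENAME parent_new TO parent;
    RENAME child1_new TO child1;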