Methods for comparing data between different schemas
Are there methods for comparing the same data stored in different schemas? The situation is as follows. I have a database with schema A that stores the data for a function in, say, 5 tables. During the update process, schema A is migrated to schema B: some transformation logic is applied and the data ends up stored in 7 tables in schema B.

What I need is some way to check data integrity. Basically, I have to compare the data across the two schemas while factoring in the transformation logic. Other than writing custom T-SQL stored procedures to compare the data, is there an alternative method? I'm leaning towards Python to automate this; are there any Python modules that can help me?

To better illustrate my question, the following diagram is a rough picture of one of the many datasets I need to compare. Properties 1, 2, 3 and 4 are carried over from the source schema to the destination schema, but are spread across different tables.
Table1Src                        Table1Dest
|                                |
--ID(Primary Key)                --ID(Primary Key)
--Property1                      --Property1
--Property2                      --Property5
--Property3                      --Property6

Table2Src                        Table2Dest
|                                |
--ID(Foreign Key->Table1Src)     --ID(Foreign Key->Table1Dest)
--Property4                      --Property2
                                 --Property3

                                 Table3Dest
                                 |
                                 --ID(Foreign Key->Table1Dest)
                                 --Property4
                                 --Property7
Basically, you should build object representations of the data for both versions of the schema and then compare the objects. This is easiest if everything fits into memory at the same time; if not, you need to iterate over all the objects in one representation, look up the corresponding object in the other, compare the two, and then do the same in reverse so that objects present on only one side are caught as well.
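As a rough illustration of the in-memory case, here is a minimal sketch in plain Python. It assumes each schema has already been flattened into a dict keyed by ID whose values hold only the carried-over properties (Property1 to Property4 from the question). The function name `diff_records` and the sample values are made up for the example.

```python
def diff_records(src, dest):
    """Compare two {id: {property: value}} mappings in both directions."""
    # IDs that exist on only one side (this is the "do the same in reverse" part).
    missing_in_dest = sorted(src.keys() - dest.keys())
    missing_in_src = sorted(dest.keys() - src.keys())

    # For IDs present on both sides, list the properties whose values differ.
    mismatched = {}
    for key in sorted(src.keys() & dest.keys()):
        a, b = src[key], dest[key]
        fields = sorted(f for f in a.keys() | b.keys() if a.get(f) != b.get(f))
        if fields:
            mismatched[key] = fields
    return missing_in_dest, missing_in_src, mismatched


# Hypothetical sample data: one flattened record per ID for each schema.
src = {
    1: {"Property1": "a", "Property2": "b", "Property3": "c", "Property4": "d"},
    2: {"Property1": "e", "Property2": "f", "Property3": "g", "Property4": "h"},
    3: {"Property1": "i", "Property2": "j", "Property3": "k", "Property4": "l"},
}
dest = {
    1: {"Property1": "a", "Property2": "b", "Property3": "c", "Property4": "d"},
    2: {"Property1": "e", "Property2": "f", "Property3": "g", "Property4": "H"},
    4: {"Property1": "m", "Property2": "n", "Property3": "o", "Property4": "p"},
}

missing_in_dest, missing_in_src, mismatched = diff_records(src, dest)
print(missing_in_dest, missing_in_src, mismatched)
```

If the migration transforms values rather than just moving them, the same transformation would have to be applied to the source records before calling the comparison.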
The hard part can be getting the object representations in the first place; check whether SQLAlchemy can work with your tables conveniently. SQLAlchemy is, in principle, capable of mapping existing schema definitions to objects.
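One way this could look is with SQLAlchemy's table reflection. This is only a sketch under several assumptions: SQLAlchemy 1.4 or later, an in-memory SQLite database standing in for the real server, both schemas living in that one database, and made-up column types and sample rows. The table and column names follow the diagram in the question.

```python
from sqlalchemy import MetaData, create_engine, select, text

# In-memory SQLite is a stand-in (assumption); against a real server there
# would typically be one engine per database or schema version.
engine = create_engine("sqlite://")

SETUP = [
    "CREATE TABLE Table1Src (ID INTEGER PRIMARY KEY, Property1 TEXT, Property2 TEXT, Property3 TEXT)",
    "CREATE TABLE Table2Src (ID INTEGER REFERENCES Table1Src(ID), Property4 TEXT)",
    "CREATE TABLE Table1Dest (ID INTEGER PRIMARY KEY, Property1 TEXT, Property5 TEXT, Property6 TEXT)",
    "CREATE TABLE Table2Dest (ID INTEGER REFERENCES Table1Dest(ID), Property2 TEXT, Property3 TEXT)",
    "CREATE TABLE Table3Dest (ID INTEGER REFERENCES Table1Dest(ID), Property4 TEXT, Property7 TEXT)",
    "INSERT INTO Table1Src VALUES (1, 'a', 'b', 'c')",
    "INSERT INTO Table2Src VALUES (1, 'd')",
    "INSERT INTO Table1Dest VALUES (1, 'a', 'x', 'y')",
    "INSERT INTO Table2Dest VALUES (1, 'b', 'c')",
    "INSERT INTO Table3Dest VALUES (1, 'd', 'z')",
]
with engine.begin() as conn:
    for statement in SETUP:
        conn.execute(text(statement))

# Reflection: SQLAlchemy reads the existing table definitions from the database.
metadata = MetaData()
metadata.reflect(bind=engine)
s1, s2 = metadata.tables["Table1Src"], metadata.tables["Table2Src"]
d1, d2, d3 = (metadata.tables[n] for n in ("Table1Dest", "Table2Dest", "Table3Dest"))

# Each query flattens one schema into (ID, Property1..Property4) rows.
src_stmt = select(
    s1.c.ID, s1.c.Property1, s1.c.Property2, s1.c.Property3, s2.c.Property4
).select_from(s1.join(s2, s1.c.ID == s2.c.ID))

dest_stmt = select(
    d1.c.ID, d1.c.Property1, d2.c.Property2, d2.c.Property3, d3.c.Property4
).select_from(d1.join(d2, d1.c.ID == d2.c.ID).join(d3, d1.c.ID == d3.c.ID))


def load(conn, stmt):
    """Run a SELECT and return {ID: {column: value}} for the remaining columns."""
    records = {}
    for row in conn.execute(stmt):
        record = dict(row._mapping)
        records[record.pop("ID")] = record
    return records


with engine.connect() as conn:
    src_records = load(conn, src_stmt)
    dest_records = load(conn, dest_stmt)

schemas_agree = src_records == dest_records
print("carried-over properties match:", schemas_agree)
```

The inner joins here silently drop IDs that lack a row in one of the child tables, so a real check would probably use outer joins or compare each table's ID set separately.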