Oracle: fastest way to compare tables containing CLOBs and get diff

Suppose I have two tables with columns Col1, Col2 and Col3, which are VARCHAR2, CLOB and NUMBER respectively.

How can I get the diff of these tables? (i.e. List of records that exist in table B but not in table A)

Table A:
╔═══════╦═════════════════╦══════╗
β•‘ Col1  β•‘      Col2       β•‘ Col3 β•‘
╠═══════╬═════════════════╬══════╣
β•‘ P1111 β•‘ some_long_text1 β•‘ 1234 β•‘
β•‘ P1111 β•‘ some_long_text1 β•‘ 1233 β•‘
β•‘ P1111 β•‘ some_long_text2 β•‘ 1233 β•‘
β•šβ•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•

Table B:
╔═══════╦═════════════════╦══════╗
β•‘ Col1  β•‘      Col2       β•‘ Col3 β•‘
╠═══════╬═════════════════╬══════╣
β•‘ P1111 β•‘ some_long_text1 β•‘ 1234 β•‘
β•‘ P1111 β•‘ some_long_text1 β•‘ 1235 β•‘
β•‘ P1112 β•‘ some_long_text2 β•‘ 1233 β•‘
β•šβ•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•

Expected results:
╔═══════╦═════════════════╦══════╗
β•‘ Col1  β•‘      Col2       β•‘ Col3 β•‘
╠═══════╬═════════════════╬══════╣
β•‘ P1111 β•‘ some_long_text1 β•‘ 1235 β•‘
β•‘ P1112 β•‘ some_long_text2 β•‘ 1233 β•‘
β•šβ•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•©β•β•β•β•β•β•β•

      

+3


source to share


5 answers


To compare types LOB

you can use the function DBMS_LOB.COMPARE

.



SELECT table_b.* 
  FROM table_b
  LEFT JOIN table_a
    ON table_b.col1 = table_a.col1
   AND DBMS_LOB.COMPARE(table_b.col2, table_a.col2) = 0
   AND table_b.col3 = table_a.col3
 WHERE table_a.col1 IS NULL;

      

+2


source


You can use the dbms_lob.substr () function as shown below to use the minus operator



select co11,dbms_lob.substr(col2),col3 from a
minus
select co11,dbms_lob.substr(col2),col3 from b;

      

+1


source


In Oracle I think you can do this

SELECT * FROM TableB WHERE (Col1, Col2, Col3) NOT IN (SELECT Col1, Col2, Col3 from TABLEA)

      

Other DBS you have joined:

SELECT * FROM TableB left outer join TableA 
on (a.Col1=b.Col1 and a.Col2=b.Col2 and a.Col3=b.Col3)
WHERE a.col1 is null

      

You will probably need to perform a checksum / hash on the CLOB column, but if included in the comparison.

0


source


As per your expected results, it looks like Left Join will work well.

Something because of this:

Select B.Col1
      ,B.Col2
      ,B,Col3
  FROM TableB B
  LEFT OUTER JOIN TableA A
    ON B.Col1 = A.Col1
 WHERE B.Col2 = A.Col2
   AND B.Col3 = A.Col3
   AND A.Col1 IS NULL

      

Since the condition WHERE

filters the dataset, it makes sense to match everything, including the values ​​that NULL

match Table A. Including only A.Col1 IS NULL

in WHERE

, then you're sure to only see values ​​from TableB that don't have a matching TableA value.

The entrances are expensive and the Left Joins are even more expensive. Joining only one key value should help efficiency (especially since you want to match all records anyway). By placing other join predicates in the clause WHERE

, you can use a filter.

Now - as far as suits yours CLOB

- there may or may not be any benefit to hashing these values. It depends on the size of the data and the hashing algorithm used. The Oracle optimizer can automatically select the hash column to compare, or you can force it to use a function.

I believe that the author should be allowed to make this choice - and I am sure there will be others who disagree with me, and they will all have good reasons. My point is, why force an extra step if it's not necessary when the optimizer can make that decision on its own?

If this is going to be a widely used query (such as a storage routine) that will be called frequently, then it might be useful to create a column that stores the precomputed hash CLOB

for easy comparison. This change will almost certainly eliminate the overhead by requiring each CLOB to be hashed at runtime, which can be very CPU intensive. I personally would not recommend indexing the hashed column as I expect each CLOB record to be likely to be unique. If so, then the PK of the table should be sufficient for matching based on row uniqueness.

0


source


1) You must create a UDT for the lob objects.

create or replace type lob_wrapper is object 
( x clob,
  hash varchar2(100),
 constructor function lob_wrapper(p_x clob)  return self as result,
 MAP MEMBER FUNCTION get_hash  RETURN varchar2

 )
 ;

create  or replace type body lob_wrapper  
as 
 constructor function lob_wrapper(p_x clob)
    return self as result
  as
   temp_ varchar(1000) :=  p_x;
  begin 
       self.x := p_x;
-- add here better implementation of hashing clob. 
       select ora_hash(temp_) into self.hash from dual;   
   return;
  end; 
 MAP MEMBER FUNCTION get_hash  RETURN varchar2 is 
   v_hash varchar2(4000);
 begin    
  return hash;
 end;
end;

      

The object construct computes the hash for the clob. In the example I use ora_hash

, but you should choose the best solution ( dbms_crypto.hash

).

The map function get_hash

on an object is called when the db tries to compare two objects.

select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus 
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a

      

Add another selection to get the original value from the object.

select col1,t.col2.x oringal_value,col3, t.col2.hash hash_value from (
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus 
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a
) t;

      

0


source







All Articles