Fast Insert Oracle Hash Cluster Table

Since I started the process of inserting 7M rows from one table to two others, I am now wondering if there is a faster way to do this. The process is expected to complete in an hour, within 24 hours of processing.

Here's how it's done:

Data from this table

RAW (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER);

      

should find a new home in two other cluster tables T1 and T2

CREATE CLUSTER C1 (word VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;
CREATE CLUSTER C2 (doc VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;

T1 (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER) CLUSTER C1(word);
T2 (doc VARCHAR2(4000), word VARCHAR2(4000), count NUMBER) CLUSTER C2(doc);

      

via manual commit Java Inserts like this

stmtT1 = conn.prepareStatement("insert into T1 values(?,?,?)");
stmtT2 = conn.prepareStatement("insert into T2 values(?,?,?)");

rs = stmt.executeQuery("select word, doc, count from RAW");

conn.setAutoCommit(false);

while (rs.next()) {
    word = rs.getString(1);
    doc = rs.getString(2);
    count = rs.getInt(3);

    if (commitCount++==10000) { conn.commit(); commitCount=0; }

    stmtT1.setString(1, word);
    stmtT1.setString(2, doc);
    stmtT1.setInt(3, count);

    stmtT2.setString(1, doc);
    stmtT2.setString(2, word);
    stmtT2.setInt(3,count);

    stmtT1.execute();
    stmtT2.execute();
}

conn.commit();

      

Any ideas?

+1


source to share


4 answers


The first thing I recommend is to make a simple insert-select statement and let the database handle all data movements. Not very useful if you are moving data between two machines or if you do not have rollback segments large enough to handle the entire request.

Second, I have to learn about the addBatch () method . When you've written your code, it does database feedback for every row you insert, which adds network overhead.



Third, if you already have many rows in your target tables, you need to drop all indexes before your inserts and recreate later. If you leave the indexes in place, they must be updated for each row, adding to the overhead of dirty blocks.

Finally: do you need clustered tables? My experience is that they don't buy you a lot (leave it alone: ​​this experience was in the same tablespace).

+3


source


Well, you cannot call a RAW table in Oracle - it is a reserved word, so an ORA-00903 error will be thrown.

As an aside, you should use:



insert all
into t1
into t2
select * from RAW
/

      

"Row by row is equal to slow deceleration" :)

+1


source


Unless you have a specific reason to process the data in your application, I would go for a straight INSERT AS SELECT. Using Parallel DML can make a huge difference for you.

Check also the INSERT ALL syntax (1 read for 2 records) if that suits your needs.

If you don't have IO problems, 1h should be more than enough ...

Hello

0


source


Conceptually similar to addBatch, you can write a PL / SQL procedure that takes arrays (word, doc, count) and handles server-side inserts. This is conceptually similar as you reduce network travel by sending multiple records in one shot and you can achieve better performance. On the other hand, it is more complex and fragile as it requires writing PL / SQL on the server side and will require additional array logic on the client side. Oracle TechNet has several examples of this.

// Nikolay

0


source







All Articles