Fast Insert Oracle Hash Cluster Table
Since I started the process of inserting 7M rows from one table to two others, I am now wondering if there is a faster way to do this. The process is expected to complete in an hour, within 24 hours of processing.
Here's how it's done:
Data from this table
RAW (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER);
should find a new home in two other cluster tables T1 and T2
CREATE CLUSTER C1 (word VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;
CREATE CLUSTER C2 (doc VARCHAR2(4000)) SIZE 200 HASHKEYS 10000000;
T1 (word VARCHAR2(4000), doc VARCHAR2(4000), count NUMBER) CLUSTER C1(word);
T2 (doc VARCHAR2(4000), word VARCHAR2(4000), count NUMBER) CLUSTER C2(doc);
via manual commit Java Inserts like this
stmtT1 = conn.prepareStatement("insert into T1 values(?,?,?)");
stmtT2 = conn.prepareStatement("insert into T2 values(?,?,?)");
rs = stmt.executeQuery("select word, doc, count from RAW");
conn.setAutoCommit(false);
while (rs.next()) {
word = rs.getString(1);
doc = rs.getString(2);
count = rs.getInt(3);
if (commitCount++==10000) { conn.commit(); commitCount=0; }
stmtT1.setString(1, word);
stmtT1.setString(2, doc);
stmtT1.setInt(3, count);
stmtT2.setString(1, doc);
stmtT2.setString(2, word);
stmtT2.setInt(3,count);
stmtT1.execute();
stmtT2.execute();
}
conn.commit();
Any ideas?
source to share
The first thing I recommend is to make a simple insert-select statement and let the database handle all data movements. Not very useful if you are moving data between two machines or if you do not have rollback segments large enough to handle the entire request.
Second, I have to learn about the addBatch () method . When you've written your code, it does database feedback for every row you insert, which adds network overhead.
Third, if you already have many rows in your target tables, you need to drop all indexes before your inserts and recreate later. If you leave the indexes in place, they must be updated for each row, adding to the overhead of dirty blocks.
Finally: do you need clustered tables? My experience is that they don't buy you a lot (leave it alone: ββthis experience was in the same tablespace).
source to share
Unless you have a specific reason to process the data in your application, I would go for a straight INSERT AS SELECT. Using Parallel DML can make a huge difference for you.
Check also the INSERT ALL syntax (1 read for 2 records) if that suits your needs.
If you don't have IO problems, 1h should be more than enough ...
Hello
Conceptually similar to addBatch, you can write a PL / SQL procedure that takes arrays (word, doc, count) and handles server-side inserts. This is conceptually similar as you reduce network travel by sending multiple records in one shot and you can achieve better performance. On the other hand, it is more complex and fragile as it requires writing PL / SQL on the server side and will require additional array logic on the client side. Oracle TechNet has several examples of this.
// Nikolay
source to share