Loading data from Java to PostgreSQL is very slow

I have 26 CSV files that I want to pull from the internet on a nightly basis and load into a PostgreSQL table. I have it working in Java using PreparedStatement and batched inserts. Despite this, performance is very slow: it takes 30 minutes to grab 6,000 or so records and put them into PostgreSQL. This is my first time doing something like this, so I don't have a reference point for whether that's fast or slow.

To fetch each file I use this code:

URL grabberUrl = new URL(csvUrl);
URLConnection grabberConn = grabberUrl.openConnection();
BufferedReader grabberReader = new BufferedReader(new InputStreamReader(grabberConn.getInputStream()));

Then I use a PreparedStatement, take the values from the input stream, and set them:

con = DriverManager.getConnection(url, user, password);
pst = con.prepareStatement("insert into blah(name, year) values(?, ?)");
pst.setString(1, name);
pst.setString(2, year);

Then I batch the inserts. I've tried batch sizes between 100 and 1000 with no significant performance change.

pst.addBatch();

if (count == 100) {
    count = 0;
    pst.executeBatch();
}
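Putting those pieces together, the read/insert loop is roughly the following (a condensed sketch; the per-row count increment, the CSV splitting, and the final flush of the last partial batch are assumed rather than shown in the snippets above):

String line;
int count = 0;
while ((line = grabberReader.readLine()) != null) {
    String[] fields = line.split(",");  // naive split; assumes no quoted commas in the CSV
    pst.setString(1, fields[0]);        // name
    pst.setString(2, fields[1]);        // year
    pst.addBatch();
    count++;

    if (count == 100) {
        count = 0;
        pst.executeBatch();
    }
}
pst.executeBatch();  // flush the remaining partial batch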

      

Does anyone have any suggestions as to what I can do to speed things up?

2 answers


If you can access the files from the PostgreSQL server, try using the COPY command. See http://www.postgresql.org/docs/9.3/static/sql-copy.html



Also, if you know the quality of the data, you can temporarily remove any table constraints and drop any indexes, then re-add the constraints and indexes after the data is loaded.
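For example, assuming a hypothetical index named blah_name_idx on the blah table from the question, the drop/load/re-create pattern could look like this over JDBC:

try (Statement st = con.createStatement()) {
    st.execute("DROP INDEX IF EXISTS blah_name_idx");  // hypothetical index name
}

// ... load the data with batched inserts or COPY ...

try (Statement st = con.createStatement()) {
    st.execute("CREATE INDEX blah_name_idx ON blah (name)");
}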



Try the following:

PGConnection con = (PGConnection) DriverManager.getConnection(...);
CopyManager copyManager = con.getCopyAPI();
copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);



If mytable is heavily indexed, drop the indexes, load the data, and re-create the indexes afterwards.
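A fuller sketch of how this fits with the download code from the question (the table name, URL, and credentials are placeholders, and the CSV is assumed to have no header row; if it does, add HEADER to the COPY options):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        URL grabberUrl = new URL("http://example.com/data.csv");  // placeholder URL

        try (BufferedReader grabberReader = new BufferedReader(
                     new InputStreamReader(grabberUrl.openConnection().getInputStream()));
             Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // Works when con is a plain pgjdbc connection (not a pooled wrapper)
            CopyManager copyManager = ((PGConnection) con).getCopyAPI();

            // One COPY per file streams the whole CSV in a single round trip
            long rows = copyManager.copyIn(
                    "copy mytable from stdin with (format csv)", grabberReader);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}

If the connection comes from a connection pool, you may need con.unwrap(PGConnection.class) instead of the cast.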
