java.net.SocketException: Connection reset

I am trying to download a very large dataset of user log data from a remote Amazon Redshift server (a PostgreSQL-compatible database). Because the data is very large, I first fetch the IDs of the users who visited the website during a given period, and then fetch their logs in batches.

The code looks like this.

static Connection getUserLogConn() throws SQLException, ClassNotFoundException {
        System.out.println("-------- PostgreSQL "
                + "JDBC Connection Testing ------------");

        Class.forName("org.postgresql.Driver");
        Connection connection = null;
        connection = DriverManager.getConnection("<address>", "<username>", "<password>");
        return connection;
    }

    static LinkedList<String> extractAllUIDsFromRemote( Connection connection ) throws SQLException, UnknownHostException {
        LinkedList<String> allUIDs = new LinkedList<String>();
        String query = "SELECT distinct uid " + 
                       fromStr +
                       " WHERE ts >= " + startTime;
        if(!endTime.equals(""))
            query += " AND ts < " + endTime;

        System.out.println("Sent SQL to RedShift: " + query);

        // ***Below statement is where the exception occurs ***
        ResultSet rs_uid = connection.createStatement().executeQuery( query ); 

        System.out.println( "Received all UIDs successfully" );

        int n = 0;
        while( rs_uid.next() ) {
            // The cursor points to a row in the result
            n++;
            String uid = rs_uid.getString( "uid" );
            allUIDs.add(uid);
        }
        System.out.println( n + " docs are retrieved." );

        return allUIDs;
    }    


    static void queryIndividualUserLog( Connection connection, LinkedList<String> uids ) throws SQLException, UnknownHostException {
        MongoDBManager db = new MongoDBManager( database, "FreqUserLog" );
        db.createIndex("uid");
        db.createIndex("url");

        StringBuffer sb = new StringBuffer();
        int i = 0;
        for( String uid : uids ) {
            sb.append( "uid='" + uid + "'" );
            // Compose SQL query every 10000 users
            if( ( i != 0 && i % 10000 == 0 ) || i == uids.size() - 1 ){
                System.out.println("Processing up to User " + i);
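                // Parenthesize the OR chain so the ts filter applies to every uid (AND binds tighter than OR)
                String query = "SELECT * " + 
                               fromStr +
                               " WHERE (" + sb.toString() + ")" +
                               " AND ts >= " + startTime;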
                if(!endTime.equals(""))
                    query += " AND ts < " + endTime;

                System.out.println("Sent SQL to RedShift for retrieving individual users' logs");
                ResultSet rs_log = connection.createStatement().executeQuery( query ); // This step takes time to wait for the response from RedShift
                System.out.println( "Received individual users' logs successfully" );

                while( rs_log.next() ) {
                    db.insertOneLog( rs_log ); // one log = one doc, i.e. one row
                }
                System.out.println( "Have written to DB." );
                sb = new StringBuffer();
            }
            else {
                sb.append( " OR " );
            }   
            i++;
        }
        System.out.println(uids.size() + " user log are stored into DB");
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException, UnknownHostException {

        Connection connection = getUserLogConn();
        if(connection != null) {
            System.out.println( "Connected successfully" );

            /** Extract all users' UIDs, and store them in FreqUserIDs collection */
            LinkedList<String> allUIDs = extractAllUIDsFromRemote( connection );

            /** Query all records of freq users from RedShift, and store them in FreqUserLog collection */
            queryIndividualUserLog( connection, allUIDs );

            connection.close();
        }
    }
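
As a side note on the batching loop above: the "every 10000 users" logic can also be written by slicing the UID list into fixed-size chunks and building one `uid IN (?, ?, ...)` clause per chunk. The following is only a minimal sketch; the class name `UidBatcher` and its helper methods are hypothetical and not part of my program. It does not touch the database, it only shows how the chunks and the placeholder clause could be produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UidBatcher {

    /** Splits uids into consecutive sublists of at most batchSize elements. */
    static List<List<String>> partition(List<String> uids, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < uids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uids.size());
            batches.add(uids.subList(start, end));
        }
        return batches;
    }

    /** Builds "uid IN (?, ?, ...)" with one placeholder per uid in the batch. */
    static String inClause(int count) {
        return "uid IN (" + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }

    public static void main(String[] args) {
        // Tiny example list; the real list would come from extractAllUIDsFromRemote.
        List<String> uids = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> batch : partition(uids, 2)) {
            System.out.println(inClause(batch.size()) + " <- " + batch);
        }
    }
}
```

With a `PreparedStatement`, each placeholder would then be bound with `setString`, which also avoids concatenating the UIDs into the SQL text. This changes how the query is built, not how long Redshift takes to answer it.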

      

However, an exception is sometimes thrown. The statement where it occurs is marked with a "***" comment in the code.

org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:218)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
...
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:194)
at org.postgresql.core.PGStream.Receive(PGStream.read)

      

Since I cannot access the remote PostgreSQL server, I have no database logs. I searched for this problem, but many of the related questions are about "Connection reset by peer" rather than the plain "Connection reset" I see here. Someone said that "Connection reset" means the connection was closed on this side, that is, on my side. But I don't know why this is happening or how to fix it. Thank you.

UPDATE: My guess is that the query takes too long because the data is so large, so the program keeps waiting for a response from Redshift and, in that case, my side closes the connection due to a timeout. I don't know if this is true. If so, is there a better solution? (I mean, better than decreasing the number of users per request, which is currently 10,000.)
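
One way to test the timeout guess might be to ask the JDBC driver to enable TCP keepalive on the socket, so the connection does not look idle while the program waits for Redshift. The sketch below only builds the connection properties. `tcpKeepAlive` is a PostgreSQL JDBC driver connection property, but whether the driver version I am using supports it is an assumption, and the class name `KeepAliveProps` is hypothetical. The keepalive interval itself is an operating-system setting, so this may not help unless that interval is short enough.

```java
import java.util.Properties;

public class KeepAliveProps {

    /** Builds connection properties that request TCP keepalive from the driver. */
    static Properties build(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // PostgreSQL JDBC driver property; support in the installed driver version is assumed.
        props.setProperty("tcpKeepAlive", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("<username>", "<password>");
        System.out.println("tcpKeepAlive=" + props.getProperty("tcpKeepAlive"));
        // With a reachable server, the properties would be passed to the driver like this:
        // Connection connection = DriverManager.getConnection("<address>", props);
    }
}
```

If the resets stop with keepalive enabled, that would suggest the connection was being dropped while idle; if they continue, the cause is probably something else.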
