Managing Connections to a MySQL DB Using JDBC


Background

I need to process close to 60 million records from a MySQL DB using JDBC. The interaction here is just a SELECT from the DB based on the primary key, id. The fields fetched are id, name and description. Other asynchronous threads will process the fetched data and do not involve any DB interaction.

This program takes a range of ids as arguments, e.g. 1 to 100, 101 to 200, etc., and multiple instances of it will run on separate machines, so each program instance will process a different range of ids.

This is a one-time job to create backfill data, but it will hit the production DB for selection only.

Observation

 mysql> SHOW STATUS WHERE `variable_name` = 'Threads_connected';
 +-------------------+-------+
 | Variable_name     | Value |
 +-------------------+-------+
 | Threads_connected | 12    |
 +-------------------+-------+

When I used the range 25 to 95, which matched only 20 records, the number of connections observed was still 12, and they remained open while the entire processing was going on.

Code

public void processRange(String fromid, String toid) {
    if (!fromid.isEmpty() && !toid.isEmpty()) {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            this.connection = DriverManager.getConnection(this.url, this.user, this.password);
            // Use placeholders instead of string concatenation: safer (no SQL
            // injection) and the id values are bound as numbers.
            this.statement = connection.prepareStatement(
                    "SELECT id, name, contents FROM sometable WHERE id >= ? AND id <= ?");
            statement.setLong(1, Long.parseLong(fromid));
            statement.setLong(2, Long.parseLong(toid));
            this.results = statement.executeQuery();

            while (results.next()) {
                String name = results.getString("name");
                String contents = results.getString("contents");
                if (name != null && !name.isEmpty() && contents != null && !contents.isEmpty()) {
                    // PROCESS using async futures
                }
            }
        } catch (SQLException sqle) {
            sqle.printStackTrace();
        } catch (Exception exp) {
            exp.printStackTrace();
        } finally {
            try {
                if (results != null) {
                    this.results.close();
                }
                if (statement != null) {
                    this.statement.close();
                }
                if (connection != null) {
                    this.connection.close();
                }
            } catch (SQLException sqle) {
                sqle.printStackTrace();
            }
        }
    }
}

Problem

This is pretty simple JDBC code. If I run multiple instances of this process, as mentioned in the Background, how do I make sure the DB does not go down with a "Too many connections" error? Is there any way to just fetch the data, close the connection, and process the data asynchronously "offline" so as to release connections ASAP?
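One way to release the connection quickly is to copy the result set into plain in-memory objects first, close all JDBC resources, and only then hand the rows to worker threads. This is a minimal sketch, not the asker's actual code; the table and column names follow the question's query, and the worker logic is a stand-in for the real image processing:

```java
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public class OfflineProcessor {

    // Copy the rows we need into plain arrays so every JDBC resource can be
    // closed before any processing starts; the connection is held only for
    // the duration of the fetch.
    static List<String[]> fetchRows(String url, String user, String password,
                                    long fromId, long toId) throws Exception {
        List<String[]> rows = new ArrayList<>();
        String sql = "SELECT id, name, contents FROM sometable WHERE id BETWEEN ? AND ?";
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, fromId);
            ps.setLong(2, toId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new String[] {
                        rs.getString("id"), rs.getString("name"), rs.getString("contents")
                    });
                }
            }
        } // all JDBC resources are closed here, before any processing
        return rows;
    }

    // Process the in-memory snapshot with a worker pool; no DB involvement.
    // The check below mirrors the question's name/contents filter and returns
    // how many rows were actually processed.
    static int processOffline(List<String[]> rows) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Boolean>> futures = new ArrayList<>();
        for (String[] row : rows) {
            futures.add(pool.submit(() -> row[1] != null && !row[1].isEmpty()
                                       && row[2] != null && !row[2].isEmpty()));
        }
        int processed = 0;
        for (Future<Boolean> f : futures) {
            if (f.get()) processed++;
        }
        pool.shutdown();
        return processed;
    }
}
```

The try-with-resources blocks guarantee the connection closes even if processing later throws, so connection lifetime is bounded by fetch time alone.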

On the other hand, I think looping through the range and fetching data for each id individually is the worst way to solve this, as against fetching the whole range in one batch. Please comment if anybody thinks otherwise.

UPDATE

At the current speed it will take months to finish the task! Is there any other parallelism method to make the entire task faster?

I am also trying to use MapReduce to achieve the same, though I am not sure that is a valid use case.

The above observation was based on a local machine. When I tested on a relatively powerful box, the connection count stayed far below the maximum. I am running 5 instances of this process on the same machine, each covering a range of 10,000 ids (50K in total) processed via the result set. Still, I see only 17 connections out of the 500 allowed, but it takes a long time to finish and CPU usage shoots to 99%. So it needs some tweaking to run faster, or to be distributed to other machines.
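Since the work is keyed by a contiguous id range, the simplest way to distribute it further is to split the range into sub-ranges up front and hand each one to a thread or a separate machine. A sketch of that partitioning, with illustrative names (`split` is not from the question):

```java
import java.util.*;

public class RangeSplitter {

    // Split [fromId, toId] into at most `parts` contiguous sub-ranges so each
    // worker (a thread here, or a process on another machine) gets its own
    // chunk of primary keys, mirroring how the question's instances already
    // take "1 to 100, 101 to 200" style arguments.
    static List<long[]> split(long fromId, long toId, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long total = toId - fromId + 1;
        long chunk = (total + parts - 1) / parts;   // ceiling division
        for (long start = fromId; start <= toId; start += chunk) {
            ranges.add(new long[] { start, Math.min(start + chunk - 1, toId) });
        }
        return ranges;
    }
}
```

For example, `split(1, 100, 4)` yields the sub-ranges 1-25, 26-50, 51-75 and 76-100, which can then be passed as the existing `fromid`/`toid` arguments.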


There are 2 answers below.

BEST ANSWER

The problem was not with the DB. The bottleneck was FTP. Images were being created based on info fetched from the DB and had to be FTPed to another host. So instead of putting each image on the FTP server individually, multiple images were zipped together and then uploaded.
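The bundling step described above can be sketched with the standard `java.util.zip` API. This is an illustrative reconstruction, not the answerer's code; it builds the archive in memory, but writing to a temp file before upload would work the same way:

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ImageBundler {

    // Bundle many small payloads into one zip archive so a single FTP upload
    // replaces hundreds of per-image transfers (the per-file round trips were
    // the bottleneck described in the answer). Keys are the entry file names,
    // values are the raw image bytes.
    static byte[] zipAll(Map<String, byte[]> images) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : images.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } // closing the stream finalizes the zip central directory
        return buffer.toByteArray();
    }
}
```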

ANSWER

Case 1: Most of the code is in MySQL, not in Java:

Create a special "user" login for this process and set max_user_connections = 12. Catch the associated error when you try to connect; sleep a second and retry.
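The catch-sleep-retry loop suggested here can be factored into a small helper. A hedged sketch with illustrative names (`withRetry` is not a real library call; in practice the `Callable` would wrap `DriverManager.getConnection`, and the sleep would be the suggested one second):

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

public class ConnectRetry {

    // Retry an action (e.g. opening a connection) when the server rejects it,
    // which is what happens once max_user_connections is reached. Sleeps
    // between attempts and rethrows the last failure if all attempts fail.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long sleepMillis)
            throws Exception {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SQLException e) {
                last = e;                  // remember why we failed
                Thread.sleep(sleepMillis); // back off before retrying
            }
        }
        throw last; // give up after maxAttempts failures
    }
}
```

A caller would do something like `withRetry(() -> DriverManager.getConnection(url, user, password), 30, 1000L)`, so each process blocks politely until one of the 12 slots frees up instead of crashing.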

Note that, depending on which MySQL version you are using, what your code does, how many CPU cores you have, etc., having too many threads can actually hurt performance. Some older benchmarks show 4-8 threads as a practical limit; some newer ones (version 5.7) show 64. (But that is for a particular workload, not yours.)

After you have gone to all this effort, you may find that having multiple asynchronous processes does not help nearly as much as you hoped.

Case 2: Most of the processing is outside MySQL:

It should be OK to have max_connections = 1000 and spawn lots of threads on lots of servers, because each MySQL thread will be in the 'Sleep' state most of the time.