Best way to archive (in flat file) and then purge huge data


I have written the program below to achieve this:

           try (PreparedStatement statement = connection.prepareStatement(
                    "SELECT * FROM some_table WHERE some_timestamp < ?")) {
                statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
                try (ResultSet resultSet = statement.executeQuery();
                     CSVWriter csvWriter = new CSVWriter(
                             new FileWriter(activeDirectory + "/archive_data" + timeStamp + ".csv"), ',')) {
                    csvWriter.writeAll(resultSet, true);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }


            // delete from table
            try (PreparedStatement statement = connection.prepareStatement(
                    "DELETE FROM some_table WHERE some_timestamp < ?")) {
                statement.setTimestamp(1, new java.sql.Timestamp(dt.getTime()));
                statement.executeUpdate();
            } catch (Exception e) {
                e.printStackTrace();
            }


        }

        dbUtil.close(connection);

The above program works fine for an average scenario, but I would like to know how I can improve it so that it:

  1. Works smoothly for a million records without overloading the application server

  2. Considering that many records will be inserted into the same table while this program runs, how can I ensure it archives and then purges exactly the same records?
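One way to guarantee the DELETE touches exactly the rows that were archived is to first collect the primary keys of the rows written to the CSV, then delete by key rather than repeating the timestamp predicate (which could match newly inserted rows). Below is a minimal sketch of the batching helper only; it assumes the table has a numeric primary key `id` (my assumption, not in the original), and the JDBC calls that would execute the generated statements are omitted:

```java
import java.util.ArrayList;
import java.util.List;

public class PurgeBatcher {

    // Split the archived IDs into chunks and build one parameterized
    // DELETE statement per chunk, so the purge removes exactly the rows
    // that were written to the CSV -- no more, no less.
    static List<String> buildDeleteStatements(List<Long> archivedIds, int batchSize) {
        List<String> statements = new ArrayList<>();
        for (int from = 0; from < archivedIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, archivedIds.size());
            StringBuilder sql = new StringBuilder("DELETE FROM some_table WHERE id IN (");
            for (int i = from; i < to; i++) {
                sql.append(i == from ? "?" : ",?");
            }
            sql.append(")");
            statements.add(sql.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        // Five archived rows, deleted in batches of two.
        for (String s : buildDeleteStatements(List.of(1L, 2L, 3L, 4L, 5L), 2)) {
            System.out.println(s);
        }
    }
}
```

Each generated statement would then be bound to the IDs of its chunk and executed inside the same transaction as the archive step, so a failure rolls both back together.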

Update: I'm using opencsv http://opencsv.sourceforge.net/

I would like to suggest several things:

  1. Refrain from using time as the cutoff point. It can cause unpredictable bugs: clocks differ between machines and environments, so we should be careful with time. Use a sequence (an auto-incrementing ID) instead.
  2. Use a connection pool to get data from the database.
  3. Save the information from the DB into several files. You can store them on different drives, then concatenate the information from them afterwards.
  4. Use memory-mapped files.
  5. Use a multi-threaded model for fetching and storing/restoring the information. Note: a single JDBC connection must not be shared across threads, so a connection pool is your helper here.
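To make point 1 concrete: instead of one huge timestamp-bounded query, the table can be walked in fixed-size ID windows, each handled (and, in a multi-threaded model, potentially dispatched to a pooled connection) independently. A minimal sketch of the windowing logic only, assuming a sequence/ID column as suggested above (the actual SELECT/DELETE execution is omitted):

```java
import java.util.ArrayList;
import java.util.List;

public class IdRanges {

    // Walk the table in fixed-size ID windows: each step handles rows
    // with lo < id <= hi, which keeps memory use flat regardless of
    // table size and is unaffected by clock differences.
    static List<long[]> ranges(long minId, long maxId, long step) {
        List<long[]> out = new ArrayList<>();
        for (long lo = minId; lo < maxId; lo += step) {
            out.add(new long[] { lo, Math.min(lo + step, maxId) });
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. archive rows 0..1_000_000 in windows of 250_000
        for (long[] r : ranges(0, 1_000_000, 250_000)) {
            System.out.println("WHERE id > " + r[0] + " AND id <= " + r[1]);
        }
    }
}
```

Each window is a small, bounded unit of work, so a million-row archive becomes a series of cheap queries instead of one result set held in memory.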

And these steps cover only the Java part. You also need a good design on your DB side. Not easy, right? But that is the price of working with large data.