What is the best file format to save a DynamoDB table?


Because I have had a lot of problems with the pipeline on Amazon, I have decided to use Java to back up my database to a file. My table is 50 GB, so I need the best way to save it. This is the Java code that reads the items and writes them to a file:

public static void fetchItems() {
    try{
        FileWriter file=new FileWriter(path);
        ScanResult result = null;
        long sum=0;
        do{
            ScanRequest req = new ScanRequest();
            req.setTableName(dataTable);

            if(result != null){
                req.setExclusiveStartKey(result.getLastEvaluatedKey());
            }

            result = dynamoDB.scan(req);
            List<Map<String, AttributeValue>> rows = result.getItems();

            for(Map<String, AttributeValue> map : rows){
                try{
                    JSONObject json=new JSONObject(map);
                    file.write(json.toString());
                } catch (NumberFormatException e){
                    System.out.println(e.getMessage());
                } catch (IOException e) {
                    System.out.println(e.getMessage());

                }
            }
            sum+= result.getItems().size();
            System.out.println("Result size: " + sum);

        } while(result.getLastEvaluatedKey() != null);
        file.flush();
        file.close();
    }catch(IOException e){
        System.out.println(e.getMessage());
    }
}

Which file extension should I use, and how should I save my data? With my code the file is too big and a lot of fields are null (a problem with new JSONObject(map)). Does anyone have an idea? This is what gets written to the file:

{"leaseOwner":{"SS":null,"BS":null,"b":null,"s":"ip-120-115-91-22346.eu-west-1.compute.internal:ef5c43f7-f5b7-49cf-99e6-8601db2922e2","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseKey":{"SS":null,"BS":null,"b":null,"s":"shardId-000000000002","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"ownerSwitchesSinceCheckpoint":{"SS":null,"BS":null,"b":null,"s":null,"n":"0","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"checkpoint":{"SS":null,"BS":null,"b":null,"s":"49551567310479336289724454124452290401607015400753594402","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseCounter":{"SS":null,"BS":null,"b":null,"s":null,"n":"34905","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null}}{"leaseOwner":{"SS":null,"BS":null,"b":null,"s":"ip-120-115-91-22346.eu-west-1.compute.internal:ef5c43f7-f5b7-49cf-99e6-8601db2922e2","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseKey":{"SS":null,"BS":null,"b":null,"s":"shardId-000000000000","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"ownerSwitchesSinceCheckpoint":{"SS":null,"BS":null,"b":null,"s":null,"n":"0","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"checkpoint":{"SS":null,"BS":null,"b":null,"s":"49551567310434734799327392878132951190473279252744634370","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseCounter":{"SS":null,"BS":null,"b":null,"s":null,"n":"34913","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null}}{"leaseOwner":{"SS":null,"BS":null,"b":null,"s":"ip-120-115-91-22346.eu-west-1.compute.internal:ef5c43f7-f5b7-49cf-99e6-8601db2922e2","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseKey":{"SS":null,"BS":null,"b":null,"s":"shardId-000000000001","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"ownerSwitchesSinceCheckpoint":{"SS":null,"BS":null,"b":null,"s":null,"n":"0","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"checkpoint":{"SS":null,"BS":null,"b":null,"s":"49551567310457035544525923501292620796040147120590684178","n":null,"l":null,"NS":null,"m":null,"NULL":null,"BOOL":null},"leaseCounter":{"SS":null,"BS":null,"b":null,"s":null,"n":"34912","l":null,"NS":null,"m":null,"NULL":null,"BOOL":null}}

Afterwards I will also have to implement a restore application. Thanks.


There are 2 best solutions below


I did it in CSV format, which seems the most concise. The first line should contain the field names (like a header row in Excel), so they are not repeated on every line and the file can still be parsed. Null values are just adjacent commas.
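A minimal sketch of that layout, assuming each item has already been flattened to a map of attribute name to string value. The class and method names here (CsvExportSketch, toCsv) are made up for illustration, and the quoting follows the usual RFC 4180 convention rather than anything stated in this answer:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns flattened items into CSV with a single header line.
class CsvExportSketch {

    // A missing attribute becomes an empty field, which shows up as adjacent commas.
    // Fields containing a comma, quote, or line break are wrapped in double quotes.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // The header is the union of all attribute names, in first-seen order.
    static List<String> header(List<Map<String, String>> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> row : rows) {
            names.addAll(row.keySet());
        }
        return new ArrayList<>(names);
    }

    static String toCsv(List<Map<String, String>> rows) {
        List<String> columns = header(rows);
        StringBuilder out = new StringBuilder();

        // Field names are written once, on the first line.
        List<String> headerFields = new ArrayList<>();
        for (String column : columns) {
            headerFields.add(escape(column));
        }
        out.append(String.join(",", headerFields)).append("\n");

        // One line per item, with fields in header order.
        for (Map<String, String> row : rows) {
            List<String> fields = new ArrayList<>();
            for (String column : columns) {
                fields.add(escape(row.get(column)));
            }
            out.append(String.join(",", fields)).append("\n");
        }
        return out.toString();
    }
}
```

This sketch builds the header from the rows it is given, so for a 50 GB table the column list would probably have to be fixed up front and the rows streamed to the file page by page. A plain CSV also does not record whether a value was a string or a number, which matters if the file is later used for a restore.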


AWS recently released a cloud-native DynamoDB backup-to-S3 feature that requires no code. The article also mentions other alternatives with out-of-the-box support for extracting data from DynamoDB into S3, but it makes the most sense to use the feature built into DynamoDB.

That said, the file format still contains the type ('S' for string, 'N' for number, and so on) within each JSON object (as well as NULL data), but at least it splits the output into smaller compressed JSON files that you can easily extract from S3.
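If you do write the type-tagged JSON yourself, as in the question, the null entries can be dropped before serializing so that each attribute keeps only the one tag that carries a value. A minimal sketch, assuming every attribute descriptor is available as a plain map such as {"s": "...", "n": null, ...}; TypedItemSketch and its methods are hypothetical names, not part of the AWS SDK:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: removes null type tags from a type-tagged item.
class TypedItemSketch {

    // Keeps only the entries with a value, e.g. {"s": "abc", "n": null} -> {"s": "abc"}.
    static Map<String, Object> compactAttribute(Map<String, Object> descriptor) {
        Map<String, Object> compact = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : descriptor.entrySet()) {
            if (entry.getValue() != null) {
                compact.put(entry.getKey(), entry.getValue());
            }
        }
        return compact;
    }

    // Applies compactAttribute to every attribute of one item.
    // Nested maps and lists ("m", "l") are left untouched in this sketch.
    static Map<String, Map<String, Object>> compactItem(Map<String, Map<String, Object>> item) {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> entry : item.entrySet()) {
            result.put(entry.getKey(), compactAttribute(entry.getValue()));
        }
        return result;
    }
}
```

Because the type tag is kept, a restore program can still tell a string from a number, while the repeated null entries that inflate the file in the question are no longer written.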

The greatest advantage of the S3 backup in DynamoDB is that it doesn't consume any of the table's read capacity units, so there is no chance of throttling your table (which could be dangerous in production).
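For comparison, a Scan-based backup like the one in the question does consume read capacity. A rough back-of-the-envelope sketch, assuming the documented cost of one read capacity unit per 4 KB for strongly consistent reads and half that for eventually consistent reads (the Scan default), treating 50 GB as 50 GiB and ignoring per-request rounding. The 100 units per second budget in the usage note is an arbitrary illustrative figure, and ScanCostSketch is a made-up name:

```java
// Hypothetical helper: estimates the read capacity a full table Scan would consume.
class ScanCostSketch {

    static final long READ_UNIT_BYTES = 4L * 1024;

    // Number of read capacity units needed to read tableBytes once.
    static long readCapacityUnits(long tableBytes, boolean stronglyConsistent) {
        long units4k = (tableBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES; // round up to 4 KB units
        return stronglyConsistent ? units4k : (units4k + 1) / 2;             // eventually consistent costs half
    }

    // Seconds needed to spend totalUnits at a steady budget of unitsPerSecond.
    static long secondsAtRate(long totalUnits, long unitsPerSecond) {
        return (totalUnits + unitsPerSecond - 1) / unitsPerSecond;           // round up
    }
}
```

With these assumptions, readCapacityUnits(50L * 1024 * 1024 * 1024, false) gives 6,553,600 units, and secondsAtRate(6553600, 100) gives 65,536 seconds, roughly 18 hours of scanning if the backup is limited to 100 units per second so that production traffic is not throttled.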