Inserting data into bins for the first time in Aerospike, followed by increments


I am storing some counters in Aerospike (say counters a, b and c) along with a parent ID (say pid) and, of course, a primary key (say pk). I need to increment the counters from my service, so I wrote three functions: incrementA, incrementB and incrementC.

Now assume I call incrementA from the service. Counter a gets incremented, but none of the other counters get initialised to 0. I understand that when I call incrementB or incrementC the respective counters will get incremented, but I can't find a way to initialise the pid.

I could think of the following ways to solve the above issue:

  1. Write the initialisation logic in my service, so that in Aerospike I have something like {'pk':'pk1', 'a':0,'b':0,'c':0,'pid':'pid1'}.
  2. Whenever I increment any counter, I also initialize the other counters and the pid if that has not already been done.

The problems with these approaches are:

  1. To initialize from the service, I have to check whether the initialization has already been done (otherwise I will reset my counters). This would essentially double the number of Aerospike calls. (Would batching calls help here?)
  2. If I initialize the other bins while incrementing a particular counter, I will be writing the same pid value again and again.

I would appreciate it if someone could suggest a better way.

PS: I need pid in the db for every pk because I need to query all the pks having the same pid.


There are two answers below.


You can call incrementX (A, B or C) to increment X without initialising the others. On the side of the application that reads the counters, if the bin for a specific counter doesn't exist, treat that counter as 0.
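
A minimal sketch of the "treat a missing bin as 0" idea, assuming a hypothetical in-memory `CounterRecord` class that stands in for one Aerospike record (it is not client API). `increment` models an add operation, which in Aerospike treats a missing integer bin as 0 and creates it; `read` is the reader-side default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one Aerospike record (bin name -> value).
// Nothing here is Aerospike client API; it only models the bin semantics.
class CounterRecord {
    private final Map<String, Object> bins = new HashMap<>();

    // Models an add operation: a missing bin starts at 0 and is created on first add.
    // Only the named counter is touched; the other counters stay absent.
    void increment(String counter, long delta) {
        bins.merge(counter, delta,
            (oldV, d) -> ((Number) oldV).longValue() + ((Number) d).longValue());
    }

    // Reader side: a counter whose bin was never written reads as 0.
    long read(String counter) {
        Object v = bins.get(counter);
        return (v instanceof Number) ? ((Number) v).longValue() : 0L;
    }

    boolean hasBin(String name) {
        return bins.containsKey(name);
    }
}
```

After calling only `increment("a", 1)`, `read("a")` returns 1 while `read("b")` returns 0 even though bin `b` was never written, so no initialisation write is needed.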

Regarding the pid, I would consider adding it (or creating the record with it) in a separate flow: once you know the pk/parent relationship (even if that is when the pk is created), add or update the record with the pk and pid only, rather than on every increment request.

The default value of the recordExistsAction field (of type RecordExistsAction) of the WritePolicy, which every write method in Aerospike accepts, is UPDATE, meaning "create if it does not exist, update if it does". So it doesn't matter which runs first: the increment or the write of the pk/parent_id relationship.
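
A minimal sketch of why the order does not matter under UPDATE semantics, using a hypothetical in-memory `UpdateStore` (not Aerospike API) where every write creates the record if it is missing and otherwise merges into it. With the real Java client, the pid flow would roughly correspond to `client.put(writePolicy, key, new Bin("pid", pid))` with the default policy, and the increment to an `Operation.add` on the counter bin.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a set with RecordExistsAction.UPDATE semantics:
// a write creates the record if it is missing and otherwise merges into it.
class UpdateStore {
    final Map<String, Map<String, Object>> records = new HashMap<>();

    private Map<String, Object> upsert(String pk) {
        return records.computeIfAbsent(pk, k -> new HashMap<>());
    }

    // Increment flow: touches only the one counter bin.
    void increment(String pk, String counter, long delta) {
        upsert(pk).merge(counter, delta,
            (oldV, d) -> ((Number) oldV).longValue() + ((Number) d).longValue());
    }

    // Separate flow, run once when the pk -> parent relationship is known.
    void linkParent(String pk, String pid) {
        upsert(pk).put("pid", pid);
    }
}
```

Running `increment` then `linkParent`, or the reverse, leaves the same record `{a=1, pid=pid1}`, and the pid is written once rather than on every increment.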

Another solution (a bit of an overkill) might be to add a singleton service that holds a Map<Integer, Boolean> mapping each pk ID to an initialization flag.

  • Every time the application starts, populate the map based on the records you already have in the database.

When an increment occurs:

  • If the pk is initialized -> increment the relevant counter.
  • If not -> initialise the pk as you suggested {'pk':'pk1', 'a':0,'b':0,'c':0,'pid':'pid1'} and add the pk to the singleton map to mark it as initialized.
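
The singleton approach above might look roughly like this sketch. Assumptions: `InitTracker` is a hypothetical name, `records` is an in-memory stand-in for the Aerospike set, and a `Set<String>` replaces the answer's `Map<Integer, Boolean>` because the question's pks look like `'pk1'` (a set of initialized pks carries the same information as a map whose values are all `true`). In a real service the initialization would be a write and the increment an `Operation.add`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical singleton for the "remember which pks are initialized" approach.
// `records` is an in-memory stand-in for the Aerospike set, not client API.
final class InitTracker {
    static final InitTracker INSTANCE = new InitTracker();

    final Map<String, Map<String, Object>> records = new HashMap<>();
    // Initialized pks; equivalent to a Map<pk, Boolean> holding only TRUE values.
    private final Set<String> initialized = new HashSet<>();

    private InitTracker() {}

    // On application start: mark every pk that already has a record.
    synchronized void loadExisting() {
        initialized.addAll(records.keySet());
    }

    // synchronized makes check-then-initialize atomic inside ONE JVM only;
    // it does not coordinate several service instances sharing the database.
    synchronized void increment(String pk, String pid, String counter) {
        if (initialized.add(pk)) { // true only the first time this pk is seen
            Map<String, Object> fresh = new HashMap<>();
            fresh.put("a", 0L);
            fresh.put("b", 0L);
            fresh.put("c", 0L);
            fresh.put("pid", pid);
            records.put(pk, fresh);
        }
        Map<String, Object> rec = records.get(pk);
        rec.put(counter, ((Number) rec.get(counter)).longValue() + 1);
    }
}
```

The in-process lock is part of why this is overkill: with more than one service instance, two instances could both see a new pk as uninitialized and one could reset the other's increment.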

@paradocslover Once you accept the available options and select one, the implementation details are straightforward.

Aerospike is record-centric and does not store record versions or history. A client can only find out whether the pid key:value exists in the record when it transacts with that record.

You have 3 options here.

a) Pessimistic: the client always sends the pid, and it is inserted into the record if it is not already there.

b) Optimistic: the client assumes the pid is there; if it is not found, the client inserts it in a second transaction.

In b) there are two sub-options: (i) if the pid is not found, raise an exception, then catch it and retry with the pid k:v data. This makes the client code incur an inefficiency.

Or (ii) always read back the pid and, if it is null, insert it in the next transaction.
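
The three options can be compared with a hypothetical in-memory `PidStore` (not Aerospike API) where each modelled call bumps a `roundTrips` counter. With the real Java client, (a) would roughly be one `client.operate(policy, key, Operation.add(new Bin(counter, 1)), Operation.put(new Bin("pid", pid)))`, and (b)(ii) would swap the put for `Operation.get("pid")` and inspect the returned record. The answer does not say how the server would signal a missing pid in (b)(i), so `PidMissingException` is an assumed stand-in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an Aerospike set. Every modelled call to
// the database (one client.operate call in a real client) increments roundTrips.
class PidStore {
    final Map<String, Map<String, Object>> records = new HashMap<>();
    int roundTrips = 0;

    // Hypothetical failure signal for option (b)(i); not an Aerospike type.
    static class PidMissingException extends RuntimeException {}

    private Map<String, Object> upsert(String pk) {
        return records.computeIfAbsent(pk, k -> new HashMap<>());
    }

    private static void add(Map<String, Object> rec, String bin, long delta) {
        rec.merge(bin, delta, (o, d) -> ((Number) o).longValue() + ((Number) d).longValue());
    }

    // (a) Pessimistic: one call that increments AND writes pid every time.
    void incrementWithPid(String pk, String pid, String counter) {
        roundTrips++;
        Map<String, Object> rec = upsert(pk);
        add(rec, counter, 1);
        rec.put("pid", pid); // same value rewritten; harmless
    }

    // Building block for (b)(i): increment only if pid is present, otherwise
    // fail without changing the record.
    void incrementRequiringPid(String pk, String counter) {
        roundTrips++;
        Map<String, Object> rec = records.get(pk);
        if (rec == null || !rec.containsKey("pid")) throw new PidMissingException();
        add(rec, counter, 1);
    }

    // (b)(i) Optimistic: assume pid exists; on failure, retry with pid attached.
    void incrementOptimisticRetry(String pk, String pid, String counter) {
        try {
            incrementRequiringPid(pk, counter);
        } catch (PidMissingException e) {
            incrementWithPid(pk, pid, counter);
        }
    }

    // (b)(ii) Optimistic: increment and read pid back in one call; if it came
    // back null, write pid in a second call.
    void incrementReadBack(String pk, String pid, String counter) {
        roundTrips++;
        Map<String, Object> rec = upsert(pk);
        add(rec, counter, 1);
        Object storedPid = rec.get("pid");
        if (storedPid == null) {
            roundTrips++;
            upsert(pk).put("pid", pid);
        }
    }
}
```

Two increments on a fresh pk end in the same record with every option, but (a) takes two calls in total while (b)(i) and (b)(ii) take three, because the first increment finds no pid and needs a second call.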

In terms of network bandwidth, (a) and (b)(ii) carry the same penalty: one sends the pid and the other reads the pid back, with every transaction. So you really have to decide which option is best for your use case.

If you can always initialize the record separately, and a pid-not-found exception will be rare, go that route. If you want to handle pid insert-if-not-present in a single transaction, send the pid every time; it will be used if needed. The read-back approach has a network penalty plus a second transaction, but generates no exceptions. If producing the pid for every transaction is non-trivial, you can use this approach.

Note: when you take a multi-transaction approach, you must consider the possibility that the second transaction does not complete. In strong consistency mode you have much better protection in corner cases (split brain, the master got the write but the replica failed, etc.). When you consider all those scenarios, you are better off with (a): always send the pid and let it be used if needed.