Use multiple fields as key in Aerospike loader


I want to upload a pipe-separated (PSV) file whose records hold key statistics for a physician, practice, and location, stored per day.

A unique key for an entry would consist of:
the physician name,
the practice name,
the location name, and
the date of service.

Four fields altogether.

The configuration file example for the Aerospike loader shows only a version with a single key, and I am not seeing the syntax for multiple key fields.

Can someone please advise whether this is possible (a configuration listing multiple key fields built from columns of the loaded file), and show an example?

There are 3 answers below.

Answer 1 (0 votes):
You can create a byte buffer, convert the fields into bytes, and append them to the buffer. When reading, though, you will need to know the data types (the key format) in order to extract the fields back out of the byte buffer.

    import java.nio.ByteBuffer
    import scala.collection.mutable.ArrayBuffer

    val keyVal = new ArrayBuffer[Byte]
    for (j <- 0 until keyIndexes.length) {
      val field = schema(keyIndexes(j))
      field.dataType match {
        case StringType =>
          // Append the UTF-8 bytes of the string.
          keyVal ++= row(keyIndexes(j)).asInstanceOf[String].getBytes("UTF-8")
        case IntegerType =>
          // Append all 4 bytes of the int (a single toByte would truncate it).
          keyVal ++= ByteBuffer.allocate(4).putInt(row(keyIndexes(j)).asInstanceOf[Int]).array()
        case LongType =>
          // Append all 8 bytes of the long.
          keyVal ++= ByteBuffer.allocate(8).putLong(row(keyIndexes(j)).asInstanceOf[Long]).array()
      }
    }
    val key: Key = new Key(namespace, set, keyVal.toArray)

keyIndexes = array containing the indexes of the key fields.

schema = schema of the fields.

row = a single record to be written.

When extracting the values, you need to know the schema of the key. For example, if you built the key from int, int, long, you can extract the first 4 bytes as an int, the next 4 bytes as an int, and the last 8 bytes as a long.
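As a minimal sketch of that extraction (the values and the int-int-long layout here are illustrative, not from the original question), a `ByteBuffer` can both pack and unpack the fixed-width fields in order:

```scala
import java.nio.ByteBuffer

// Pack an example key the same way the writer does: int, int, long (16 bytes).
val packed: Array[Byte] = ByteBuffer.allocate(16)
  .putInt(42)          // hypothetical first int field
  .putInt(7)           // hypothetical second int field
  .putLong(20240115L)  // hypothetical long field, e.g. a date as yyyyMMdd
  .array()

// Unpack: read the fields back in the same order and widths.
val buf       = ByteBuffer.wrap(packed)
val firstInt  = buf.getInt   // first 4 bytes
val secondInt = buf.getInt   // next 4 bytes
val lastLong  = buf.getLong  // last 8 bytes
```

Note this only works for fixed-width fields; a string field would additionally need its length recorded, since its byte count is not knowable from the type alone.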

Answer 2 (0 votes):

There is no simple answer as to the "best way"; it depends on what you want to query at speed and scale. Your data model should reflect how you want to read the data, and at what latency and throughput.

If you want high speed (1-5 ms latency) and high throughput (100k operations per second) for a particular piece of data, you will need to aggregate the data as you write it to Aerospike and store it under a composite key that lets you fetch it directly, e.g. doctor-day-location.

If you want statistical analysis over a period of time, and the query can take a few seconds to several minutes, then you can store the data in a less structured format and run Aerospike aggregations on it, or even use Hadoop or Spark directly on the Aerospike data.

Answer 3 (2 votes):

Join the key fields into one string. For readability, use a separator such as ":".
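A minimal sketch of building such a composite key from the four fields (the field values here are made up for illustration):

```scala
// Hypothetical field values parsed from one PSV record.
val physician     = "Jane Smith"
val practice      = "Downtown Clinic"
val location      = "Building A"
val dateOfService = "2024-01-15"

// Join the four key fields with ":" into a single user key.
val userKey = Seq(physician, practice, location, dateOfService).mkString(":")
```

If a field can itself contain the separator, pick a character that cannot appear in the data, or escape it first, so two different records cannot collide on the same joined string.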

It may be useful to know that Aerospike does not store the original keys; it stores digests (hashes) instead.
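If you later need the original composite key back (for example when scanning the set), the client can be told to store it alongside the record. A sketch of that policy setting, using the Aerospike Java client's `WritePolicy` (requires the aerospike-client jar):

    import com.aerospike.client.policy.WritePolicy

    val writePolicy = new WritePolicy()
    // Store the original user key with the record, in addition to the digest.
    writePolicy.sendKey = true

Pass this policy to the write call; without it, only the digest is recoverable from the server.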