How to get the type of a column value

I'm using Java to connect to Cassandra. I want to check the data type of a column, i.e. whether it is long or UTF-8, because if it is long I can get the value with column.value.getLong(), but if it is UTF-8 or something else I have to convert the ByteBuffer to a String. Can somebody help me find out the type of a column?

Answer 1

https://issues.apache.org/jira/browse/CASSANDRA-2302 is a Cassandra feature request for implementing ResultSet.getMetaData. A comment on the issue shows how it might be accessed:

// stmt is a java.sql.Statement obtained from the Cassandra JDBC driver (connection setup omitted)
ResultSet rs = stmt.executeQuery("select ...");
ResultSetMetaData md = rs.getMetaData();
CassandraResultSetMetaData cmd = md.unwrap(CassandraResultSetMetaData.class);

However, I'm afraid it wasn't implemented until Cassandra 0.8. Your question is tagged cassandra-0.7.

Answer 2

To get column-specific information, you first have to iterate through the column family definitions in the keyspace definition and match the column family by name. You can do this with the Thrift API, but I would suggest using Hector.

With the column family definition in hand, iterate through its column metadata and find the entry for the desired column. The matching column definition gives you the validation class. If there is no metadata, or no matching column, the validation class is the default validation class from the column family definition.
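
The lookup-with-fallback rule just described can be sketched in plain Java. To keep it runnable without Hector or a cluster, ColumnDef below is a hypothetical stand-in for Hector's ColumnDefinition and uses String names (Hector's getName() actually returns a ByteBuffer); ValidationLookup and validationClassFor are illustrative names, not Hector API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hector's ColumnDefinition: just a name and a validation class.
class ColumnDef {
    final String name;
    final String validationClass;

    ColumnDef(String name, String validationClass) {
        this.name = name;
        this.validationClass = validationClass;
    }
}

class ValidationLookup {
    // Return the validation class declared for columnName in the column metadata,
    // or the column family's default validation class if no metadata entry matches.
    static String validationClassFor(List<ColumnDef> columnMetadata,
                                     String defaultValidationClass,
                                     String columnName) {
        if (columnMetadata != null) {
            for (ColumnDef cdef : columnMetadata) {
                if (cdef.name.equals(columnName)) {
                    return cdef.validationClass;
                }
            }
        }
        return defaultValidationClass;
    }

    public static void main(String[] args) {
        List<ColumnDef> metadata = Arrays.asList(
            new ColumnDef("age", "org.apache.cassandra.db.marshal.LongType"),
            new ColumnDef("name", "org.apache.cassandra.db.marshal.UTF8Type"));
        String dflt = "org.apache.cassandra.db.marshal.BytesType";

        System.out.println(validationClassFor(metadata, dflt, "age"));   // declared: LongType
        System.out.println(validationClassFor(metadata, dflt, "email")); // no match: default BytesType
    }
}
```

With real Hector objects the loop is the same, except that the name comparison has to decode or compare the ByteBuffer returned by ColumnDefinition.getName().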

Using the Hector API, the following lists all column families in the keyspace and prints full details for the CF whose name is passed as an argument.

import java.nio.charset.Charset;

import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnDefinition;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public static void main(String[] args) {
    String hostPort = "localhost:9160";

    if (args.length < 1) {
        System.out.println("Expecting <CF> as argument");
        System.exit(1);
    }
    String cfname = args[0];

    Cluster cluster = HFactory.getOrCreateCluster("myCluster", hostPort);
    KeyspaceDefinition ksdef = cluster.describeKeyspace("myKeyspace");

    // Print every CF name; print full details only for the requested one
    for (ColumnFamilyDefinition cfdef : ksdef.getCfDefs()) {
        System.out.println(cfdef.getName());
        if (cfdef.getName().equals(cfname)) {
            System.out.println("Comment: " + cfdef.getComment());
            System.out.println("Key: " + cfdef.getKeyValidationClass());
            System.out.println("Comparator: " + cfdef.getComparatorType().getTypeName());
            System.out.println("Default Validation: " + cfdef.getDefaultValidationClass());
            System.out.println("Column MetaData:");
            for (ColumnDefinition cdef : cfdef.getColumnMetadata()) {
                // Assumes a UTF8Type/AsciiType comparator; duplicate() leaves the name buffer's position intact
                System.out.println("  Column Name: " + Charset.forName("UTF-8").decode(cdef.getName().duplicate()).toString());
                System.out.println("    Validation Class: " + cdef.getValidationClass());
                System.out.println("    Index Name: " + cdef.getIndexName());
                // The index type may be null for unindexed columns, so don't call toString() on it directly
                System.out.println("    Index Type: " + cdef.getIndexType());
            }
        }
    }
}

If you run that, you will see that every validation class belongs to the org.apache.cassandra.db.marshal package and that each type derives from AbstractType.

Once you have the type, you can decide how to handle your data. For example, if you are writing a data dumper tool, you might just want the string representation of each column value: use TypeParser to create the AbstractType from its class name, then ask that type for the string form of the value.

For example, a non-Hector method I used for this looks like:

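import java.nio.ByteBuffer;

import org.apache.cassandra.config.ConfigurationException; // org.apache.cassandra.exceptions in 1.2+
import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.TypeParser;

// Render a raw value as a string using the marshal type named in the CF definition,
// e.g. "org.apache.cassandra.db.marshal.LongType"
private String getAsString(ByteBuffer bytes, String marshalType) {
    String val = null;
    try {
        AbstractType<?> abstractType = TypeParser.parse(marshalType);
        val = abstractType.getString(bytes);
    } catch (ConfigurationException e) {
        // the type name could not be parsed
        e.printStackTrace();
    }
    return val;
}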

You can use the same method to dump keys and column names too; their type names (the key validation class and the comparator) are also in the column family definition.

One quick shortcut: if you know the column value is a string, note that ByteBuffer has no getString method, so you have to decode it with java.nio.charset.Charset:

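// UTF8Type values are UTF-8 encoded; duplicate() leaves the column's buffer position untouched
Charset.forName("UTF-8").decode(col.getValue().duplicate()).toString()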
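
Combining the two ideas above (branching on the validation class, and decoding the ByteBuffer yourself), here is a sketch of a plain-JDK decoder for the case where you'd rather not use Cassandra's TypeParser on the client. It is not Cassandra code: ValueDecoder and decode are hypothetical names, only LongType, UTF8Type, AsciiType and IntegerType are decoded (anything else, such as BytesType, is printed as hex), and it assumes the standard encodings for those types: LongType as an 8-byte big-endian long, IntegerType as a big-endian two's-complement integer of any length.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class ValueDecoder {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private static final Charset ASCII = Charset.forName("US-ASCII");

    // Decode a raw column value according to its validation class name.
    static String decode(ByteBuffer value, String validationClass) {
        ByteBuffer buf = value.duplicate(); // leave the caller's buffer position untouched
        if (!buf.hasRemaining()) {
            return ""; // empty value
        }
        // Accept both "LongType" and "org.apache.cassandra.db.marshal.LongType"
        String type = validationClass.substring(validationClass.lastIndexOf('.') + 1);

        if (type.equals("LongType")) {
            return Long.toString(buf.getLong()); // 8-byte big-endian long
        } else if (type.equals("UTF8Type")) {
            return UTF8.decode(buf).toString();
        } else if (type.equals("AsciiType")) {
            return ASCII.decode(buf).toString();
        } else if (type.equals("IntegerType")) {
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new BigInteger(bytes).toString(); // variable-length two's complement
        } else {
            // Unknown or BytesType: dump the raw bytes as hex
            StringBuilder hex = new StringBuilder();
            while (buf.hasRemaining()) {
                hex.append(String.format("%02x", buf.get() & 0xff));
            }
            return hex.toString();
        }
    }

    public static void main(String[] args) {
        ByteBuffer longValue = ByteBuffer.allocate(8).putLong(0, 42L);
        System.out.println(decode(longValue, "org.apache.cassandra.db.marshal.LongType")); // 42
        System.out.println(decode(ByteBuffer.wrap("hello".getBytes(UTF8)), "UTF8Type"));   // hello
        System.out.println(decode(ByteBuffer.wrap(new byte[] {1, (byte) 0xff}), "BytesType")); // 01ff
    }
}
```

For anything beyond these few types (UUIDs, dates, composites), the TypeParser approach is probably the safer choice, since each AbstractType subclass already implements getString for its own encoding.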

Answer 3

Typically I know what data types to expect in my application, particularly if I'm using static column families. But if I'm using dynamic column families, or just want to keep my code generic, I tend to set my columns to BytesType and serialize/deserialize the values as Java Objects.

For example, consider the following column family:

create column family album
  with key_validation_class = 'UTF8Type'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType';

Using Hector's ObjectSerializer, you can read and write column values as Objects. The values are stored as serialized objects in the column family, and deserializing them in Java code gives you back usable Java objects. My client code looks like this:

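import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import me.prettyprint.cassandra.serializers.ObjectSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.template.ColumnFamilyResult;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

/* some code left out for brevity (e.g. creating the template) */

String columnFamily = "album";
ThriftColumnFamilyTemplate<String, String> template;

// Store each map entry as a column: String name, serialized Object value
public void write(String key, Map<String, ?> album) {
  Mutator<String> mutator = template.createMutator();

  for (Entry<String, ?> entry : album.entrySet()) {
    mutator.addInsertion(key, columnFamily, HFactory.createColumn(entry.getKey(),
        entry.getValue(), StringSerializer.get(), ObjectSerializer.get()));
  }
  mutator.execute();
}

// Read every column of the row and deserialize each value back into a Java object
public Map<String, ?> read(String key) {
  ColumnFamilyResult<String, String> result = template.queryColumns(key);

  Map<String, Object> album = new HashMap<String, Object>();
  for (String name : result.getColumnNames()) {
    HColumn<String, ByteBuffer> column = result.getColumn(name);
    album.put(name, ObjectSerializer.get().fromByteBuffer(column.getValue()));
  }
  return album;
}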

Here's a simple test showing that the column values keep their original types after being deserialized from the column family:

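public static void main(String[] args) {
  // "client" is an instance of the class containing write() and read() above (setup omitted);
  // run with "java -ea", otherwise the assert statements are skipped
  Map<String, Object> album = new HashMap<String, Object>();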
  album.put("name", "Up The Bracket");
  album.put("release", 2002);
  album.put("in_stock", true);

  /* write into column family and read it back out */
  client.write("up_the_bracket", album);
  Map<String, ?> result = client.read("up_the_bracket");

  /* the column values are deserialized back into their original types */
  assert result.get("name") instanceof String;
  assert result.get("release") instanceof Integer;
  assert result.get("in_stock") instanceof Boolean;
}
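
To see the same round trip without a running cluster, here is a plain-JDK sketch. It assumes (my understanding, not something stated above) that Hector's ObjectSerializer relies on standard Java serialization; the code below uses ObjectOutputStream/ObjectInputStream directly and is not Hector's implementation. SerializationRoundTrip, toBytes and fromBytes are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;

class SerializationRoundTrip {
    // Serialize any Serializable value into a ByteBuffer, the kind of bytes a BytesType column holds
    static ByteBuffer toBytes(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return ByteBuffer.wrap(bytes.toByteArray());
    }

    // Deserialize the buffer back into an Object; its runtime class is preserved
    static Object fromBytes(ByteBuffer buffer) throws IOException, ClassNotFoundException {
        ByteBuffer copy = buffer.duplicate(); // don't move the caller's buffer position
        byte[] data = new byte[copy.remaining()];
        copy.get(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] values = {"Up The Bracket", 2002, true};
        for (Object v : values) {
            Object back = fromBytes(toBytes(v));
            System.out.println(back.getClass().getSimpleName() + ": " + back);
        }
    }
}
```

The trade-off of this design is that the stored bytes are only meaningful to Java clients that have the same classes on their classpath, and other tools (such as cassandra-cli) will just show them as raw bytes.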