Assume that I have a sample User table with fields (userid, username, dob, loc, email). Different users have different sets of columns: one user might have only userid and username, a second might have userid, username, and dob, a third might have other columns, and so on (as in a wide-column store). Each user can have their own set of columns. How do I represent this in a single CQL table, given that the data is sparse across the columns? Do I need multiple user tables to represent this info?
Please suggest how to convert the sample table into a CQL table.
I see what you're asking. No, you should be able to build all columns into a single table. Let's say that I create a users table with all five columns, and then INSERT some rows, naming only the columns each user actually has.
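If I run an unbound query (no WHERE clause) to pull back all rows and columns, every row comes back with all five columns, and the ones that were never written show up as null.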
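As a minimal sketch of those steps: the column types, the user names, and the other sample values below are assumptions made up for illustration, not real data.

```cql
-- Hypothetical schema; the column types are assumptions.
CREATE TABLE users (
    userid   int,
    username text,
    dob      timestamp,
    loc      text,
    email    text,
    PRIMARY KEY (userid)
);

-- Each INSERT names only the columns that user actually has.
INSERT INTO users (userid, username) VALUES (1, 'alice');
INSERT INTO users (userid, username, dob) VALUES (2, 'bob', '1985-03-14');
INSERT INTO users (userid, username, loc, email)
    VALUES (3, 'carol', 'Lisbon', 'carol@example.com');

-- Unbound query: no WHERE clause, so it returns every row and column.
SELECT * FROM users;
```

In cqlsh, the columns that were left out of an INSERT are displayed as null for that row.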
So this should tell you a couple of things.
1. Cassandra isn't really "schema-less" in the same sense that it was in the pre-CQL world. You do have to define each column in your table definition.
2. With #1 being said, you do not have to provide a value for each column.
3. Columns which have not been assigned a value display as "null." Keep in mind that they are not true null values; they just display that way, because nothing was ever written for them.
4. In my INSERTs, I only specified the columns for which I had corresponding values. I did not set the others to empty or null. Explicitly setting a column value to null creates a tombstone.
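To make the last point concrete, here is a small sketch that contrasts the two ways of writing a sparse row. It assumes a users table with the five columns from the question and an int userid; the values are hypothetical.

```cql
-- Preferred: leave out the columns that have no value.
-- Nothing is written for dob, loc, or email.
INSERT INTO users (userid, username) VALUES (4, 'dave');

-- Avoid: each explicit null writes a tombstone for that column.
INSERT INTO users (userid, username, dob, loc, email)
    VALUES (5, 'erin', null, null, null);
```

Both rows look the same in a SELECT, but only the second one leaves tombstones behind.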
Now to my point about understanding your query requirements: you usually want to create a table for each query you need to support. I have the "users" table keyed by userid, but really, how useful is that? Applications typically query by things like email or loc.
To handle email, I would create a new table with the same column definitions called "users_by_email." The main difference (other than the name) would be the PRIMARY KEY definition:
PRIMARY KEY (email,username)
Then I could query a user by email with a plain equality condition on email, because email is now the partition key.
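A sketch of what that table and query might look like. The column types and sample values are assumptions for illustration; only the table name and the PRIMARY KEY come from the text above.

```cql
-- Same columns as users, but partitioned by email and clustered by username.
CREATE TABLE users_by_email (
    email    text,
    username text,
    userid   int,
    dob      timestamp,
    loc      text,
    PRIMARY KEY (email, username)
);

-- Sparse inserts work here too; only the key columns are required.
INSERT INTO users_by_email (email, username, userid)
    VALUES ('carol@example.com', 'carol', 3);

-- Lookup by partition key.
SELECT * FROM users_by_email WHERE email = 'carol@example.com';
```

With this approach the application writes each user to both tables, so the data is duplicated in exchange for a direct lookup on each query path.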
That's why knowing your query patterns makes a difference. In the first example, the "users" table cannot support a query by email, because email is not part of its primary key.
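For comparison, this is the kind of query the original table is not designed for. It assumes the "users" table is keyed only by userid; the email value is hypothetical.

```cql
-- email is a regular column in users, not part of the primary key.
SELECT * FROM users WHERE email = 'carol@example.com';
-- Cassandra rejects this query unless ALLOW FILTERING is appended
-- or a secondary index exists on email; both have performance costs.
```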