I am currently developing a project and researching the best way to retrieve data from industrial factory sensors connected to PLCs (the controllers of the machinery in a factory, which control, for example, motors, speeds, switches...).
I will explain the objective, and I think my case could be extrapolated to many different types of industries:
I have several PLCs that give me a lot of different data values. (Many of these values are only booleans and others are analog values, of real type for example.)
I will have more than 10,000 sensors in the whole factory.
I want to retrieve the data at least every second for the analog values (for example motor rpm, temperature, humidity...).
For the digital values, the data will be saved with a timestamp whenever an event occurs.
I want to use Cassandra with time series because it looks like the most promising and fastest technology for this.
My question is about storing analog values every second. Is it better to have a schema like:
timestamp, sensor1, sensor2, sensor3, sensor4
with one row per timestamp and the sensors grouped by area of the factory, or is it better that
every sensor has its own table
?
The whole system will be developed in Java and it will provide the data to an external company in order to analyse it.
It's not quite clear what your query is. You mention "I want to retrieve the data at least every second for the analogic values (for example motor rmp, temperature, humidity....)".
Does that mean you're querying every second for all 10K sensors? Or for a specific sensor, or for a group of sensors? In Cassandra, it's vital to know what your query is before looking at data models. If you're looking for 1-second granularity, one option may be to feed incoming data streams to Spark Streaming, and have the Spark Streaming code save to a Cassandra table that suits what you want to query.
As for the options you mention, it's hard to say without knowing the exact nature of your queries. Having one partition key rounded to the second may be an option - that would mean 10K or so entries per partition, assuming a data rate of 1/s per sensor. Having a table per sensor would be weird, but you could have a partition per sensor with a timestamp for each entry. It really depends on your query.
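To make the two partitioning options above more concrete, here is a rough Java sketch of how the keys could be derived and how many rows each partition would hold. Everything named here is hypothetical: the class, the table readings_by_sensor_day and its columns are made up for illustration, and splitting the per-sensor partition by UTC day is my own assumption to keep partitions bounded, not something you are required to do. It uses only the JDK and does not talk to Cassandra.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Illustrative only: compares two candidate partition keys for sensor readings.
final class SensorPartitioning {

    // Hypothetical CQL for the "partition per sensor" option.
    // Table and column names are assumptions, not taken from the question.
    static final String PER_SENSOR_DAY_TABLE =
        "CREATE TABLE IF NOT EXISTS readings_by_sensor_day ("
      + " sensor_id text, day date, ts timestamp, reading double,"
      + " PRIMARY KEY ((sensor_id, day), ts)"
      + ") WITH CLUSTERING ORDER BY (ts DESC)";

    // Option A: one partition per second, holding the readings of all sensors.
    static Instant secondBucket(Instant ts) {
        return ts.truncatedTo(ChronoUnit.SECONDS);
    }

    // Option B: one partition per sensor and UTC day (the day bucket is an assumption).
    static String sensorDayKey(String sensorId, Instant ts) {
        LocalDate day = ts.atZone(ZoneOffset.UTC).toLocalDate();
        return sensorId + "|" + day;
    }

    // Rows in one partition under option A: every sensor writes into the same second.
    static long rowsPerPartitionBySecond(int sensors, int readingsPerSecond) {
        return (long) sensors * readingsPerSecond;
    }

    // Rows in one partition under option B: one sensor over 86,400 seconds.
    static long rowsPerPartitionBySensorDay(int readingsPerSecond) {
        return 86_400L * readingsPerSecond;
    }
}
```

With 10,000 sensors at 1 reading/s, option A gives 10,000 rows per partition and suits "give me everything at time T", while option B gives 86,400 rows per partition and suits "give me sensor X over a time range". Which one fits depends entirely on which of those queries you actually run.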
Perhaps if you gave us an example of how you intend to retrieve the data, we can help better?