This applies to RethinkDB. Is it better to store the embedded data as an array of objects, or as a hash keyed by date? To keep my intentions simple: I am trying to keep track of daily statistics, but I can't decide which schema structure is better. Let me elaborate.
Pure array schema:
schema = [
  {
    title: 'foobar',
    dates: [
      {
        date: 20130926,
        views_count: 10,
        click_count: 10
      },
      {
        date: 20130927,
        views_count: 20,
        click_count: 20
      },
      {
        date: 20130928,
        views_count: 30,
        click_count: 30
      }
    ]
  }
]
Hash field array schema:
schema = [
  {
    title: 'foobar',
    dates: {
      '20130926' => {
        views_count: 10,
        click_count: 10
      },
      '20130927' => {
        views_count: 20,
        click_count: 20
      },
      '20130928' => {
        views_count: 30,
        click_count: 30
      }
    }
  }
]
One advantage I can think of: it's easier to prevent duplicate dates with the latter. Are there any other advantages? Or is there a common convention that developers prefer?
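To make the duplicate-date point concrete, here is a minimal in-memory Ruby sketch. It makes no RethinkDB calls, and the helper names record_in_array and record_in_hash are hypothetical, introduced only for illustration. It assumes that recording the same day twice should add to the existing counters.

```ruby
# In-memory sketch only; nothing here talks to RethinkDB.

# Array layout: we must search for an existing entry before adding one,
# otherwise the same date could appear twice.
def record_in_array(dates, date, views, clicks)
  entry = dates.find { |d| d[:date] == date }
  if entry
    entry[:views_count] += views
    entry[:click_count] += clicks
  else
    dates << { date: date, views_count: views, click_count: clicks }
  end
  dates
end

# Hash layout: the date is the key, so a second entry for the same day
# cannot exist by construction.
def record_in_hash(dates, date, views, clicks)
  entry = (dates[date.to_s] ||= { views_count: 0, click_count: 0 })
  entry[:views_count] += views
  entry[:click_count] += clicks
  dates
end

array_dates = []
record_in_array(array_dates, 20130926, 10, 10)
record_in_array(array_dates, 20130926, 5, 5)

hash_dates = {}
record_in_hash(hash_dates, 20130926, 10, 10)
record_in_hash(hash_dates, 20130926, 5, 5)
```

In both cases there is a single entry for 20130926, but the array version only stays duplicate-free as long as every writer remembers to do the lookup first.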
IMHO, your application trumps any DBMS. Instead of focusing on your DB storage choices, focus on your application needs, programming and performance. Persist your program data, then iteratively benchmark and optimize for the dominant cases with schema changes only when necessary. Your application will answer your storage questions. For example, in general:
(1) if you need predominantly ordered access to dates, then use an array;
(2) if you need fast random access to many dates, then use a hash.
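As a rough illustration of those two access patterns, here is a small Ruby sketch over in-memory copies of the two schemas from the question. The constants ARRAY_DATES and HASH_DATES and the helpers views_between and views_on are hypothetical names, and no database is involved.

```ruby
# Hypothetical in-memory data mirroring the two schemas in the question.
ARRAY_DATES = [
  { date: 20130926, views_count: 10, click_count: 10 },
  { date: 20130927, views_count: 20, click_count: 20 },
  { date: 20130928, views_count: 30, click_count: 30 }
].freeze

HASH_DATES = {
  '20130926' => { views_count: 10, click_count: 10 },
  '20130927' => { views_count: 20, click_count: 20 },
  '20130928' => { views_count: 30, click_count: 30 }
}.freeze

# Ordered access: walking an array in date order and summing a range
# is a straightforward scan.
def views_between(dates, from, to)
  dates.select { |d| d[:date].between?(from, to) }
       .sum { |d| d[:views_count] }
end

# Random access: a hash answers "what happened on this day?" with a
# single key lookup, with no scan over the other days.
def views_on(dates, date)
  stats = dates[date.to_s]
  stats ? stats[:views_count] : 0
end

range_total = views_between(ARRAY_DATES, 20130926, 20130927)
single_day  = views_on(HASH_DATES, 20130928)
missing_day = views_on(HASH_DATES, 20130101)
```

Whichever of these two calls dominates in your application is a reasonable hint about which layout to start with.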
Both the programming language and the DBMS semantics matter. Even if the DB preserves the order of hash keys, your language could lose it; e.g., hashes in Ruby 1.9 and newer preserve insertion order, but were unordered previously.
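A short Ruby sketch of why this matters, assuming Ruby 1.9 or newer: the hash remembers insertion order, which is not necessarily date order, so it is safer to sort explicitly whenever order matters. The variable names are illustrative only.

```ruby
# Days are inserted out of chronological order on purpose.
stats = {}
stats['20130928'] = { views_count: 30 }
stats['20130926'] = { views_count: 10 }
stats['20130927'] = { views_count: 20 }

# Ruby 1.9+ hashes iterate in insertion order, not key order.
insertion_order = stats.keys

# Sorting by key makes the result independent of how the hash was built
# or of what the storage layer did to it. YYYYMMDD strings sort
# lexicographically in the same order as the dates they represent.
chronological = stats.sort_by { |date, _| date }
                     .map { |date, day| [date, day[:views_count]] }
```

If you rely on date order with the hash layout, sorting the keys in the application like this avoids depending on ordering guarantees from either the language or the DB.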
Of course your schema choices are VERY important. But IMHO, a primary strength of document (non-columnar object data NoSQL) DBMSs is the ability to match natural data structures in programming languages. So I gently encourage you to go back to your application/program as the focus for both questions and answers.